freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

If I write a scraper, will you run it with your account? #254

Closed: MaxPowerWasTaken closed this issue 5 years ago

MaxPowerWasTaken commented 5 years ago

Hi,

Thanks for maintaining such an awesome project and database!

I was hoping to find PACER dockets for the US District of Puerto Rico (PRD) at https://www.courtlistener.com/api/bulk-info/, but it appears there are only opinions (not dockets) for PRD. If I were to write a scraper for PRD's PACER dockets and submit a pull request to juriscraper, and it passes your standards, would freelawproject run it?

I'm a little wary of doing it with my credentials (really my friend's credentials). In case it somehow gets flagged, I don't want to risk his account. But we would be happy to submit our scraper here via a pull request so the community can benefit from the new cases we'd like to download. Please let us know if that arrangement works for you.

(EDIT: in case it matters, my friend is a criminal defense attorney and our purpose is to gather and organize case data to provide a tool for other criminal defense attorneys)

Max

mlissner commented 5 years ago

We do a lot of scraping for folks, but it's usually done as a service, with some exceptions. What did you have in mind for scraping?

Our PACER data doesn't make its way into bulk data files normally, but is in our APIs and is replicated via our replication service.

Don't worry about getting flagged by PACER. Remember, you're paying them money.

johnhawkinson commented 5 years ago

Err, I think we could be a lot clearer here: CourtListener scrapes stuff that is free, and opinions are free to download in PACER. Dockets aren't free to download (modulo information via RSS), so CourtListener doesn't scrape them.

If you were interested in contracting with the Free Law Project to scrape dockets and paying the fees, I'm sure Mike would be happy to talk to you.

mlissner commented 5 years ago

I'm going to close this one down, lest we forget about it and it lingers forever. @MaxPowerWasTaken if you're interested in talking more, probably an email is better since this isn't really a bug anyway. I'm at mike@free.law.

MaxPowerWasTaken commented 5 years ago

Thanks Mike and John, appreciate the answers, and no complaint from me closing this issue.

The answer I'm taking away is that there's no use submitting a PRD docket scraper here to juriscraper, because juriscraper/CourtListener doesn't scrape dockets anyway, and that we shouldn't worry so much about using our (paid) PACER account to scrape those PACER dockets ourselves.

If I have any other questions, will follow up by email. Thanks again!

johnhawkinson commented 5 years ago

Well, I mean CourtListener/RECAP know how to parse dockets, and scraping them is easy enough too [actually I think juriscraper knows how to do that?].

It's just that there's no budget for it.

MaxPowerWasTaken commented 5 years ago

Ok. I'm still a little confused. I'm a software developer and will probably now proceed to program a PACER docket scraper for US District of Puerto Rico cases. Would it be helpful to you guys if I shared that scraper code with this project? If so, I'm happy to. Either way, thanks again for the answers so far.

mlissner commented 5 years ago

So there are two parts to a scraper as I think about it: the code that parses the HTML, and the code that sends the GET and POST requests. We have tools for both in Juriscraper, but we don't have any code on our server that's actively scraping these dockets, because doing so costs money in PACER fees.
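To make that two-part split concrete, here is a minimal, self-contained sketch in plain Python (stdlib only). It is not Juriscraper's API: `Fetcher`, `DocketTableParser`, `parse_docket_entries`, and `scrape_docket` are hypothetical names, and the three-column table (date filed, entry number, docket text) is a simplified assumption about what a docket page looks like. The point is only that the request half (anything that can send GET/POST) can be swapped out without touching the parsing half, which can be tested against saved HTML without paying PACER fees.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Protocol


class Fetcher(Protocol):
    """Hypothetical request half: anything that can send GET/POST and return HTML."""

    def get(self, url: str) -> str: ...
    def post(self, url: str, data: dict[str, str]) -> str: ...


class DocketTableParser(HTMLParser):
    """Hypothetical parsing half: collects <td> text, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._row is not None and self._cell is not None:
            # Collapse runs of whitespace/newlines inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they stay empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_docket_entries(html: str) -> list[dict[str, str]]:
    """Turn docket HTML into dicts; assumes a simplified 3-column layout."""
    parser = DocketTableParser()
    parser.feed(html)
    parser.close()
    return [
        {"date_filed": r[0], "number": r[1], "description": r[2]}
        for r in parser.rows
        if len(r) >= 3
    ]


def scrape_docket(fetcher: Fetcher, url: str, form: dict[str, str]) -> list[dict[str, str]]:
    """Glue: the fetcher does the (billable) request, the parser does the rest."""
    return parse_docket_entries(fetcher.post(url, form))
```

In a real setup the fetcher would wrap an authenticated PACER session, and the parser would be Juriscraper's existing docket parser rather than a hand-rolled one; keeping the two behind separate interfaces is what lets the parsing be developed and tested offline.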

If you had a budget to pay for PACER fees, we would be happy to scrape these cases for you and provide the results in the APIs. But we're not doing active scraping presently (though we have the code for it in this repo.)

Does that clarify? Do you have a budget for PACER fees?

mlissner commented 5 years ago

(I don't think you should create a new parser for dockets unless you have checked ours out.)

MaxPowerWasTaken commented 5 years ago

Thanks Mike that is very helpful.

codestoned1 commented 1 year ago

Hey guys! I know this is an old thread, but where can I look in the codebase for docket-specific scraping (totally OK with paying PACER charges)?

mlissner commented 1 year ago

Our documentation is lacking, sorry, but I'm happy to answer this when we chat.