Closed MaxPowerWasTaken closed 5 years ago
We do a lot of scraping for folks, but it's usually done as a service, with some exceptions. What did you have in mind for scraping?
Our PACER data doesn't make its way into bulk data files normally, but is in our APIs and is replicated via our replication service.
Don't worry about getting flagged by PACER. Remember, you're paying them money.
Err, I think we could be a lot more clear here: CourtListener scrapes stuff that is free, and opinions are free to download in PACER. Dockets aren't free to download (modulo information via RSS), so CourtListener doesn't scrape them.
If you were interested in contracting with the Free Law Project to scrape dockets and paying the fees, I'm sure Mike would be happy to talk to you.
I'm going to close this one down, lest we forget about it and it lingers forever. @MaxPowerWasTaken if you're interested in talking more, probably an email is better since this isn't really a bug anyway. I'm at mike@free.law.
Thanks Mike and John, appreciate the answers, and no complaint from me closing this issue.
The answer I'm taking away is that there's no use submitting a prd-docket-scraper here to juriscraper, because juriscraper/CourtListener doesn't scrape dockets anyway, and that we shouldn't worry so much about using our (paid) PACER account to scrape those PACER dockets ourselves.
If I have any other questions, will follow up by email. Thanks again!
Well, I mean CourtListener/RECAP know how to parse dockets, and scraping them is easy enough too [actually I think juriscraper knows how to do that?].
It's just that there's no budget for it.
Ok. I'm still a little confused. I'm a software developer and will probably now proceed to program a PACER docket scraper for US District of Puerto Rico cases. Would it be helpful to you guys if I shared that scraper code with this project? If so, I'm happy to. Either way, thanks again for the answers so far.
So there are two parts to a scraper as I think about it: the code that parses the HTML, and the code that sends GET and POST requests. We have tools for both in Juriscraper, but we don't have any code on our server that's actively scraping these dockets, because doing so costs money in PACER fees.
If you had a budget to pay for PACER fees, we would be happy to scrape these cases for you and provide the results in the APIs. But we're not doing active scraping presently (though we have the code for it in this repo).
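To make that two-part split concrete, here is a minimal sketch in plain Python. It is not Juriscraper's actual code or API: `DocketTableParser`, `parse_docket_html`, `fetch_html`, and the sample HTML are all hypothetical names and data made up for illustration, and a real PACER fetch would also need a login and would incur fees. The point is only that the parsing half can be written and tested against saved HTML without ever touching the network.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class DocketTableParser(HTMLParser):
    """Hypothetical parser: collects each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_docket_html(html):
    """Part 1: parsing. Pure function, no network access needed."""
    parser = DocketTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url, timeout=30):
    """Part 2: fetching. A plain GET; defined here but never called in this sketch."""
    req = Request(url, headers={"User-Agent": "example-docket-fetcher"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Invented sample HTML, not a real PACER page.
SAMPLE = """
<table>
  <tr><th>Date Filed</th><th>#</th><th>Docket Text</th></tr>
  <tr><td>01/02/2019</td><td>1</td><td>COMPLAINT filed.</td></tr>
</table>
"""

rows = parse_docket_html(SAMPLE)
```

Keeping the two halves separate means only `fetch_html` ever costs money to run; the parser can be exercised as often as you like on HTML you already have.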
Does that clarify? Do you have a budget for PACER fees?
(I don't think you should create a new parser for dockets unless you have checked ours out.)
Thanks Mike that is very helpful.
Hey guys! I know this is an old thread, but where can I look in the codebase for docket-specific scraping (totally OK with paying PACER charges)?
Our documentation is lacking, sorry, but I'm happy to answer this when we chat.
Hi,
Thanks for maintaining such an awesome project and database!
I was hoping to find PACER dockets for the US District of Puerto Rico at https://www.courtlistener.com/api/bulk-info/ but it appears there are only opinions (not dockets) for PRD. If I were to write a scraper for PACER dockets for PRD and submit a pull request to juriscraper, and if it passes your standards, would freelawproject run it?
I'm a little wary of doing it with my credentials (really my friend's credentials). In case it somehow gets flagged, I don't want to risk his account. But we would be happy to submit our scraper here via a pull request so the community can benefit from the new cases we'd like to download. Please let us know if that arrangement works for you.
(EDIT: in case it matters, my friend is a criminal defense attorney and our purpose is to gather and organize case data to provide a tool for other criminal defense attorneys)
Max