Open grossir opened 8 months ago
@flooie can you please check this source? From what I see, we may need to parse the PDFs, since old case information is not available in HTML, as it is in colo.
Hmm. Is it possible the HTML changes?
Some news about coloctapp: the Colorado Courts have just (well, on March 1, 2024) launched a new site for Appellate Opinions, and it actually has past opinions in HTML. We could implement the backscraper from there instead of dealing with PDFs.
Check it out here
CO was one of the worst states. Does this mean it's finally not so terrible?
This new Colorado site seems to have no search filters except for "court". Getting the document URL requires more steps/requests. And it uses vlex as the backend. The downloaded opinion PDF comes in a zip, and the document has a vlex link in it.
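Since the PDF arrives inside a zip, the scraper would need an unpacking step before handing the document on. A minimal sketch with the standard library, assuming the archive is already in memory as bytes; `extract_pdfs` is a hypothetical helper name, not something in juriscraper, and I have not checked what the real archives contain besides the PDF:

```python
import io
import zipfile


def extract_pdfs(zip_bytes: bytes) -> dict:
    """Return {member name: file bytes} for every .pdf entry in a zip archive.

    Hypothetical helper: the layout of the real archives is an assumption.
    """
    pdfs = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not a PDF.
            if info.is_dir():
                continue
            if info.filename.lower().endswith(".pdf"):
                pdfs[info.filename] = archive.read(info)
    return pdfs
```

Returning every PDF member rather than just the first one keeps the helper usable if an archive ever holds more than one document.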
So, I don't know if it qualifies as not being terrible, but at least it will let us look for past opinions without going into PDFs
So, we also have a more recent gap. We are missing every Opinion announced on:
I don't know why the scraper has been failing...
Command to fill the gaps
docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.coloctapp --backscrape-start=09/28/2021 --backscrape-end=02/01/2022
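If we end up filling more than one gap, the command can be built from dates instead of typed by hand. A rough sketch: the container name, script path, and flags are copied from the command above, while `backscrape_command` is a hypothetical helper and not part of CourtListener:

```python
from datetime import date


def backscrape_command(court_module: str, start: date, end: date) -> list:
    """Build the argument list for the backscrape command shown above.

    Hypothetical helper; dates are formatted as MM/DD/YYYY to match that command.
    """
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%m/%d/%Y"
    return [
        "docker", "exec", "-it", "cl-django",
        "python", "/opt/courtlistener/manage.py", "cl_back_scrape_opinions",
        "--courts", court_module,
        f"--backscrape-start={start.strftime(fmt)}",
        f"--backscrape-end={end.strftime(fmt)}",
    ]


if __name__ == "__main__":
    cmd = backscrape_command(
        "juriscraper.opinions.united_states.state.coloctapp",
        date(2021, 9, 28),
        date(2022, 2, 1),
    )
    print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` as-is; note that `-it` expects a terminal, so it may need to be dropped when running from a script.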
The old scraper went down some months ago. The most recent coloctapp opinion is from March 7th, 2024, so this is a new gap.
From #929, related to #974
coloctapp
Between September 29, 2021 and February 02, 2022 we have 0 documents. We are missing documents, but we must now go into the PDFs to get them.