Closed by grossir 13 hours ago
docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.federal_special.tax --backscrape-start=11/19/2020 --backscrape-end=01/25/2022
There are 279 opinions in total in the source for that time period. We have scraped 275, after Ramiro ran the command.
I haven't found an error on Sentry, which leads me to think they were skipped for some reason in cl_scrape_opinions. However, the logger.debug calls won't show on the server (only info and above), so I can't really tell what happened. It also seems that logger.info calls from the scraper file won't show either.
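For anyone following along, here is a minimal sketch of why the debug calls vanish when a handler is set to INFO. The logger name and messages are made up for illustration; this is not the actual CourtListener logging config, and it does not explain why the logger.info calls from the scraper file are missing too (that would point to a different logger or handler setup).

```python
import io
import logging


def capture(handler_level: int) -> str:
    """Emit one debug and one info record, return what the handler wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setLevel(handler_level)  # the handler drops anything below this level

    # Hypothetical logger name, unique per call so handlers don't pile up.
    logger = logging.getLogger(f"demo.scraper.{handler_level}")
    logger.setLevel(logging.DEBUG)  # the logger itself lets everything through
    logger.propagate = False
    logger.addHandler(handler)

    logger.debug("Skipping opinion, hash already in DB")
    logger.info("Saved new opinion")
    return stream.getvalue()


server_output = capture(logging.INFO)   # assumed server-like setting
local_output = capture(logging.DEBUG)   # assumed local/dev setting
print(server_output)
print(local_output)
```

With the INFO-level handler only the "Saved new opinion" line is written, so a skip that is only logged at debug level leaves no trace.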
Detail of the count on the source:
11/19/2020 - 04/01/2021: 92
04/01/2021 - 08/01/2021: 98
08/01/2021 - 01/25/2022: 89
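As a quick sanity check on those numbers, a small sketch using only the figures quoted in this thread (the variable names are mine):

```python
# Per-range counts on the source, as reported above.
source_counts = {
    "11/19/2020 - 04/01/2021": 92,
    "04/01/2021 - 08/01/2021": 98,
    "08/01/2021 - 01/25/2022": 89,
}
scraped = 275  # what we have after the backscrape run

source_total = sum(source_counts.values())
missing = source_total - scraped
print(source_total, missing)
```

The three ranges add up to the 279 reported for the whole period, which leaves 4 opinions unaccounted for.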
Thanks for staying on this. We'll get 'em all!
By manually checking the range 11/19/2020 - 04/01/2021, I found one document the backscraper did not collect: the Memorandum Opinion for case "Kumar Rajagopalan & Susamma Kumar", dated 11/19/20 (tax links are not permanent). I downloaded it and computed its hash, which already exists on courtlistener. The same thing was happening for fla in #960, so this may be the blanket reason why we don't get exact counts. I will check a couple more for tax.
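To make the hash check concrete, here is a rough sketch of how a content-hash lookup makes the source count and our count diverge. It assumes a SHA-1 hex digest of the raw file bytes and an in-memory set of known hashes; the function names and sample data are hypothetical, not the actual courtlistener code.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of the file bytes (SHA-1 is an assumption here)."""
    return hashlib.sha1(data).hexdigest()


def split_new_and_duplicates(downloads, known_hashes):
    """Separate downloads into new documents and ones whose hash is already stored."""
    seen = set(known_hashes)
    new, duplicates = [], []
    for name, data in downloads:
        digest = content_hash(data)
        if digest in seen:
            duplicates.append(name)  # same bytes already stored, so it is skipped
        else:
            new.append(name)
            seen.add(digest)
    return new, duplicates


# Made-up example: two source entries serve identical bytes.
already_stored = {content_hash(b"opinion A")}
downloads = [
    ("memo-opinion-1.pdf", b"opinion A"),
    ("memo-opinion-2.pdf", b"opinion B"),
]
new, duplicates = split_new_and_duplicates(downloads, already_stored)
print(new, duplicates)
```

If the source lists the same file under more than one entry, every entry after the first is counted on the source but skipped on our side, with no error raised.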
Part of #929
To help solve this, a dynamic backscraper will be implemented.
About the gap: we have 0 documents between November 20th, 2020 and January 26th, 2022. Filtering by those dates on the source returns more than 200 docs (I tried splitting the range in half, and there are still more than 100 in each half).
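Splitting the range by hand, as described above, could be automated along these lines. This is only a sketch of the general idea of walking a date range in fixed-size chunks. The helper name and the 30-day chunk size are my own assumptions, and it is not a description of how the dynamic backscraper will actually be implemented.

```python
from datetime import date, timedelta


def date_chunks(start: date, end: date, days: int):
    """Split [start, end] into consecutive, non-overlapping chunks of at most `days` days."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


# The backscrape window from the command above, in 30-day steps (chunk size assumed).
chunks = date_chunks(date(2020, 11, 19), date(2022, 1, 25), days=30)
for chunk_start, chunk_end in chunks:
    print(chunk_start.strftime("%m/%d/%Y"), "-", chunk_end.strftime("%m/%d/%Y"))
```

Smaller chunks keep each query under whatever result limit the source applies, and per-chunk counts make it easier to see where documents go missing.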