freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

feat(scrape_pacer_free_opinions): apply task recap_document_into_opinions #4638

Closed grossir closed 2 weeks ago

grossir commented 3 weeks ago

Features:

Refactors:

sentry-io[bot] commented 3 weeks ago

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: cl/corpus_importer/tasks.py

Function: ingest_recap_document
Unhandled issue: HTTPStatusError: Server error '500 Internal Server Error' for url 'http://cl-doctor:5050/extract/recap/text/?strip... ...
Event count: 3


flooie commented 2 weeks ago

I'm not sure whether this is ready; last I heard, it was still being checked. If that's not the case, please let me know, @grossir, next time you are in.

grossir commented 2 weeks ago

I solved my Celery issues and just finished testing the changes. Both the recap_into_opinions and scrape_pacer_free_opinions commands work with the introduced changes.

We should add some tests to help ensure that future changes do not alter the desired behavior. @flooie, do you want me to add them in this PR, or should I create a separate one? If not, this is ready for merging.


About Celery: it took me a while to understand what was failing. The recap_into_opinions command uses the "batch1" queue by default. When testing locally, the Celery workers must be started manually, and a worker started without an explicit list of queues consumes only from the default queue, named "celery". That explains what I was observing: tasks were sent but never executed, because no worker was listening on their queue. The same was happening with scrape_pacer_free_opinions, whose default queue is "pacerdoc1". I have added notes to the wiki to warn future developers and testers about this.
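To make the failure mode above concrete, here is a minimal sketch in plain Python. It does not use Celery itself: `ToyBroker` and `ToyWorker` are hypothetical stand-ins that only model per-queue routing, and the task name `free_opinion_task` is made up for illustration. The queue names "batch1", "pacerdoc1", and "celery" are the ones mentioned above.

```python
from collections import deque

DEFAULT_QUEUE = "celery"  # name of the default queue, as described above


class ToyBroker:
    """Hypothetical stand-in for a message broker: one FIFO per queue name."""

    def __init__(self):
        self.queues = {}

    def send(self, task_name, queue=DEFAULT_QUEUE):
        # A task lands only in the queue it was routed to.
        self.queues.setdefault(queue, deque()).append(task_name)

    def pending(self, queue):
        return len(self.queues.get(queue, ()))


class ToyWorker:
    """Hypothetical worker that consumes only from the queues it was given."""

    def __init__(self, broker, queues=(DEFAULT_QUEUE,)):
        self.broker = broker
        self.queues = tuple(queues)

    def run_once(self):
        """Drain every subscribed queue and return the executed task names."""
        executed = []
        for name in self.queues:
            queue = self.broker.queues.get(name)
            while queue:  # None or an empty deque ends the loop
                executed.append(queue.popleft())
        return executed


broker = ToyBroker()
broker.send("recap_document_into_opinions", queue="batch1")
broker.send("free_opinion_task", queue="pacerdoc1")  # made-up task name

# A worker started without explicit queues only watches "celery".
default_worker = ToyWorker(broker)
stuck = default_worker.run_once()
pending_after_default = broker.pending("batch1")

# A worker that lists the queues explicitly picks the tasks up.
targeted_worker = ToyWorker(broker, queues=("batch1", "pacerdoc1", DEFAULT_QUEUE))
done = targeted_worker.run_once()
```

With a real Celery worker, the equivalent of the second case is passing the queue list on the command line with the `-Q` option, for example `-Q batch1,pacerdoc1,celery`; the app name to pass with `-A` depends on the local setup and is not shown here.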

mlissner commented 2 weeks ago

Heck yeah. Nice to have this going in.

sentry-io[bot] commented 2 weeks ago

Suspect Issues

This pull request was deployed and Sentry observed the following issues:


grossir commented 1 week ago

This is working: we have opinions from yesterday, created around midnight today. Cluster object example