freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Error: Unable to extract content due to unknown extension, extracting text from js: 9898675 -... #3381

Open sentry-io[bot] opened 8 months ago

sentry-io[bot] commented 8 months ago

I suspect this is related to the new version of juriscraper. Perhaps one of the new scrapers isn't quite working?

Sentry Issue: COURTLISTENER-5EN

Error: Unable to extract content due to unknown extension, extracting text from js: 9898675 - Byron Johnson v. Kaija Freborg, A21-1531, Supreme Court, September 20, 2023

mlissner commented 8 months ago

@flooie, flagging for you.

flooie commented 8 months ago

@mlissner CAPTCHA

mlissner commented 8 months ago

Fun. Well, let's disable this scraper then, so the error stops and we're not getting bad data. Once we get our new scraper contractor, we can figure out a captcha solving service.

flooie commented 8 months ago

I took another look at this.

We can resolve this if we simply change the user agent. A normal user agent is required to avoid a block when scraping the metadata, and it is also required for scraping the actual PDF. If we adjust the user agent during the collection of the binary content, that should resolve the block.
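A minimal sketch of the user-agent change, assuming the binary download is an ordinary HTTP GET. The function name `build_download_request` and the `BROWSER_USER_AGENT` value are hypothetical and only illustrate the idea; they are not the scraper's actual code.

```python
from urllib.request import Request

# Hypothetical browser-like User-Agent string. The value the scraper
# really needs is not stated in this thread.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_download_request(url: str, user_agent: str = BROWSER_USER_AGENT) -> Request:
    """Build a GET request for `url` that carries a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


# Usage (performs a network call, so it is left commented out):
# from urllib.request import urlopen
# with urlopen(build_download_request("https://example.com/opinion.pdf")) as resp:
#     content = resp.read()
```

The same header would have to be sent for both the metadata pages and the PDF download, since the comment above says both are blocked without it.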

flooie commented 8 months ago

This should be resolved with the update to the scraper.

sentry-io[bot] commented 8 months ago

Another instance of this, I think:

Sentry issue: COURTLISTENER-5P4

flooie commented 8 months ago

Yes, I agree. I was wondering if it was something in a queue.

mlissner commented 8 months ago

I doubt it. It was fixed days ago, right?

flooie commented 8 months ago

I'd have to check when it finished pushing. It's a captcha thing again. I'm going to follow up with the court. I reached out last week and the woman was ... surprised because she never triggers it.
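One way to keep a captcha response from being stored as an opinion is to check the downloaded bytes before extraction. This is a rough sketch, assuming the captcha response is an HTML or JavaScript page rather than the PDF itself; the helper name `looks_like_pdf` is hypothetical.

```python
def looks_like_pdf(content: bytes) -> bool:
    """Return True if `content` starts with the PDF magic bytes."""
    # PDF files begin with the marker "%PDF-" followed by a version number.
    return content.startswith(b"%PDF-")


# Usage: skip the item when the download is not a PDF.
# if not looks_like_pdf(content):
#     ...  # log it and skip, instead of saving bad data
```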

sentry-io[bot] commented 8 months ago

Sentry issue: COURTLISTENER-5P9

sentry-io[bot] commented 8 months ago

Sentry issue: COURTLISTENER-5P7

mlissner commented 8 months ago

Another mess of issues in Sentry about this one today.

flooie commented 8 months ago

Great. I'll verify which courts are affected and disable them. I have an outstanding email to the Minnesota courts about this issue that I'll follow up on.