freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

Attachment menu/doc1 page not detected sometimes #291

Closed mdaniels5757 closed 1 year ago

mdaniels5757 commented 3 years ago

I think this is my first bug filed here, apologies if I screwed this up.

Expected behavior w/ example:

Actual behavior on certain doc1 pages:

Suspected cause:

Possible (untested!) solution:

Best, Michael Daniels

mlissner commented 3 years ago

Thanks Michael. This looks like a dup of #238. Your solution seems logical to me. Want to take a crack at a PR?

johnhawkinson commented 3 years ago

As I said in #238, I don't think we should be looking for a heading at all.

We should just look for table rows of doc1 links, and if they're there, then they're close enough to attachment pages to be worth shipping to the server for it to parse. (Because we don't have a unified parser in the client and server, and we don't want to overparse in the client.)

I think this analysis remains correct, 2 years later.

So, that is:

(document.querySelectorAll(`td a[href*="/doc1"], td a[href*="/docs1"]`).length > 0)

Or maybe without even the td constraint?
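
For illustration only, here is a minimal sketch of how the check above could be wrapped so both variants (with and without the td constraint) can be compared. The function name `looksLikeAttachmentPage`, the `requireTableCell` flag, and the constant names are hypothetical and not taken from the extension's code; only the selector strings come from the snippet above.

```javascript
// Hypothetical names throughout; only the selector strings come from the thread.
// Matches doc1/docs1 links anywhere in the page.
const DOC1_LINK_SELECTOR = 'a[href*="/doc1"], a[href*="/docs1"]';
// Same links, but only when they sit inside a table cell.
const DOC1_LINK_IN_CELL_SELECTOR = 'td a[href*="/doc1"], td a[href*="/docs1"]';

// Returns true when `root` (a Document or Element, or anything exposing
// querySelectorAll) contains at least one doc1-style link.
// No heading is inspected; the presence of links is the only signal.
function looksLikeAttachmentPage(root, requireTableCell = true) {
  const selector = requireTableCell
    ? DOC1_LINK_IN_CELL_SELECTOR
    : DOC1_LINK_SELECTOR;
  return root.querySelectorAll(selector).length > 0;
}
```

In a content script this would be called as `looksLikeAttachmentPage(document)`, or `looksLikeAttachmentPage(document, false)` to drop the td constraint. Keeping the check this loose leaves the real parsing to the server, in line with the reasoning above.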

mlissner commented 3 years ago

I'm happy with either solution.

mlissner commented 1 year ago

I believe this is fixed via https://github.com/freelawproject/recap-chrome/pull/269