freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

Add support for combined documents #347

Open mlissner opened 11 months ago

mlissner commented 11 months ago

In #337, we're adding a warning so people know that combined documents don't get uploaded to RECAP. Looking at it a bit more today, though, I'm realizing that we can actually support this correctly if we ever want to build out this feature. I don't think we want to bother, because I think the warning will be enough, but it's doable.

The problem we've always had with this feature is that if somebody uploads a combined PDF to our servers, we won't know how to split it properly. Our assumption was that we'd need to use the PACER PDF headers to do the split, that they're annoying to parse, and anyway, they're unreliable.

My realization today is that the page counts for each document are on the receipt page:

[image: screenshot of the receipt page]

So you can see that for docs that are less than 30 pages, the page count is in the first box. For docs that are longer, the page count is in the second box. (This might fail if combined documents contain free opinions.)

Anyhow, what we could do, if we ever wanted to, is parse the page for these values, add them as a new parameter to our POST, and then use them on the server to do the split of the document.
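
For illustration only, here is a minimal sketch of the server-side bookkeeping, assuming the page counts arrive in document order as a list of integers. The function name `page_ranges` and both of its parameters are hypothetical and not part of any existing RECAP code. It only computes where each document starts and ends in the combined PDF; the actual split would be done with whatever PDF library the server uses.

```python
from itertools import accumulate


def page_ranges(page_counts, total_pages):
    """Return one half-open (start, end) page range per document.

    page_counts: per-document page counts, in the order the documents
        appear in the combined PDF (assumed to come from the receipt page).
    total_pages: number of pages in the uploaded combined PDF.
    """
    if any(count <= 0 for count in page_counts):
        raise ValueError("every page count must be a positive integer")
    if sum(page_counts) != total_pages:
        # Refuse to guess when the receipt values and the PDF disagree.
        raise ValueError(
            f"page counts sum to {sum(page_counts)}, "
            f"but the PDF has {total_pages} pages"
        )
    ends = list(accumulate(page_counts))  # running total = end of each doc
    starts = [0] + ends[:-1]              # each doc starts where the last ended
    return list(zip(starts, ends))


# Three documents of 3, 12, and 45 pages in a 60-page combined PDF.
print(page_ranges([3, 12, 45], 60))
```

The check that the counts sum to the PDF's length is a cheap guard against the failure case mentioned above: if the receipt values don't account for every page, the upload can be rejected instead of being split incorrectly.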

mlissner commented 11 months ago

To eager developers, note that I am definitely not saying we should do this. Indeed, I'm explicitly saying we should not do this right now. You know who you are.