Closed divergentdave closed 7 years ago
Wow. Really nice solution.
Well, mostly. I'm still getting errors on a small number of CRS report PDFs. Some inline images have too many bytes of binary data, so the tail end of the image data shows up where I was expecting an EI operator. Looking at other implementations, it seems I might have to scan for the correct "whitespace, EI, whitespace, any ASCII" sequence to resynchronize with the content stream.
Okay, this is good to go now. There were a total of six CRS reports where inline images didn't have the right length (half longer, half shorter). There was even a 1x1 image with zero bytes of image data! This search algorithm is in line with what Poppler and pdf.js do.
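For anyone reading along, here is a rough sketch of the kind of "whitespace, EI, whitespace, any ASCII" scan described above. It is not the code from this PR: the function name `find_inline_image_end`, the `lookahead` parameter, and the choice of how many bytes to check after the operator are all assumptions made for illustration, and Poppler and pdf.js each differ in the details.

```python
# Whitespace characters as defined for PDF syntax:
# NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(stream: bytes, start: int, lookahead: int = 1) -> int:
    """Return the offset of the 'E' in the terminating EI, or -1 if not found.

    `start` is assumed to be the offset of the first image data byte,
    i.e. just past the single whitespace byte that follows ID.
    """
    pos = start
    while True:
        pos = stream.find(b"EI", pos)
        if pos == -1:
            return -1
        after = pos + 2
        # EI must be preceded by whitespace. For an image with zero data
        # bytes, that is the whitespace byte right after ID.
        before_ok = pos >= 1 and stream[pos - 1] in PDF_WHITESPACE
        # EI must be followed by whitespace, or by the end of the stream.
        after_ok = after >= len(stream) or stream[after] in PDF_WHITESPACE
        if before_ok and after_ok:
            # Heuristic: whatever follows should look like more content
            # stream operators (ASCII), not like leftover binary image data.
            following = stream[after + 1 : after + 1 + lookahead]
            if all(byte < 0x80 for byte in following):
                return pos
        pos += 1


# A false "EI" inside the binary data is skipped because the byte after
# it is not ASCII; the second "EI" is accepted.
sample = b"ID " + b"\xff EI \x80\x01" + b" EI Q"
end = find_inline_image_end(sample, start=3)
image_data = sample[3 : end - 1]  # drop the whitespace before EI
```

A larger `lookahead` makes false positives less likely at the cost of rejecting an EI that is legitimately followed by non-ASCII bytes, for example inside a string operand.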
Huh. Wow. Ok merging!
This handles inline images in content streams. Each combination of color space, compression filter, and encoding filter will require special handling to determine how long the image data is. For now, this just supports uncompressed 1-bit image masks.
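As a minimal sketch of the length calculation for the uncompressed case: PDF image rows are padded to a whole number of bytes, so the expected data length follows from the image dimensions. The function name and parameters below are hypothetical, not taken from this PR, and the defaults assume a 1-bit image mask (one component, one bit per component).

```python
def uncompressed_image_length(
    width: int,
    height: int,
    bits_per_component: int = 1,
    components: int = 1,
) -> int:
    """Expected byte length of unfiltered image data.

    Each row is padded up to a byte boundary, so the row size is rounded
    up before multiplying by the number of rows.
    """
    bits_per_row = width * components * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8  # round up to whole bytes
    return bytes_per_row * height


# A 1x1 image mask still occupies one full byte.
one_by_one = uncompressed_image_length(1, 1)
# A 10-pixel-wide mask needs 2 bytes per row, so 3 rows take 6 bytes.
ten_by_three = uncompressed_image_length(10, 3)
```

When the actual data is longer or shorter than this value, as in the malformed files mentioned in the comments, the computed length can only serve as a starting hint and a scan for the EI operator is still needed.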