I verified that this error occurs when we send a non-PDF file with the content type set to application/pdf. We don't verify the file type when one is provided, so PdfReader()
blows up.
import requests

filename = "/path/to/jpeg/file"

# Upload a JPEG while declaring it as a PDF to reproduce the error.
with open(filename, "rb") as f:
    res = requests.post(
        "http://localhost:8000/general/v0/general",
        files={"files": (filename, f, "application/pdf")},
    )

print(res.text)
# {"detail":"Stream has ended unexpectedly"}
PyPDF logs two warnings:
WARNING:pypdf._reader:invalid pdf header: b'\xff\xd8\xff\xe1\x9b'
WARNING:pypdf._reader:EOF marker not found
So the right thing to do here is to catch that error and return a 400 with a friendly message?
Yep! The bug squash is off to a good start!
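For reference, a rough sketch of what that fix could look like. This is not the actual patch: `BadUpload`, `read_pdf_or_400`, and the error messages are hypothetical names made up for illustration, and the parser is passed in as a callable so the snippet runs without pypdf installed. The strict header check is also an assumption; it would reject files that carry stray bytes before the `%PDF-` header, which a lenient parser might otherwise accept.

```python
from typing import Callable, Tuple, Type

PDF_MAGIC = b"%PDF-"  # the bytes a PDF file is expected to start with


class BadUpload(Exception):
    """Hypothetical error that the API layer would translate into an HTTP 400."""

    status_code = 400

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.detail = detail


def read_pdf_or_400(
    data: bytes,
    parse: Callable[[bytes], object],
    parse_errors: Tuple[Type[Exception], ...],
) -> object:
    """Parse `data` as a PDF, raising BadUpload instead of a raw parser error."""
    # Cheap pre-check: don't trust the declared content type, look at the bytes.
    if not data.startswith(PDF_MAGIC):
        raise BadUpload("File does not appear to be a valid PDF.")
    try:
        return parse(data)
    except parse_errors as exc:
        # The header looked fine, but the body is truncated or corrupt.
        raise BadUpload("File could not be parsed as a PDF.") from exc
```

With pypdf, `parse` could be something like `lambda b: PdfReader(io.BytesIO(b))` and `parse_errors` could be `(PdfReadError,)` from `pypdf.errors`. The request handler would then catch `BadUpload` and respond with its `status_code` and `detail` instead of letting the parser exception surface to the client.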
API users are hitting this error on certain files.