TomRoush / PdfBox-Android

The Apache PdfBox project ported to work on Android
Apache License 2.0
1.03k stars 262 forks

PdfBox-Android: No valid object at given location 9409516 #534

Open Twinkle-Wong opened 1 year ago

Twinkle-Wong commented 1 year ago

PdfBox-Android: No valid object at given location 9409516 - ignoring java.io.IOException: Error: Expected a long type at offset 9409516, instead got ''

THausherr commented 1 year ago

Please include your file.

Twinkle-Wong commented 1 year ago

PdfBox-Android D No valid object at given location 9409516 - ignoring
java.io.IOException: Error: Expected a long type at offset 9409516, instead got ''
    at com.tom_roush.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1345)
    at com.tom_roush.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1270)
    at com.tom_roush.pdfbox.pdfparser.COSParser.findObjectKey(COSParser.java:1595)
    at com.tom_roush.pdfbox.pdfparser.COSParser.validateXrefOffsets(COSParser.java:1472)
    at com.tom_roush.pdfbox.pdfparser.COSParser.checkXrefOffsets(COSParser.java:1524)
    at com.tom_roush.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:409)
    at com.tom_roush.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:254)
    at com.tom_roush.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:169)
    at com.tom_roush.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:222)
    at com.tom_roush.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1096)
    at com.tom_roush.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1079)
    at com.tom_roush.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1020)
    at com.shelter.sciencebookdecrypt.MainActivity$DecryptPdfTask.doInBackground(MainActivity.java:250)
    at com.shelter.sciencebookdecrypt.MainActivity$DecryptPdfTask.doInBackground(MainActivity.java:87)
    at android.os.AsyncTask$3.call(AsyncTask.java:378)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:289)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    at java.lang.Thread.run(Thread.java:919)
Caused by: java.lang.NumberFormatException: For input string: ""
    at java.lang.Long.parseLong(Long.java:606)
    at java.lang.Long.parseLong(Long.java:636)
    at com.tom_roush.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1339)
    at com.tom_roush.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1270)
    at com.tom_roush.pdfbox.pdfparser.COSParser.findObjectKey(COSParser.java:1595)
    at com.tom_roush.pdfbox.pdfparser.COSParser.validateXrefOffsets(COSParser.java:1472)
    at com.tom_roush.pdfbox.pdfparser.COSParser.checkXrefOffsets(COSParser.java:1524)
    at com.tom_roush.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:409)
    at com.tom_roush.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:254)
    at com.tom_roush.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:169)
    at com.tom_roush.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:222)
    at com.tom_roush.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1096)
    at com.tom_roush.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1079)
    at com.tom_roush.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1020)
    at com.shelter.sciencebookdecrypt.MainActivity$DecryptPdfTask.doInBackground(MainActivity.java:250)
    at com.shelter.sciencebookdecrypt.MainActivity$DecryptPdfTask.doInBackground(MainActivity.java:87)
    at android.os.AsyncTask$3.call(AsyncTask.java:378)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:289)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    at java.lang.Thread.run(Thread.java:919)
Main: D onActivityResult: com.tom_roush.pdfbox.pdmodel.encryption.InvalidPasswordException: Cannot decrypt PDF, the password is incorrect

Twinkle-Wong commented 1 year ago

The password is given as a hex string: 9E21C2E741A4910AE284888670DABD7A

bb.pdf

Twinkle-Wong commented 1 year ago

I am certain that this key is correct because the file was successfully decrypted with it using the pikepdf library in Python.

Twinkle-Wong commented 1 year ago

The Python source code is:

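from pikepdf import Pdf

file_key = "9E21C2E741A4910AE284888670DABD7A"  # 32 hex digits = 16 bytes

# hex_password=True tells pikepdf to interpret the string as hex-encoded bytes
with Pdf.open("bb.pdf", password=file_key, hex_password=True) as pdf:
    pdf.save("11.pdf")  # write the decrypted copy

THausherr commented 1 year ago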

PDFBox for desktop also fails to open this file: "IOException: No security handler for filter TTKN.PubSec".

THausherr commented 1 year ago

I'm not with this project, I'm with the desktop PDF project, I just hang around in other PDF projects too. tilman at snafu dot de

Twinkle-Wong commented 1 year ago

I'm just trying to work this out with you, and it has very little to do with desktop or mobile

Twinkle-Wong commented 1 year ago

I'm extremely sorry, the PDF file I provided above had not been fixed yet. I will upload the corrected file again as output.pdf. Could you please try decrypting this PDF file again?

Twinkle-Wong commented 1 year ago

output.pdf

THausherr commented 1 year ago

I can't open it with PDFBox for desktop or PDF.js because this is a binary password. According to a converter the password is "!ÂçA¤‘ ℈†pÚ½z" which isn't really helpful.
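To illustrate why the hex string does not map to a readable password, here is a minimal Python sketch that converts it to raw bytes and checks which of them land on control characters. The Latin-1 decoding is an assumption chosen only because it maps every byte to exactly one character; it is not necessarily the encoding the converter mentioned above used.

```python
import unicodedata

HEX_KEY = "9E21C2E741A4910AE284888670DABD7A"  # key quoted earlier in this thread

raw = bytes.fromhex(HEX_KEY)     # 32 hex digits -> 16 raw bytes
as_text = raw.decode("latin-1")  # one character per byte, never fails

# Bytes whose Latin-1 character is a control character (Unicode category "Cc")
control_bytes = [b for b, ch in zip(raw, as_text)
                 if unicodedata.category(ch) == "Cc"]

print(len(raw))                              # 16
print([f"{b:02X}" for b in control_bytes])   # ['9E', '91', '0A', '84', '88', '86']
```

Six of the sixteen bytes are control characters (including a line feed, 0A), so the value cannot be typed into a password prompt as ordinary text.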

Twinkle-Wong commented 1 year ago

The key I provided is a hexadecimal key. Can't PDFBox decrypt PDF files using a hexadecimal key?

THausherr commented 1 year ago

I didn't try because PDFDebugger doesn't support it and I didn't want to write code just for that. What would a person do who has to enter the key in Adobe Reader? If you are the one generating this file, why not generate it with a readable password?

Twinkle-Wong commented 1 year ago

This key is an RSA key; it is not the password you would enter to open the file in Adobe Reader. Also, I did not generate this file. It is now an ordinary RSA-encrypted file, and I have already solved the earlier filter problem. If possible, please try decrypting the file with the PDFBox library and its AES decryption API. Thank you very much.

THausherr commented 1 year ago

Let's say I'm an ordinary user who knows only Adobe Reader. What would I do with that file?

THausherr commented 1 year ago

According to the PDF specification "Algorithm 2: Computing an encryption key in order to encrypt a document (revision 4 and earlier)": "The password string is generated from host system codepage characters (or system scripts) by first converting the string to PDFDocEncoding. If the input is Unicode, first convert to a codepage encoding, and then to PDFDocEncoding for backward compatibility."

Your "password" doesn't look like "host system codepage characters (or system scripts)".

(Your PDF uses revision 4. It gets more complex with later revisions, then SASLPrep is needed)
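As a rough illustration of where a password enters Algorithm 2, the sketch below shows only its first step: the password bytes are truncated or padded to exactly 32 bytes using the fixed padding string from the PDF specification. The names `PAD` and `pad_password` and the sample password `"secret"` are hypothetical. Latin-1 stands in for PDFDocEncoding here, which is an assumption that holds for plain ASCII input only. The remaining steps (hashing the padded value together with other entries of the encryption dictionary and the file ID) are omitted.

```python
# Fixed 32-byte padding string defined by the PDF specification
PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)

def pad_password(password: bytes) -> bytes:
    """Truncate or pad a password to exactly 32 bytes (first step only)."""
    return (password[:32] + PAD)[:32]

# Hypothetical readable password; ASCII is identical in Latin-1 and PDFDocEncoding
user_pw = "secret".encode("latin-1")
padded_pw = pad_password(user_pw)

# The 16-byte value from this thread, treated as if it were a password
hex_key = bytes.fromhex("9E21C2E741A4910AE284888670DABD7A")
padded_key = pad_password(hex_key)

print(len(padded_pw), len(padded_key))
```

A tool that is handed raw bytes can feed them into this step directly, whereas a password dialog first has to get from typed characters to bytes, which is where a binary value like this one becomes a problem.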

Twinkle-Wong commented 1 year ago

Anyway, I only have this one key, and I'm 100% sure that it is valid and correct.