daitss / core

DAITSS: Dark Archive In The Sunshine State
GNU General Public License v3.0

Two packages with only small pdf seem to be stuck in the describe step #795

Open szanati opened 7 years ago

szanati commented 7 years ago

EFCO3RYWA_JDMBPT is for package UF00027796_00781. EZL1NQAW3_GPWAPC for package CFE0006529.

I have two packages in production that seem to be stuck in the first steps of processing. Each contains only a small PDF. One package, UF00027796_00781, gets to the describe-migrate-normalize-2 step and stays there. The other package, CFE0006529, gets only as far as the virus check 0 step and stays there. I was able to archive CFE0006529 on ripple in a short amount of time, so maybe the newer version of PDFA pilot was the answer. UF00027796_00781 also gets stuck on ripple.

szanati commented 7 years ago

The ieid for UF00027796_00781 is EFCO3RYWA_JDMBPT. The ieid for CFE0006529 is EZL1NQAW3_GPWAPC.

cchou commented 7 years ago

The PDFs in these two packages were stuck during PDFaPilot conversion. Please refer to the email to Callas.

Also, to get this package archived for now, you can temporarily turn off the PDFA conversion and then refresh the package after Callas fixes the problem.

szanati commented 7 years ago

I read in the email from Callas about a workaround using the command line:

"There is, however, already a known workaround for that problem: Both files can actually be converted when they are first converted to PostScript and back to PDF. You can do so by using ./pdfaPilot --redistill on command line."

For our own purposes, how could I use this on the command line? I tried on ripple without success. From the /opt directory I ran: ./pdfaPilot --redistill /var/daitss/ops/stephen/PDF_Errors/AA00047064_00008/10-2006.pdf and got: ./pdfaPilot: No such file or directory. I then tried: ./pdfapilot-6.2.256 --redistill /var/daitss/ops/stephen/PDF_Errors/AA00047064_00008/10-2006.pdf and got: ./pdfapilot-6.2.256: is a directory. I am not sure how to use the workaround from the command line.

cchou commented 7 years ago

Stephen,

Use this command on ripple:

/opt/pdfapilot-6.2.256/pdfaPilot --redistill 09-06-2013.pdf

It will output the redistilled file named as 09-06-2013_0001.pdf.

I have put the related files in /var/tmp/problemPDFs:

-rw-rw-r-- 1 cchou cchou 34330467 May 8 21:12 09-06-2013_0001-o.pdf
-rw-rw-r-- 1 cchou cchou 75430843 May 8 21:11 09-06-2013_0001.pdf
-rw-rw-r-- 1 cchou cchou 8699082 May 8 21:11 09-06-2013.pdf

Where 09-06-2013.pdf is the original PDF, 09-06-2013_0001.pdf is the redistilled one and 09-06-2013_0001-o.pdf is the pdfa conversion on the redistilled PDF.
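For anyone who wants to script the workaround, here is a minimal Python sketch of the invocation above. It uses only the `--redistill` option quoted from Callas and the binary path given for ripple. The function names are hypothetical, the `_0001` output name is inferred from the single example in this thread, and the assumption that the output lands next to the input file is not confirmed by Callas.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Binary location as given for ripple in this thread; other hosts may differ.
PDFAPILOT = Path("/opt/pdfapilot-6.2.256/pdfaPilot")


def redistill_command(pdf: Path, binary: Path = PDFAPILOT) -> list[str]:
    """Build the argv for the workaround; --redistill is the only option used."""
    return [str(binary), "--redistill", str(pdf)]


def expected_output(pdf: Path) -> Path:
    """Assumed output name (stem + _0001), based on the one example above."""
    return pdf.with_name(f"{pdf.stem}_0001{pdf.suffix}")


def redistill(pdf: Path, binary: Path = PDFAPILOT) -> Path:
    """Run the workaround and return the path where the output is expected."""
    if not binary.is_file():
        # Catches both mistakes seen earlier: a wrong path, or a directory name.
        raise FileNotFoundError(f"{binary} is not a file")
    pdf = pdf.resolve()
    # Assumption: the tool writes its output next to the input file.
    subprocess.run(redistill_command(pdf, binary), check=True, cwd=pdf.parent)
    return expected_output(pdf)
```

Calling `redistill(Path("09-06-2013.pdf"))` on ripple would be expected to return the path of `09-06-2013_0001.pdf`, which can then be checked for existence and size before it is used.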

szanati commented 7 years ago

Another package to add: AA00053408_00225. Its ieid is E4Q3JWYO1_OOLW7B.

szanati commented 7 years ago

Here is part of an email from Carol to David at Callas about using the command line to convert problem PDFs with pdfaPilot:

"Thanks for the update David. I also tried redistilling the PDF, but the resulting PDF is very huge, from 53MB to 871MB. It doesn't look like redistilling would be an option for us until it's fixed. -rwxrwxr-x 1 cchou cchou 53M May 29 00:00 08-04-2016.pdf -rw-rw-r-- 1 cchou cchou 871M May 29 00:12 08-04-2016_0001.pdf

Thank you, -Carol"
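To put numbers on the size problem, the following Python sketch computes how much a redistilled file grew compared with the original. It uses only the sizes quoted in this thread. The function names and the `max_factor` limit are hypothetical and are not DAITSS or pdfaPilot settings.

```python
def growth_factor(original_bytes: int, redistilled_bytes: int) -> float:
    """Return how many times larger the redistilled file is than the original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return redistilled_bytes / original_bytes


def worth_keeping(original_bytes: int, redistilled_bytes: int,
                  max_factor: float = 5.0) -> bool:
    """True if growth stays within max_factor (a hypothetical limit)."""
    return growth_factor(original_bytes, redistilled_bytes) <= max_factor


if __name__ == "__main__":
    # Sizes quoted in this thread; the 53M / 871M pair is rounded in the listing.
    print(round(growth_factor(8_699_082, 75_430_843), 1))  # 8.7
    print(round(growth_factor(53, 871), 1))                # 16.4
```

Both redistilled files in this thread grew well beyond the example limit of 5, which matches Carol's conclusion that redistilling is not a practical option for now.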

szanati commented 7 years ago

The original packages for this issue, AA00053408_00225 and UF00027796_00781, are in /var/daitss/ops/exceptions/tickets/Github_795 on darchive.

lydiam commented 7 years ago

Question: when/how will we know when Callas has fixed the problem? Is there an issue number for a Callas problem reporting system that we can include in this issue?

cchou commented 7 years ago

Callas doesn't give us any bug/issue number. Usually, after the support team confirms a bug, it is passed on to the development team. Unfortunately, they don't pass their internal bug tracking number on to us.

Since we have a maintenance license with them, I would suggest that FDA upgrade to the new version of PDFApilot whenever it becomes available and run our problem PDFs through the latest version.

lydiam commented 7 years ago

That’s a good suggestion.


szanati commented 6 years ago

I have another package, CFE0006907 (ieid EZDZ5UKPH_6RC47Y), that got stuck over the weekend. It was a smaller ETD, only 17.12MB. It sat in the stuck state from Friday until Sunday morning, when it caused DAITSS to run low on memory. It was the only package running at the time, and DAITSS had to restart, which finally made the package error out. Another batch archived after that without problems. I tried the package on ripple and it also got stuck. When I stopped it on ripple, pdfaPilot kept running, as I saw when I checked with "top" on the command line. Even stopping DAITSS on ripple did not kill pdfaPilot. I finally killed the pid. I then turned off pdfaPilot and resubmitted the package, and it archived without problems on ripple. I transferred a copy of the PDF to my desktop; it's 154 pages with various graphics and text in it. Maybe the PDF caused pdfaPilot to choke. At this time all I can do for this ETD is archive it with pdfaPilot turned off.
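One way to keep a hung conversion from running all weekend is to start it under a time limit and kill it when the limit expires. The Python sketch below illustrates the idea only; it is not how DAITSS currently calls pdfaPilot, and `run_with_timeout` is a hypothetical helper. It is POSIX-only and starts the command in its own process group so that any child processes are killed as well.

```python
from __future__ import annotations

import os
import signal
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout_s: float) -> int | None:
    """Run cmd; return its exit code, or None if it was killed after timeout_s."""
    # start_new_session puts the child in its own process group (pgid == pid).
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the whole group so helper processes do not linger.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so no zombie is left behind
        return None


if __name__ == "__main__":
    # Stand-in for a hung conversion: a process that sleeps for 30 seconds.
    rc = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1.0)
    print("timed out" if rc is None else f"exit code {rc}")
```

With a limit in place, a package whose PDF makes pdfaPilot hang would fail quickly and could be resubmitted with the conversion turned off, instead of holding memory until DAITSS restarts.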