ansani / Shareaza

Shareaza is a peer-to-peer client for Windows that allows you to download any file-type found on several popular P2P networks.
http://shareaza.sf.net

Some PDF files interrupt Shareaza's hashing process as it attempts to scan them #92

Closed (abolibibelot1980 closed 1 year ago)

abolibibelot1980 commented 1 year ago

Some PDF files interrupt Shareaza's hashing process and trigger a high CPU load as it attempts to scan them. I couldn't say for sure what distinguishes those files from other PDF files which are scanned with no issue, but when examining them with WinHex I can see that they have a distinctive structure in their header, with "/Widths" followed by a long list of numbers. File size does not seem to be a factor, as the files causing the issue are usually very small.

I worked around this by disabling the option "Library.ScanPDF". It is a workaround rather than a fix, since I suppose it prevents any PDF file I upload from being previewed. That is not a problem for me, but it could be for someone actively sharing comics in PDF, for instance, who would want to show a glimpse to potential downloaders to assure them that the files are indeed what they're supposed to be.

I'm attaching 2 such files. If possible, they should be removed as soon as a developer has downloaded them, as they're private documents someone mistakenly included in their shares. Nothing sensitive, just regular bills, but I wouldn't want to contribute to the “leak”, so to speak! :^p

[ FILES REMOVED FOR PRIVACY ]

ansani commented 1 year ago

Hi @abolibibelot1980! Can you confirm that the same issue happens if you select one of the two files (I'm testing with the 525_*.pdf) and click on the "Refresh Metadata" option (Library -> Select the file -> Left Click -> Refresh Metadata)?

I found the issue in this function: bool CLibraryBuilderInternals::ReadPDF(DWORD nIndex, HANDLE hFile, LPCTSTR pszPath)

The issue is related to an SAP-generated document. This is the "loop of death" string: L"Producer (SAP NetWeaver 740 ) %SAPinfoStart TOA_DARA %FUNCTION=( ) %MANDANT=( ) %DEL_DATE=( ) %SAP_OBJECT=( ) %AR_OBJECT=( ) %OBJECT_ID=( ...

When the metadata object of a PDF document contains a value with no closing ")", the ReadPDF function hangs.
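
To illustrate the failure mode, here is a minimal sketch of how a literal-string reader can give up cleanly when the closing ")" never arrives. It is not the code from CLibraryBuilderInternals::ReadPDF and not the actual fix; the helper name ReadPdfLiteral and its signature are hypothetical, and escape handling is deliberately simplified.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Shareaza: extracts the PDF literal string
// that starts at data[pos] == '('. Returns std::nullopt when the closing ')'
// is missing, so the caller can skip the value instead of waiting for it.
std::optional<std::string> ReadPdfLiteral(std::string_view data, std::size_t pos)
{
    if (pos >= data.size() || data[pos] != '(')
        return std::nullopt;

    std::string value;
    int depth = 1;  // PDF literal strings may contain balanced parentheses

    // The loop is bounded by data.size(), so it always terminates.
    for (std::size_t i = pos + 1; i < data.size(); ++i)
    {
        const char c = data[i];
        if (c == '\\')
        {
            // Simplified escape handling: copy the next byte verbatim.
            if (++i >= data.size())
                return std::nullopt;
            value.push_back(data[i]);
        }
        else if (c == '(')
        {
            ++depth;
            value.push_back(c);
        }
        else if (c == ')')
        {
            if (--depth == 0)
                return value;  // matching close found
            value.push_back(c);
        }
        else
        {
            value.push_back(c);
        }
    }

    return std::nullopt;  // input ended before the closing ')'
}
```

The point of the sketch is only that every loop over the buffer is bounded by its length and that a missing ")" is reported to the caller rather than searched for again.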

abolibibelot1980 commented 1 year ago

Should I test with the currently installed version (still 2.7.10.2) or with your latest release (I have yet to test any of them)?

Glad that you could pinpoint the issue so quickly, even if the explanation is way beyond my current knowledge level! And glad that you're putting some really good work into this utility, which many would consider antiquated nowadays (peer-to-peer in general has fallen from grace with the advent of so-called “file lockers” and streaming), even though it is still very much active and one of the best available resources for rare content in niche or untrendy genres.

ansani commented 1 year ago

Hi! You can use artifacts from this Action: https://github.com/ansani/Shareaza/actions/runs/3793781999 to test the fix. I will release an official beta in a few days.