empira / PDFsharp

PDFsharp and MigraDoc Foundation for .NET 6 and .NET Framework
https://docs.pdfsharp.net/

LzwDecode might omit characters #176

Open stefan6419846 opened 6 days ago

stefan6419846 commented 6 days ago

While reviewing the LZW implementation in this repository, I noticed a possible issue with the special case of the LZW algorithm in which a code is looked up before the corresponding dictionary entry has been added. This corresponds to the branch at https://github.com/empira/PDFsharp/blob/5fbf6ed14740bc4e16786816882d32e43af3ff5d/src/foundation/src/PDFsharp/src/PdfSharp/Pdf.Filters/LzwDecode.cs#L64-L69.

In this case (line 66), the first character of the looked-up string has to be appended to the output as well, just as is done when the dictionary entry itself is added. Otherwise the decoded output is one character short each time this branch is taken. For reference, you might want to have a look at http://web.archive.org/web/20011214082531/http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html. (Unfortunately, I cannot provide any C# code here as I am not a C# developer. I only came across this issue while reviewing a port of the corresponding functionality.)
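
To illustrate the special case in a language-neutral way, here is a minimal Python sketch. It is not the PDFsharp code and makes several simplifying assumptions: it works on already unpacked integer codes instead of a bit stream, ignores variable code widths and the table size limit, and the function name `lzw_decode_codes` is made up for this example. The codes 256 (clear table) and 257 (end of data) follow the PDF convention.

```python
CLEAR_TABLE = 256      # PDF LZW: reset the dictionary
END_OF_DATA = 257      # PDF LZW: stop decoding
FIRST_FREE_CODE = 258  # first code available for new dictionary entries


def lzw_decode_codes(codes):
    """Decode a sequence of already unpacked LZW codes (illustrative only)."""
    table = {i: bytes([i]) for i in range(256)}
    next_code = FIRST_FREE_CODE
    previous = None
    output = bytearray()

    for code in codes:
        if code == CLEAR_TABLE:
            table = {i: bytes([i]) for i in range(256)}
            next_code = FIRST_FREE_CODE
            previous = None
            continue
        if code == END_OF_DATA:
            break

        if previous is None:
            # First code after a reset must be a plain literal.
            if code > 255:
                raise ValueError(f"unexpected code {code} after reset")
            entry = table[code]
        elif code in table:
            # Normal case: emit the known string, then register
            # previous + first character of the emitted string.
            entry = table[code]
            table[next_code] = previous + entry[:1]
            next_code += 1
        elif code == next_code:
            # Special case: the code is not in the table yet. The string is
            # previous + its own first character, and that FULL string must
            # be written to the output, not just `previous`.
            entry = previous + previous[:1]
            table[next_code] = entry
            next_code += 1
        else:
            raise ValueError(f"invalid LZW code {code}")

        output += entry
        previous = entry

    return bytes(output)


# "ABABABA" encodes to 65, 66, 258, 260; code 260 hits the special case.
print(lzw_decode_codes([256, 65, 66, 258, 260, 257]))
```

A decoder that writes only `previous` in the special-case branch would return `ABABAB` for this input, silently dropping the final `A`.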