LibreDWG / libredwg

Official mirror of libredwg. With CI hooks and nightly releases. PR's ok
https://savannah.gnu.org/projects/libredwg/
GNU General Public License v3.0

DWG large files errors #272

Open arturred opened 3 years ago

arturred commented 3 years ago

Hi. Reading these files with the current binary reports many errors:

Warning: checksum: 0x31b7133b (calculated) mismatch

ERROR: read_R2004_section_info out of range
Warning: Failed to find section_info[7] with type 1
ERROR: Failed to read compressed Header section
ERROR: Invalid .props x 28191
Warning: Failed to find section_info[7] with type 3
ERROR: Failed to read compressed Classes section
Warning: Skip empty section 2329 AcDb:AcDbObjects
ERROR: Invalid opcode 0x0 in input stream at pos 294
ERROR: Failed to read compressed AcDbObjects section
Warning: Failed to find section_info[7] with type 2
ERROR: Failed to read uncompressed AuxHeader section
ERROR: Preview overflow > 29067
Warning: thumbnail.size mismatch: 29071 != 0
ERROR: Some section size or address out of bounds
ERROR: Template section not found
...

Please download the samples from https://easyupload.io/jvzytl. Teigha and AutoCAD can read them. If I convert them to other DWG versions (lower or higher), they open fine, so this seems to be a file-specific issue.

rurban commented 3 years ago

Yes, this is the known section map bug #144

arturred commented 3 years ago

Thanks for the info. This seems to have been a hard bug for a year now. I've also tested with libdxfrw, trying to fix it there (using your suggestions about variable overflow), but no luck so far. I get either duplicate ids in the page map section or invalid addresses outside the buffer range. In other files, the page map seems to be correct but reading the section info fails. The problem is not only overflowing values but also that the decompressed buffer may contain gaps (negative page id?). I have no idea, so I hope that you will figure this out.
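
To make the symptoms above concrete, here is a minimal sketch of a page map walk that flags duplicate ids, gaps, and addresses outside the file. It is not libredwg or libdxfrw code. The layout is an assumption based on my reading of the R2004 format: little-endian int32 pairs of (page id, size), addresses accumulating from 0x100, and a negative id marking a gap followed by four extra int32 fields. The name `walk_page_map` and the constant `GAP_EXTRA_BYTES` are hypothetical.

```python
import struct

# Assumption: a gap entry (negative id) carries four extra int32 fields.
GAP_EXTRA_BYTES = 16


def walk_page_map(data, file_size, start_address=0x100):
    """Walk a decompressed page map and collect suspicious entries.

    Returns (pages, problems): pages is a list of (page_id, address, size),
    with gaps keeping their negative id; problems is a list of messages.
    """
    pages = []
    problems = []
    seen_ids = set()
    address = start_address
    pos = 0
    while pos + 8 <= len(data):
        page_id, size = struct.unpack_from("<ii", data, pos)
        pos += 8
        if size <= 0:
            # Later addresses depend on this size, so stop here.
            problems.append(f"entry at offset {pos - 8}: invalid size {size}")
            break
        if page_id < 0:
            pos += GAP_EXTRA_BYTES  # skip the extra gap fields
        elif page_id in seen_ids:
            problems.append(f"duplicate page id {page_id}")
        else:
            seen_ids.add(page_id)
        if address + size > file_size:
            problems.append(
                f"page {page_id} at {address:#x} + {size:#x} "
                f"exceeds file size {file_size:#x}"
            )
        pages.append((page_id, address, size))
        address += size
    return pages, problems
```

Running something like this over the decompressed page map of a failing file should at least show whether the first bad entry is a duplicate id, a gap, or an out-of-range address.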

rurban commented 3 years ago

Yes, a hard one. The more failing DWG examples we have, the easier it is to figure out the scheme. In principle, it takes a big DWG from which many entities are then deleted, which causes the gaps.

markstock commented 3 years ago

I think I found a ton of files that fail in this same manner: https://www.3drotterdam.nl/downloads/#/

From HEAD build, Fedora 29, GCC 8.3.1:

curl -O https://www.3drotterdam.nl/downloads/global/download//DWG/Rotterdam_Centrum.zip
unzip Rotterdam_Centrum.zip
dwgread -O GeoJSON -o Cool.json Rotterdam_Centrum/Bomen/Cool.dwg
Warning: checksum: 0x28e2125d (calculated) mismatch

ERROR: Skip section A with size 89 > 1 * 0
ERROR: read_R2004_section_info out of range
Warning: Failed to find section_info[7] with type 1
ERROR: Failed to read compressed Header section
Warning: Failed to find section_info[7] with type 3
ERROR: Failed to read compressed Classes section
Warning: Failed to find section_info[7] with type 4
ERROR: Failed to read compressed Handles section
Warning: Failed to find section_info[7] with type 2
ERROR: Failed to read uncompressed AuxHeader section
ERROR: Preview overflow
ERROR: Invalid product_checksum size 16. Need min. 16 bits, have 65280 for .
ERROR: Template section not found

ERROR: Failed to decode file: Cool.dwg 0x941

ERROR 0x941

rurban commented 3 years ago

This seems to be a good example, thanks. No deleted pages, just a corrupt section_info[6] out of thin air. Interesting

no-such-user commented 2 years ago

It seems that many files have the checksum and other issues, including some that ship with libredwg:

root@695c816fa3f9:/libredwg/test/test-data# ../../programs/dwgread --format json example_2010.dwg 2>&1 | grep "Warning: checksum"
Warning: checksum: 0x2edd12f6 (calculated) mismatch
root@695c816fa3f9:/libredwg/test/test-data# ../../programs/dwgread --format json example_2013.dwg 2>&1 | grep "Warning: checksum"
Warning: checksum: 0x2c7512b9 (calculated) mismatch
root@695c816fa3f9:/libredwg/test/test-data# ../../programs/dwgread --format json example_2018.dwg 2>&1 | grep "Warning: checksum"
Warning: checksum: 0x2d1512c9 (calculated) mismatch
root@695c816fa3f9:/libredwg/test/test-data# ../../programs/dwgread --format json sample_2018.dwg 2>&1 | grep "Warning: checksum"
Warning: checksum: 0x2845124f (calculated) mismatch

And about a third of the sample files I am using to test.
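
To compare files, the `dwgread` output can be tallied per file. The sketch below only parses captured log text in the `ERROR: ...` / `Warning: ...` form shown in this thread; it does not call libredwg. The function name `summarize_log` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "ERROR: ..." and "Warning: ..." as seen in the logs above.
LINE_RE = re.compile(r"^(ERROR|Warning): (.*)$")


def summarize_log(text):
    """Count errors and warnings, and note whether a checksum mismatch occurred."""
    counts = Counter()
    checksum_mismatch = False
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank lines, "...", or the final "ERROR 0x941" code
        level, message = match.groups()
        counts[level] += 1
        if (
            level == "Warning"
            and message.startswith("checksum:")
            and message.endswith("mismatch")
        ):
            checksum_mismatch = True
    return {
        "errors": counts["ERROR"],
        "warnings": counts["Warning"],
        "checksum_mismatch": checksum_mismatch,
    }
```

The input would be the combined output of a command like the ones above, for example `dwgread --format json example_2010.dwg 2>&1`. A file with only the checksum warning and no errors is probably a different case from one where the Header, Classes, and Handles sections all fail.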

How much does this impact the ability to extract text from the file? Are we going to miss any sections due to this issue?

FishOrBear commented 2 years ago

Large files cause all the text to be garbled. Is there a way to solve this?

rurban commented 2 years ago

FishOrBear wrote on Fri, 1 Apr 2022, 10:09:

Large files cause all the text to be garbled. Is there a way to solve this?

That happens only if many objects have been deleted. There is no way to solve it as of yet.

FishOrBear commented 2 years ago

Why can dwggrep.exe read the text without garbled characters?