LibreDWG / libredwg

Official mirror of libredwg. With CI hooks and nightly releases. PR's ok
https://savannah.gnu.org/projects/libredwg/
GNU General Public License v3.0
931 stars 228 forks source link

Bugs in text splitting for DXF output #986

Open vagran opened 2 months ago

vagran commented 2 months ago

Invalid DXF file is produced when trying to convert DWG to DXF using dwg2dxf. Long text (usually in group 1) is split incorrectly. The continuation line does not have a preceding group code. Additionally unicode code points are split between the lines. Here is an example fragment of such result (sorry, I cannot share the full source file, it is proprietary):

XRECORD
  5
E331
102
{ACAD_REACTORS
330
E2CB
102
}
330
E2CB
100
AcDbXrecord
280
     1
 40
401321167
  1
{\fCalibri|b1|i0|c0|p34;\L3\fCalibri|b1|i0|c161|p34;.     \fCalibri|b1|i0|c0|p34;ΟΡΟΙ ΔΟΜΗΣΗΣ\P\fCalibri|b0|i0|c0|p34;\l \P  \fCalibri|b1|i0|c0|p34; 3.1 ΞΕΝΟΔΟΧΕΙΑΚΕΣ ΕΓΚΑΤΑΣΤΑΣΕΙΣ - ΕΚΤΟΣ ΣΧΕΔΙΟΥ\fCalibri|
b1|i0|c161|p34;\L\P\pxi-20.507,l23,t5.4931,2  # <<<< continuation without tag!
  1
3,24;\fCalibri|b0|i0|c161|p34;\l1.  ΘΕΣΗ ΓΗΠΕΔΟΥ        \fCalibri|b1|i0|c161|p34;\L\P\fCalibri|b0|i0|c161|p34;\l2.  ΝΟΜΟΙ & ΔΙΑΤΑΓΜΑΤΑ          \P\pi0,l0,tz;\P\P\P\P\P\P\pi-20.507,l23,t5.4931,23,24;3.    ΑΡΤΙΟΤΗΤΑ\P4.   ΣΥΝΤΕΛ?
?ΣΤΗΣ ΔΟΜΗΣΗΣ \fCalibri|b0|i0|c0|p34; # <<<< continuation without tag! unicode symbol broken, part is left on the  previous line.

Looking into the code, I suspect several problems: https://github.com/LibreDWG/libredwg/blob/07c078aca71840f0f9a0dffb3032056d043858b0/src/out_dxf.c#L1270

  1. Continuation group code is written only if remaining length is greater than 255. It is probably a typo, and it should check for greater than 0, like in the block above it.
  2. Unicode is not handled in any way. Code point may be split at an arbitrary byte.
  3. According to DXF specification text length limit is 250 characters, not 255.
  4. According to DXF specification, if text field is split (group 1), all partial fragments (starting from the first one) should have group 3, and should be terminated by last fragment with group 1. Seems there is no place for such logic in the current implementation.
  5. This 1024 bytes limit looks very bad. Shouldn't it be increased with dynamic buffer allocation?
rurban commented 2 months ago

Yep, you nailed it. The splitter is very naive