LibreDWG / libredwg

Official mirror of libredwg. With CI hooks and nightly releases. PR's ok
https://savannah.gnu.org/projects/libredwg/
GNU General Public License v3.0
982 stars 233 forks source link

Bugs in text splitting for DXF output #986

Open vagran opened 5 months ago

vagran commented 5 months ago

Invalid DXF file is produced when trying to convert DWG to DXF using dwg2dxf. Long text (usually in group 1) is split incorrectly. The continuation line does not have a preceding group code. Additionally unicode code points are split between the lines. Here is an example fragment of such result (sorry, I cannot share the full source file, it is proprietary):

XRECORD
  5
E331
102
{ACAD_REACTORS
330
E2CB
102
}
330
E2CB
100
AcDbXrecord
280
     1
 40
401321167
  1
{\fCalibri|b1|i0|c0|p34;\L3\fCalibri|b1|i0|c161|p34;.     \fCalibri|b1|i0|c0|p34;ΟΡΟΙ ΔΟΜΗΣΗΣ\P\fCalibri|b0|i0|c0|p34;\l \P  \fCalibri|b1|i0|c0|p34; 3.1 ΞΕΝΟΔΟΧΕΙΑΚΕΣ ΕΓΚΑΤΑΣΤΑΣΕΙΣ - ΕΚΤΟΣ ΣΧΕΔΙΟΥ\fCalibri|
b1|i0|c161|p34;\L\P\pxi-20.507,l23,t5.4931,2  # <<<< continuation without tag!
  1
3,24;\fCalibri|b0|i0|c161|p34;\l1.  ΘΕΣΗ ΓΗΠΕΔΟΥ        \fCalibri|b1|i0|c161|p34;\L\P\fCalibri|b0|i0|c161|p34;\l2.  ΝΟΜΟΙ & ΔΙΑΤΑΓΜΑΤΑ          \P\pi0,l0,tz;\P\P\P\P\P\P\pi-20.507,l23,t5.4931,23,24;3.    ΑΡΤΙΟΤΗΤΑ\P4.   ΣΥΝΤΕΛ?
?ΣΤΗΣ ΔΟΜΗΣΗΣ \fCalibri|b0|i0|c0|p34; # <<<< continuation without tag! unicode symbol broken, part is left on the  previous line.

Looking into the code, I suspect several problems: https://github.com/LibreDWG/libredwg/blob/07c078aca71840f0f9a0dffb3032056d043858b0/src/out_dxf.c#L1270

rurban commented 5 months ago

Yep, you nailed it. The splitter is very naive

rurban commented 1 month ago

Fixed 1,3,4,5 so far. Proper unicode rune splitting seems to be implemented by transformation to UCS-2, and transformed back to UTF-8.