LibreDWG / libredwg

Official mirror of libredwg. With CI hooks and nightly releases. PR's ok
https://savannah.gnu.org/projects/libredwg/
GNU General Public License v3.0

maybe convert strings to a struct #245

Open rurban opened 4 years ago

rurban commented 4 years ago

We really need to store the encoding of each string, to avoid unneeded conversions (injson/indxf, e.g. r2007 -> r2000), and while we are at it we can also store the length.

typedef struct _dwg_string {
  char *str;
  BITCODE_RS len;
  // encoding
  BITCODE_RS cp; // 30: ANSI_1252, 43: UTF-16, 0: UTF-8. max: 44
} Dwg_String;

cp is either dwg->header.codepage (e.g. 30), 0 for UTF-8 (indxf + injson), 43 for UTF-16 (TU), or CP_DWG (look up dwg->header.codepage). Then we can also add iconv conversions. So far only US_ASCII and Latin-1 (cp 30) conversions are done properly.
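A minimal sketch of what such an iconv conversion could look like, using only the POSIX `iconv_open`/`iconv`/`iconv_close` calls. The helper name `convert_to_utf8` is hypothetical and not part of libredwg, and the charset name `"CP1252"` is an assumption about what the local iconv implementation accepts; the mapping from cp numbers to charset names is left out.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated string in the given
   single-byte charset (e.g. "CP1252") to a newly allocated UTF-8 string.
   Returns NULL on failure. The caller frees the result. */
char *convert_to_utf8 (const char *src, const char *from_charset)
{
  iconv_t cd = iconv_open ("UTF-8", from_charset);
  if (cd == (iconv_t)-1)
    return NULL; /* charset not supported by this iconv */

  size_t inleft = strlen (src);
  size_t outsize = 4 * inleft + 1; /* generous bound for single-byte input */
  size_t outleft = outsize - 1;    /* keep one byte for the NUL */
  char *dst = malloc (outsize);
  char *in = (char *)src;          /* iconv takes char **, input is not modified */
  char *out = dst;

  if (!dst)
    {
      iconv_close (cd);
      return NULL;
    }
  if (iconv (cd, &in, &inleft, &out, &outleft) == (size_t)-1)
    {
      free (dst); /* invalid input sequence or output buffer too small */
      dst = NULL;
    }
  else
    *out = '\0';
  iconv_close (cd);
  return dst;
}
```

With a stored cp, the caller would only need to convert when the source and target encodings actually differ, which is the point of keeping the encoding next to the string.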

But it is better to embed the string at the end of the struct and pass around the pointer to it.

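#pragma pack(1)
typedef struct _dwg_string {
  BITCODE_RS len;
  // encoding
  BITCODE_RS cp; // 30: ANSI_1252, 43: UTF-16, 0: UTF-8. max: 44
  char str[];    // flexible array member: the bytes follow the header
} Dwg_String;
typedef char *BITCODE_T;

// needs <stdlib.h> and <string.h>
BITCODE_T tstr_new (const char *str, BITCODE_RS cp)
{
  size_t len = strlen (str);
  // header + string + terminating NUL
  Dwg_String *tstr = malloc (sizeof (Dwg_String) + len + 1);
  if (!tstr)
    return NULL;
  tstr->len = (BITCODE_RS)len; // assumes len fits into BITCODE_RS
  tstr->cp = cp;
  memcpy (tstr->str, str, len + 1);
  return tstr->str; // callers only see the char pointer
}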
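Passing around only the string pointer means the header has to be recovered from it. One way to do that is `offsetof`, sketched below. This assumes `BITCODE_RS` is a 16-bit unsigned integer; `tstr_header`, `tstr_len`, `tstr_cp` and `tstr_free` are hypothetical helper names, not existing libredwg functions. The constructor is repeated, with an added `cp` parameter, so the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t BITCODE_RS; /* assumption: 16-bit unsigned short */

typedef struct _dwg_string {
  BITCODE_RS len;
  BITCODE_RS cp;  /* 30: ANSI_1252, 43: UTF-16, 0: UTF-8 */
  char str[];     /* string bytes follow the header */
} Dwg_String;
typedef char *BITCODE_T;

/* Allocate header and string in one block, return a pointer to the string. */
BITCODE_T tstr_new (const char *str, BITCODE_RS cp)
{
  size_t len = strlen (str);
  Dwg_String *tstr = malloc (sizeof (Dwg_String) + len + 1);
  if (!tstr)
    return NULL;
  tstr->len = (BITCODE_RS)len; /* assumes len fits into 16 bits */
  tstr->cp = cp;
  memcpy (tstr->str, str, len + 1);
  return tstr->str;
}

/* Step back from the string pointer to the enclosing header. Only valid
   for pointers returned by tstr_new. */
Dwg_String *tstr_header (BITCODE_T s)
{
  return (Dwg_String *)(s - offsetof (Dwg_String, str));
}

BITCODE_RS tstr_len (BITCODE_T s) { return tstr_header (s)->len; }
BITCODE_RS tstr_cp (BITCODE_T s) { return tstr_header (s)->cp; }

/* The block must be freed via the header, not via the string pointer. */
void tstr_free (BITCODE_T s)
{
  if (s)
    free (tstr_header (s));
}
```

The trade-off of this layout is that a plain `char *` from elsewhere (a literal, a `strdup` result) looks identical to a `BITCODE_T` but has no header in front of it, so every string would have to go through `tstr_new`.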

The other idea is to use UTF-8 everywhere and convert on the fly.

rurban commented 4 years ago

The other idea is to store all strings as UTF-8: convert from the codepage on decode, and convert back to it on encode.
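A dependency-free sketch of that decode/encode round trip for the simplest case, ISO-8859-1 (Latin-1). Both function names are hypothetical. Note that this is not a full ANSI_1252 conversion: CP1252 assigns printable characters to bytes 0x80-0x9F (0x80 is the euro sign, for example), while the code below treats them as the Latin-1 code points U+0080-U+009F.

```c
#include <stdlib.h>
#include <string.h>

/* Decode direction: Latin-1 bytes to a newly allocated UTF-8 string.
   The caller frees the result. */
char *latin1_to_utf8 (const char *src)
{
  size_t len = strlen (src);
  char *dst = malloc (2 * len + 1); /* worst case: every byte doubles */
  char *p = dst;
  if (!dst)
    return NULL;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char)src[i];
      if (c < 0x80)
        *p++ = (char)c; /* ASCII is unchanged */
      else
        {
          *p++ = (char)(0xC0 | (c >> 6));   /* lead byte: 0xC2 or 0xC3 */
          *p++ = (char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
  *p = '\0';
  return dst;
}

/* Encode direction: UTF-8 back to Latin-1. Returns NULL if the input is
   malformed or contains a code point above U+00FF. */
char *utf8_to_latin1 (const char *src)
{
  size_t len = strlen (src);
  char *dst = malloc (len + 1); /* output is never longer than input */
  char *p = dst;
  if (!dst)
    return NULL;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char)src[i];
      unsigned char next = (unsigned char)src[i + 1]; /* NUL at the end */
      if (c < 0x80)
        *p++ = (char)c;
      else if ((c == 0xC2 || c == 0xC3) && (next & 0xC0) == 0x80)
        {
          *p++ = (char)(((c & 0x03) << 6) | (next & 0x3F));
          i++; /* consumed the continuation byte */
        }
      else
        {
          free (dst); /* not representable in Latin-1 */
          return NULL;
        }
    }
  *p = '\0';
  return dst;
}
```

The encode direction can fail, which is the main cost of the UTF-8-everywhere approach: a string edited in memory may no longer fit the target codepage and needs a fallback.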