SheetJS / js-codepage

:currency_exchange: Codepages for JS
http://sheetjs.com
Apache License 2.0

The file is huge #1

Closed Vanuan closed 10 years ago

Vanuan commented 11 years ago

Is there a way to reduce the disk footprint? For example, adding separate scripts for encoding/decoding, using raw characters instead of number strings, using a minifier, etc.

As a side note, it might also be useful to introduce an efficient decoding function, e.g.:

cp.decode('ÇÈÉÊ', 1251)

It might even be possible to just use arrays, with the character code as the index. That would waste space on the first 128 entries (the ASCII range), though, so the size would be comparable to using objects.
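To illustrate the "raw characters instead of number strings" idea, here is a minimal sketch. It assumes the input is a binary string (one character per byte), and the table is deliberately partial: it covers only bytes 0xC0-0xFF of Windows-1251. The names `CP1251_C0_FF` and `decodeCyrillicRange` are made up for this example and are not part of the library.

```javascript
// Hypothetical partial table: bytes 0xC0-0xFF of Windows-1251 only,
// stored as raw characters (one character per byte value).
var CP1251_C0_FF =
  "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ" +
  "абвгдежзийклмнопрстуфхцчшщъыьэюя";

// Decode a binary string (one char per byte) using the partial table.
function decodeCyrillicRange(binary) {
  var out = "";
  for (var i = 0; i < binary.length; ++i) {
    var code = binary.charCodeAt(i);
    if (code < 0x80) {
      out += binary.charAt(i);                   // ASCII maps to itself
    } else if (code >= 0xC0 && code <= 0xFF) {
      out += CP1251_C0_FF.charAt(code - 0xC0);   // table lookup by offset
    } else {
      out += "\uFFFD";                           // not covered by this sketch
    }
  }
  return out;
}
```

With this table, `decodeCyrillicRange('ÇÈÉÊ')` yields `"ЗИЙК"`. A full single-byte codepage would need a 128-character string for bytes 0x80-0xFF.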

Vanuan commented 11 years ago

Of course, DBCS needs separate treatment. I just meant to say that single-byte encodings have a more efficient way of storing the same data. Compare this:

var table = {
"1251":[1026,1027,8218,1107,8222,8230,8224,8225,8364,8240,1033,8249,1034,1036,1035,1039,1106,
8216,8217,8220,8221,8226,8211,8212,152,8482,1113,8250,1114,1116,1115,1119,160,1038,1118,1032,
164,1168,166,167,1025,169,1028,171,172,173,174,1031,176,177,1030,1110,1169,181,182,183,1105,
8470,1108,187,1112,1029,1109,1111,1040,1041,1042,1043,1044,1045,1046,1047,1048,1049,1050,
1051,1052,1053,1054,1055,1056,1057,1058,1059,1060,1061,1062,1063,1064,1065,1066,1067,1068,
1069,1070,1071,1072,1073,1074,1075,1076,1077,1078,1079,1080,1081,1082,1083,1084,1085,1086,
1087,1088,1089,1090,1091,1092,1093,1094,1095,1096,1097,1098,1099,1100,1101,1102,1103],
...
}

to the equivalent object mapping. It is at least 4 times smaller. It just needs a bit of code:

function decode(string, codepage) {
  // table[codepage] holds the Unicode code points for bytes 0x80-0xFF
  var indexes = table[codepage], decoded = "";
  for (var i = 0; i < string.length; ++i) {
    var code = string.charCodeAt(i);
    if (code < 128) {
      // ASCII range maps to itself
      decoded += string.charAt(i);
    } else {
      decoded += String.fromCharCode(indexes[code - 128]);
    }
  }
  return decoded;
}
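
The same array could also drive the reverse direction. The following is only a sketch: `buildReverse` and `encode` are hypothetical names, the `"?"` fallback for unmappable characters is an arbitrary choice, and the table is the 1251 array from above repeated so that the block runs on its own.

```javascript
// Upper half (bytes 0x80-0xFF) of Windows-1251, same data as the table above.
var table = {
"1251":[1026,1027,8218,1107,8222,8230,8224,8225,8364,8240,1033,8249,1034,1036,1035,1039,1106,
8216,8217,8220,8221,8226,8211,8212,152,8482,1113,8250,1114,1116,1115,1119,160,1038,1118,1032,
164,1168,166,167,1025,169,1028,171,172,173,174,1031,176,177,1030,1110,1169,181,182,183,1105,
8470,1108,187,1112,1029,1109,1111,1040,1041,1042,1043,1044,1045,1046,1047,1048,1049,1050,
1051,1052,1053,1054,1055,1056,1057,1058,1059,1060,1061,1062,1063,1064,1065,1066,1067,1068,
1069,1070,1071,1072,1073,1074,1075,1076,1077,1078,1079,1080,1081,1082,1083,1084,1085,1086,
1087,1088,1089,1090,1091,1092,1093,1094,1095,1096,1097,1098,1099,1100,1101,1102,1103]
};

// Build a reverse lookup (Unicode code unit -> byte value) from the array.
function buildReverse(codepage) {
  var indexes = table[codepage], reverse = {};
  for (var i = 0; i < indexes.length; ++i) {
    reverse[indexes[i]] = i + 128;
  }
  return reverse;
}

// Encode a JS string into a binary string (one char per byte).
function encode(string, codepage) {
  var reverse = buildReverse(codepage), encoded = "";
  for (var i = 0; i < string.length; ++i) {
    var code = string.charCodeAt(i);
    if (code < 128) {
      encoded += string.charAt(i);                       // ASCII maps to itself
    } else if (Object.prototype.hasOwnProperty.call(reverse, code)) {
      encoded += String.fromCharCode(reverse[code]);     // mapped byte
    } else {
      encoded += "?";                                    // assumed fallback
    }
  }
  return encoded;
}
```

The reverse map is rebuilt on every call here to keep the sketch short; caching it per codepage would be the obvious refinement.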

nonchalance commented 11 years ago

Expanding on @Vanuan's suggestion, encode/decode should accept and return either Node.js Buffers or Strings.
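
A rough sketch of what that could look like, assuming a Node.js environment with `Buffer.from`. The function names (`byteAt`, `decodeBytes`, `encodeBytes`) and the `asBuffer` flag are invented for illustration, and `upperHalf` is a stand-in table that fills in only bytes 0xC0-0xFF of Windows-1251, not real library data.

```javascript
// Stand-in table for illustration: entries for bytes 0x80-0xBF are U+FFFD,
// entries for bytes 0xC0-0xFF are the Cyrillic block starting at U+0410.
var upperHalf = [];
for (var k = 0; k < 128; ++k) {
  upperHalf.push(k >= 64 ? 0x0410 + (k - 64) : 0xFFFD);
}

// Read byte i from a Buffer, a byte array, or a binary string.
function byteAt(input, i) {
  return typeof input === "string" ? input.charCodeAt(i) & 0xFF : input[i];
}

// Decode a Buffer, byte array, or binary string into a JS string.
function decodeBytes(input, indexes) {
  var decoded = "";
  for (var i = 0; i < input.length; ++i) {
    var b = byteAt(input, i);
    decoded += String.fromCharCode(b < 128 ? b : indexes[b - 128]);
  }
  return decoded;
}

// Encode a JS string; returns a Buffer if asBuffer is true, else a binary string.
function encodeBytes(string, indexes, asBuffer) {
  var bytes = [];
  for (var i = 0; i < string.length; ++i) {
    var code = string.charCodeAt(i);
    // U+FFFD is the stand-in's "unmapped" marker, so never look it up.
    var pos = (code >= 128 && code !== 0xFFFD) ? indexes.indexOf(code) : -1;
    bytes.push(code < 128 ? code : (pos >= 0 ? pos + 128 : 0x3F)); // 0x3F is "?"
  }
  // apply() is fine for short inputs; long inputs would need chunking.
  return asBuffer ? Buffer.from(bytes) : String.fromCharCode.apply(null, bytes);
}
```

For example, `decodeBytes(Buffer.from([0xC7, 0xC8, 0xC9, 0xCA]), upperHalf)` and `decodeBytes('ÇÈÉÊ', upperHalf)` both return `"ЗИЙК"`.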

SheetJSDev commented 10 years ago

@Vanuan Separate source files are available for individual codepages. For a tight solution, you can string them together. The utility functions from cputils.js work just as well with the individual scripts as with the monster script.