andrei-markeev / ts2c

Convert Javascript/TypeScript to C
ISC License

int16_t is not enough and better string implementation needed #62

Open Naheel-Azawy opened 4 years ago

Naheel-Azawy commented 4 years ago

This sounds like multiple issues, but they are all related. For the following JS code:

var s = "";
for (var i = 0; i < 10000; ++i) {
    s += i;
}
console.log(s.length);
  1. The compiled C code prints -26646.
  2. If 10000 is changed to 100000, the program fails with: Assertion 'gc_main->data != NULL' failed...

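Both problems happen because int16_t (typically short) is used, and its maximum value of 32767 is obviously not enough. In the first case the correct length is 38890, which wraps around to 38890 - 65536 = -26646.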

  3. However, if a wider type such as unsigned long is used instead, the program performs far too many allocations; on my system (Arch Linux, 8 GB of RAM) it froze.

I think the fix is, first, to use something like size_t instead of int16_t. size_t is probably the better choice because its width is chosen by the compiler for the target platform, so 16-bit microcontrollers stay happy (issue #41). Second, a better mini string implementation is needed. Many small C string libraries already exist. The C++ std::string also performs well, and I assume its approach could be replicated in C. The following works well:

#include <iostream>
#include <string>
using namespace std;

int main() {
  string s = "";
  for (int i = 0; i < 1000000; ++i) {
    s += to_string(i);
  }
  cout << s << endl;
}

andrei-markeev commented 4 years ago

Yes, good points!

There will be a switch at some point so that it is easy to change int16_t to int32_t or int64_t. @pitust tried changing everything to int64_t, and it worked fine for his scenario, so I hope this can be solved without many problems: https://github.com/andrei-markeev/ts2c/issues/26#issuecomment-569975483

Regarding the string implementation: yes, right now strings are const char *. Even though your example is not very realistic, I agree that the current implementation is certainly not the best option when a program performs a large number of string operations. So maybe we can detect these cases and use char * instead, preallocated to a bigger capacity, similarly to how arrays are implemented.