albertodemichelis / squirrel

Official repository for the programming language Squirrel
http://www.squirrel-lang.org
MIT License
894 stars 148 forks source link

Unexpected performance boost if preallocate scratchpad ? #262

Open mingodad opened 1 year ago

mingodad commented 1 year ago

When doing tests with string concatenation I found that somehow if we preallocate a large scratchpad (I'm guessing here) the time needed to concatenate strings goes down to 1/4. Without print(blob(1000000).write("hi"));:

/usr/bin/time sq joinStrings.nut 
strPlusStr took 840000  6.300323
strToArrayPush took 840000  0.0073100000000004
2.63user 3.67system 0:06.31elapsed 99%CPU (0avgtext+0avgdata 7036maxresident)k
0inputs+0outputs (0major+2507868minor)pagefaults 0swaps

With print(blob(1000000).write("hi"));:

/usr/bin/time sq joinStrings.nut 
2
strPlusStr took 840000  1.599949
strToArrayPush took 840000  0.007053
1.40user 0.20system 0:01.61elapsed 99%CPU (0avgtext+0avgdata 7392maxresident)k
0inputs+0outputs (0major+140519minor)pagefaults 0swaps
local str1 = "abc";
local str2 = "defghijklmnñopqrstuvwxyz";

// it seems that enlarging the scratchpad beforehand improve performance of string concatenation
print(blob(1000000).write("hi"));

function strPlusStr(times) {
  local str = "";
  for(local i =0; i < times; ++i) {
    str += str1 +  str2;
  }
  return str;
}

function strPlusStrBuf(times) {
  local str = blob();
  for(local i =0; i < times; ++i) {
    str.write(str1);
    str.write(str2);
  }
  return str.tostring();
}

function strToArrayPush(times) {
  local str = [];
  for(local i=0; i < times; ++i) {
    str.append(str1);
    str.append(str2);
  }
  return str.join("");
}

function test(times) {
  local time, str;
/*
  time = os.clock();
  str = strPlusStrBuf(times); //with this call before strPlusStr then strPlusStr takes 1/4 of time
  print("strPlusStrBuf took", str.len(), os.clock()-time)
*/
  time = os.clock();
  str = strPlusStr(times);
  print("strPlusStr took", str.len(), os.clock()-time)

  time = os.clock();
  str = strToArrayPush(times);
  print("strToArrayPush took", str.len(), os.clock()-time)

}

test(30000);
zeromus commented 1 year ago

Two problems here

  1. SQSharedState::GetScratchPad needs to expand geometrically instead of only the the barely-needed size. If that happened, the "performance boost" would be so small that it would be hard to measure
  2. SQVM::StringCat could be optimized so that the scratchpad isn't needed. We would do this by hashing the source strings and combining it, then adding a new SQStringTable::Add() which can receive the hash already calculated. This would DRAMATICALLY speed up the usecase of concatenating characters one at a time, which is not an uncommon job
albertodemichelis commented 1 year ago
  1. is very environment specific, ask the ppl that run on a arm cortex M3 with 256Kb of ram and they will tell you something else.
  2. is a very interesting idea that could eliminate the need of the scratchpad in string concatenation
albertodemichelis commented 1 year ago

I did a very quick implementation of solution number 2 and the test suite passes, I'd have to do some benchmark but I'd assume is faster.