alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
MIT License
528 stars 20 forks

fix arraybuffer error in Node.js > 18 #37

Open Vectorrent opened 1 week ago

Vectorrent commented 1 week ago

I tried to use TokenMonster in Node 18, and it failed with this error:

/home/crow/Repos/ode/src/tokenizers/tokenmonster.cjs:205
        const dataView = new DataView(buffer)
                         ^

TypeError: First argument to DataView constructor must be an ArrayBuffer
    at new DataView (<anonymous>)
    at TokenMonster.load (/home/crow/Repos/ode/src/tokenizers/tokenmonster.cjs:205:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async TokenMonsterTokenizer.init (file:///home/crow/Repos/ode/src/tokenizers.js:144:9)
    at async OmnipotentDeterministicEngine.preInit (file:///home/crow/Repos/ode/src/model.v0.js:53:13)
    at async OmnipotentDeterministicEngine.init (file:///home/crow/Repos/ode/src/model.v0.js:59:9)
    at async orchestrate (file:///home/crow/Repos/ode/cli.js:73:9)
    at async file:///home/crow/Repos/ode/cli.js:130:5

Node.js v18.20.3

The fix is simple: if the buffer loaded from the URL is converted to an ArrayBuffer before it is passed to the DataView constructor, loading works fine.
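For anyone hitting the same error, a likely cause (an assumption, since the PR's diff is not shown here) is that the loader ends up with a Node.js `Buffer`. A `Buffer` is a `Uint8Array` view, not an `ArrayBuffer`, so `new DataView(buffer)` rejects it. Below is a minimal sketch of the conversion using a hypothetical helper, `toArrayBuffer`. It is not necessarily the change made in this PR.

```javascript
// Hypothetical helper: normalize whatever the loader received into a
// standalone ArrayBuffer that the DataView constructor will accept.
function toArrayBuffer(data) {
  // Already an ArrayBuffer: pass it through unchanged.
  if (data instanceof ArrayBuffer) return data;
  // Buffers and other typed-array views: copy exactly the bytes the view
  // covers. Small Node.js Buffers can live inside a larger shared pool, so
  // slicing by byteOffset/byteLength avoids pulling in unrelated bytes.
  if (ArrayBuffer.isView(data)) {
    return data.buffer.slice(data.byteOffset, data.byteOffset + data.byteLength);
  }
  throw new TypeError('Expected an ArrayBuffer or an ArrayBuffer view');
}

// Reproduce the failure with a plain Buffer, then load it after converting.
const buf = Buffer.from([0x01, 0x02, 0x03, 0x04]);
let failed = false;
try {
  new DataView(buf); // throws: a Buffer is not an ArrayBuffer
} catch (e) {
  failed = e instanceof TypeError;
}
const view = new DataView(toArrayBuffer(buf));
console.log(failed, view.getUint8(3)); // prints "true 4"
```

Copying with `slice` rather than passing `buf.buffer` directly is the safer choice: `buf.buffer` may be a pooled allocation that is larger than the Buffer itself, so reading from offset 0 could return the wrong bytes.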

Thanks for building an awesome tokenizer! I'm very excited to get started here.