dexaai / dexter

LLM tools used in production at Dexa

Improve install size #32

Closed transitive-bullshit closed 6 months ago

transitive-bullshit commented 6 months ago

Install size is currently sitting at ~17MB, with 14MB coming from tiktoken: https://pkg-size.dev/@dexaai%2Fdexter

See also https://github.com/dqbd/tiktoken/issues/68

For comparison, langchain is at ~36MB: https://pkg-size.dev/langchain, but we should be a lot slimmer than that. Langchain isn't even loading the full tiktoken WASM lib; it's using the 6.6MB js-tiktoken.
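
For reference, here's a rough sketch of the difference between the two tokenizer packages (the model name and sample text are just for illustration, and the exact size numbers come from the pkg-size links above):

```ts
// Heavy option: the `tiktoken` package ships a WASM build of the BPE ranks,
// which is where most of the ~14MB install size comes from.
import { encoding_for_model } from 'tiktoken';

// Lighter option: `js-tiktoken` is the pure-JS port (~6.6MB) that langchain uses.
import { encodingForModel } from 'js-tiktoken';

// WASM-backed tokenizer: fast encoding, but a large install footprint.
const wasmEnc = encoding_for_model('gpt-4');
console.log(wasmEnc.encode('hello world').length);
wasmEnc.free(); // WASM encodings must be freed explicitly

// Pure-JS tokenizer: smaller install, generally slower to encode.
const jsEnc = encodingForModel('gpt-4');
console.log(jsEnc.encode('hello world').length);
```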

This issue may end up just being resolved by improving tiktoken's WASM bundle size upstream, but I wanted to track it while it's top of mind for gptlint.

rileytomasek commented 6 months ago

bundle size is obv important, but do we care about install size? bundlephobia is showing a 238kB minified bundle, but that seems like it has to be wrong given tiktoken's size.

also, have you looked at the js tokenizer libs lately? is there a better one yet?

transitive-bullshit commented 6 months ago

agreed that it's not a priority; just bringing it up because gptlint came in at 25MB, and 80% of that was dexter, which was surprising to me.

> also, have you looked at the js tokenizer libs lately? is there a better one yet?

Not that I'm aware of; langchain is still using js-tiktoken and I haven't seen any others gain wide adoption.

rileytomasek commented 6 months ago

ok, let's close this then, considering it's mostly tiktoken and we don't have a good alternative.