CGamesPlay opened 7 months ago
Hey @CGamesPlay, we have both cached as well. What would a good lazy loading implementation here look like?
Yes, they are cached... at startup. This loading process takes time, though, which is what would be great to improve. To see what I mean: `time python -c 'import litellm'`. Any program that uses litellm at all takes at least this amount of time to start up, even if it never calls a single litellm function (e.g. `mytool --help` has to pay this startup cost).
The fixes should be pretty simple, I think, but I haven't tested these:
```python
import functools
import tiktoken

# Instead of
encoding = tiktoken.get_encoding("cl100k_base")

# Use
@functools.cache
def get_encoding():
    return tiktoken.get_encoding("cl100k_base")
```
```python
import functools

# Instead of
model_cost = get_model_cost_map(url=model_cost_map_url)

# Use
@functools.cache
def get_model_cost():
    # In reality, we probably want to refactor the existing method to use a
    # default parameter.
    # Fun fact: functools.cache methods have a property `__wrapped__` which is
    # the uncached function.
    return get_model_cost_map(url=model_cost_map_url)
```
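Call sites would then change along these lines (a quick sketch, reusing the wrappers above):

```python
# The first call pays the load cost; functools.cache makes every later call
# free and returns the very same object each time.
tokens = get_encoding().encode("hello world")
cost_map = get_model_cost()

assert get_encoding() is get_encoding()
```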
I took a deeper look at this and tried to create a PR, but failed. `model_cost` is a public export. It's possible to make it lazy-loaded at use by subclassing dict and only loading once a key is accessed, except a bunch of other public exports also expect the model cost map to be loaded at startup, notably lists, which can't be lazy-loaded the same way. As a workaround, I could set `LITELLM_LOCAL_MODEL_COST_MAP=True` before I `import litellm`.
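For reference, a minimal sketch of that dict-subclass approach (the names and the stand-in loader are illustrative, not litellm's internals):

```python
# A dict subclass that runs the expensive loader on first key access. Note
# the limitations discussed above: derived model lists aren't helped by this,
# and methods like keys()/len()/get() would need the same treatment.
class LazyDict(dict):
    def __init__(self, loader):
        super().__init__()
        self._loader = loader  # stand-in for get_model_cost_map(url=...)

    def _ensure_loaded(self):
        if self._loader is not None:
            self.update(self._loader())
            self._loader = None

    def __getitem__(self, key):
        self._ensure_loaded()
        return super().__getitem__(key)

    def __contains__(self, key):
        self._ensure_loaded()
        return super().__contains__(key)

model_cost = LazyDict(lambda: {"gpt-3.5-turbo": {"input_cost_per_token": 1.5e-06}})
print(model_cost["gpt-3.5-turbo"])  # the load happens here, not at import
```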
I've also discovered that litellm always calls `dotenv.load_dotenv()` with the default configuration, meaning my application cannot disable this behavior if, say, the user requested a different env file be loaded.
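For illustration, here is a sketch of the kind of opt-in guard a library could use instead of an unconditional import-time call; the `LITELLM_LOAD_DOTENV` flag is hypothetical, not a real litellm option:

```python
import os

# Hypothetical opt-in flag: only touch .env files when the host application
# asks for it, rather than on every import.
if os.environ.get("LITELLM_LOAD_DOTENV", "").lower() == "true":
    from dotenv import load_dotenv
    load_dotenv()  # defaults: find the nearest .env and merge it into os.environ
```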
Considering all this, I unfortunately think litellm is unsuitable for my use case, which is a CLI application that leverages LLMs, so I'll be migrating away from it. Hopefully in the next major version (since these changes would require a major version bump) litellm will be less demanding of its execution environment (not modify os.environ, not make network requests on startup, do less unnecessary loading).
Hey @CGamesPlay, reviewing the feedback:

> not make network requests on startup

I believe `LITELLM_LOCAL_MODEL_COST_MAP` should solve this. Let me know if not?

> not modify os.environ

Can you explain this a bit - how are we modifying the os.environ?

> do less unnecessary loading

What would this look like?
Sure thing. As I mentioned, my use case is a CLI utility. People will be using this from all over their system, in any directory. That said:

1. I need to set `LITELLM_LOCAL_MODEL_COST_MAP` myself, in my program, before I `import litellm`, or risk delaying startup by 5s. That's not ideal, but it is a workaround.
2. litellm calls `dotenv.load_dotenv()`, loading whatever random `.env` file is in whatever random directory the user runs my script in, modifying `os.environ`.
3. litellm calls `get_model_cost_map` or `tiktoken.get_encoding` at the top level. This would look like `time python -c 'import litellm'` giving results closer to `time python -c 'import openai'` (on my system, this is presently 700ms vs 350ms, so double the startup time). I gave some specific examples in my previous comment.

No ill will towards you guys, and I would love to see a unified LLM interface in Python to allow me to easily swap out model providers, but as it stands litellm won't be that for my project.
I too am building a CLI utility and noticed that litellm was the main culprit for my slow startup time, including a utility that is supposed to quickly start listening to your voice.
I am surprised to hear that it's in good part caused by the cost-loading code, as that seems to me like something that should not run on every startup but only when needed (lazy loading).
> It's possible to make it lazy-loaded at use by subclassing dict and only loading once a key is accessed,

That was my first thought as well.

> except a bunch of other public exports also expect the model cost map to be loaded at startup, notably lists, which can't be lazy-loaded the same way.

Ouch. That does not seem super easy to change, but it does seem worth it to me. I mean, litellm is a complete outlier compared to other libs:
```shell
❯ time python
python  0,09s user 0,06s system 8% cpu 1,669 total
❯ time python -c "import langchain ; import langchain_core ; import langchain_community"
python -c "import langchain"  0,14s user 0,06s system 100% cpu 0,193 total
❯ time python -c "import numpy"
python -c "import numpy"  0,65s user 0,61s system 326% cpu 0,388 total
❯ time python -c 'import openai'
python -c 'import openai'  1,48s user 0,18s system 99% cpu 1,669 total
❯ time python -c "import torch"
python -c "import torch"  2,73s user 0,88s system 124% cpu 2,891 total
❯ time python -c "import torchaudio"
python -c "import torchaudio"  3,66s user 1,32s system 141% cpu 3,521 total
❯ time python -c "import litellm"
python -c "import litellm"  4,27s user 0,96s system 116% cpu 4,473 total
```
So litellm is 27 times slower to import than langchain, 7 times slower than numpy, 2.8 times slower than openai, 1.6 times slower than torch, 1.2 times slower than torchaudio.
I picked these libs pretty randomly, but that certainly seems unnecessary. I can understand why it would be a deal breaker for @CGamesPlay and for many corporations.
I am still very fond of litellm and use it in probably at least a dozen of my repos, but I think this should be a pretty high priority. I didn't look at the code, but intuitively, lazy loading of the costs when the dict is accessed seems like the way to go. I don't know for what reasons some other parts of the code would need the cost dict on startup.
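One possible direction for the public-export problem (not something litellm does today) is PEP 562's module-level `__getattr__` (Python 3.7+), which would let `model_cost` and the lists derived from it stay public exports while being built on first attribute access. A minimal sketch, with illustrative names and a stand-in loader:

```python
# lazy_exports.py -- sketch only, not litellm's actual code.
import functools

@functools.cache
def _load_model_cost():
    # Stand-in for get_model_cost_map(url=model_cost_map_url): the expensive
    # fetch/parse runs here, on first access rather than at import time.
    return {"gpt-4": {"input_cost_per_token": 3e-05, "mode": "chat"}}

def __getattr__(name):
    # Python consults this only for names not found in the module normally.
    if name == "model_cost":
        return _load_model_cost()
    if name == "chat_models":  # illustrative stand-in for litellm's model lists
        return [m for m, v in _load_model_cost().items() if v.get("mode") == "chat"]
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

With this, `import lazy_exports` stays cheap; the load only runs the first time code touches `lazy_exports.model_cost` or `lazy_exports.chat_models`.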
hey @thiswillbeyourgithub can you check if this is resolved by just using the locally cached dictionary -

```shell
export LITELLM_LOCAL_MODEL_COST_MAP="True"
```

https://docs.litellm.ai/docs/completion/token_usage#9-register_model
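For a Python entry point, the in-process equivalent of that `export` (the workaround mentioned earlier in the thread) would be a sketch like:

```python
import os

# Must be set before litellm is imported, since the cost map is loaded at
# import time.
os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"

import litellm  # noqa: E402 (deliberately after the environment tweak)
```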
Hi, I'm sorry I somehow missed your message.
The answer is no, both are still slow:
```shell
❯ time python -c "import litellm"
python -c "import litellm"  4,49s user 1,00s system 112% cpu 4,897 total
❯ export LITELLM_LOCAL_MODEL_COST_MAP="True"
❯ time python -c "import litellm"
python -c "import litellm"  4,48s user 1,02s system 118% cpu 4,655 total
```
litellm version: 1.39.6
Any update here? It is pretty bad form to be making network requests at package import time.
@lucasgadams this can be disabled with an env var - `LITELLM_LOCAL_MODEL_COST_MAP`, mentioned above. The hosted copy is used to prevent needing to make continuous version upgrades as new models come out.
The Feature
Litellm has a few heavy initialization functions that are not used 100% of the time but add considerable startup time to the package. It would be great to modify these to be lazy-loaded. Two specific resources I've noticed:

- the tiktoken encoding, loaded via `tiktoken.get_encoding("cl100k_base")`
- the model cost map, loaded via `get_model_cost_map(url=model_cost_map_url)`
Motivation, pitch
Faster startup times will 1000x developer productivity while revolutionizing new use cases.