mtxvp opened 6 months ago
Just this line alone seems to be responsible for 85 MiB:
from atproto_client import models
Thank you for your report. Here is why it happens: https://github.com/pydantic/pydantic/issues/6620#issuecomment-1934687870
TLDR: pydantic copies some stuff into the rust side and it consumes memory; upgrade to the latest pydantic
So nothing I can do from my side IMO. You can try to load only the necessary models for your project if the memory usage is critical.
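If validation is not needed at all, one way to follow that suggestion is to defer the heavy import so its cost is only paid on code paths that actually touch the models. A minimal sketch of the deferral pattern (generic Python, not part of atproto's API; you would pass "atproto_client.models" as the module name):

```python
import importlib
import sys


def lazy_import(name: str):
    """Return the module `name`, importing it only on first use.

    Deferring a heavy import (e.g. "atproto_client.models") keeps
    startup memory low for programs that never touch it; the cost
    is paid on the first call instead of at interpreter start.
    """
    if name in sys.modules:
        # Already imported: this is just a cheap dict lookup.
        return sys.modules[name]
    return importlib.import_module(name)
```

Call lazy_import("atproto_client.models") inside the one function that actually needs the models, rather than importing at the top of the file.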
Validation, or having access to the models at all, is not critical or necessary in my use case. The less memory I use, the better.
Given my limited understanding of how to use your library, I'd love to get some recommendations on how to load only a minimal set of models, if that is possible. All I am trying to do is use the API for posting by following your README, which suggests importing Client.
Importing Client in turn executes the following line, which seems to be one of the culprits:
atproto_client/models/__init__.py
...
...
load_models()
Also, the latest pydantic (v2.6.4) is already in my project's dependencies; I have no direct use of it otherwise.
The current situation is even worse. I tried your commands, and from atproto_client import models gives me ~170 MB.
Even when model rebuilding (in load_models) is disabled (which breaks the code, because not all models are in a completed state), it consumes around 50 MB just by importing all models in models/__init__.py.
A little about load_models. Because of the deep, nested, and sometimes recursive model structure, it is important to load models in a strict order so that their dependencies on each other resolve. Maintaining such an order would be hard to achieve from our code generator, which is a fully automated process. That's why model rebuilding happens at runtime, once all the models are already defined. At the end, load_models calls model_rebuild for each existing model, but the model_rebuild method is smart enough to do nothing if the model is already completely built (all types resolved correctly). So there is no overhead.
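The ordering problem described above can be reproduced with two plain pydantic v2 models. This toy sketch (not the generated atproto models) shows why model_rebuild() is needed when a model references a type defined later, and why re-running it is a no-op:

```python
from typing import Optional

from pydantic import BaseModel


class Node(BaseModel):
    value: int
    # "Leaf" is a forward reference: the class does not exist yet,
    # so Node is created in an incomplete state.
    child: Optional["Leaf"] = None


class Leaf(BaseModel):
    name: str


# Now that Leaf is defined, resolve the forward reference.
Node.model_rebuild()

# Validation works once the schema is complete.
node = Node(value=1, child={"name": "leaf"})
```

A second model_rebuild() call returns None without doing any work when the schema is already complete, which is the "no overhead" behavior mentioned above.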
TL;DR: I think there is no way to fix this from our side short of migrating from pydantic to something else. load_models is not the problem. The problem is the 470 models inherited from pydantic.BaseModel, with Rust's memory overhead for each of them. IMO only pydantic's team can try to minimize that memory overhead.
upd:
Filename: atproto/packages/atproto_client/models/models_loader.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
40 47.7 MiB 47.7 MiB 1 @profile
41 def __rebuild_all_models() -> None:
42 # load models to the scope
43 47.7 MiB 0.0 MiB 1 from atproto_client import models # noqa
44 47.7 MiB 0.0 MiB 1 from atproto_client.models.unknown_type import UnknownType, UnknownInputType
45 47.7 MiB 0.0 MiB 1 from atproto_client.models.blob_ref import BlobRef
46 47.7 MiB 0.0 MiB 1 from atproto_client.models import dot_dict
47 47.7 MiB 0.0 MiB 1 from atproto_core.cid import CIDType
48
49 47.7 MiB 0.0 MiB 1 UnknownType, UnknownInputType, CIDType, dot_dict # noqa: B018
50
51 47.7 MiB 0.0 MiB 1 BlobRef.model_rebuild()
52 148.7 MiB 0.0 MiB 466 for __model in __get_models_to_rebuild_set():
53 148.7 MiB 101.0 MiB 465 __model.model_rebuild()
pydantic's model_rebuild: https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel.model_rebuild
I've created a separate isolated repository that contains pure models without additional noise and asked pydantic team about the amount of consumed memory here: https://github.com/pydantic/pydantic/issues/9982
Merely adding the following line to an empty Python script makes memory usage jump from 0 to 100+ MiB:
Tested using the following command on WSL2, python 3.10.6:
(see memory profiler, section: Time-based memory usage)
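As a cross-check that needs no third-party profiler, the stdlib tracemalloc module can measure what a single import allocates. A generic sketch (pass whatever module you want to measure, e.g. "atproto_client.models"); note that tracemalloc only sees Python-level allocations, so memory held on the Rust side by pydantic-core is not counted, and RSS-based tools like memory_profiler will report a larger number:

```python
import importlib
import sys
import tracemalloc


def import_cost_bytes(module_name: str) -> int:
    """Return bytes of Python-level allocations made by importing
    `module_name` for the first time."""
    if module_name in sys.modules:
        # Re-importing a cached module costs nothing; the measurement
        # is only meaningful for a fresh import.
        raise ValueError(f"{module_name} is already imported")
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    importlib.import_module(module_name)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```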