MarshalX / atproto

The AT Protocol (🦋 Bluesky) SDK for Python 🐍
https://atproto.blue
MIT License

High memory usage: from atproto import Client #316

Open mtxvp opened 6 months ago

mtxvp commented 6 months ago

Merely adding the following line to an empty Python script makes memory usage jump from 0 to 100+ MiB:

from atproto import Client

Tested using the following command on WSL2, Python 3.10.6:

mprof run python -m experiments.profilecode && mprof plot -o image.png

(see memory profiler, section: Time-based memory usage)
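For a quick cross-check without mprof, the same measurement can be approximated with the standard library alone. The sketch below is only an illustration, not part of the original report: the helper name peak_rss_growth is hypothetical, it relies on the Unix-only resource module, and ru_maxrss is reported in KiB on Linux but in bytes on macOS. It imports a module in a fresh interpreter and reports how much the peak resident set size grew.

```python
import subprocess
import sys

# Code executed in a fresh interpreter so earlier imports do not skew the result.
# With "python -c CODE ARG", sys.argv[1] is ARG (the module name).
_PROBE = """
import importlib, resource, sys
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
importlib.import_module(sys.argv[1])
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(before, after)
"""


def peak_rss_growth(module_name: str) -> int:
    """Return the growth in peak RSS caused by importing module_name.

    The unit is whatever ru_maxrss uses on the platform
    (KiB on Linux, bytes on macOS).
    """
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    before, after = map(int, result.stdout.split())
    return after - before


if __name__ == "__main__":
    # Pass e.g. "atproto" on the command line if it is installed;
    # "json" is only a stand-in that ships with Python.
    name = sys.argv[1] if len(sys.argv) > 1 else "json"
    print(name, peak_rss_growth(name))
```

Because this measures resident memory of the whole process, it also counts allocations made by native extensions, which Python-level tools such as tracemalloc do not see.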

[image: mprof plot of memory usage over time]

mtxvp commented 6 months ago

Just this part seems to be responsible for 85 MiB:

from atproto_client import models

MarshalX commented 6 months ago

Thank you for your report. Here is why it happens: https://github.com/pydantic/pydantic/issues/6620#issuecomment-1934687870

TL;DR: pydantic copies some data over to the Rust side, and that consumes memory; upgrade to the latest pydantic.

So nothing I can do from my side IMO. You can try to load only the necessary models for your project if the memory usage is critical.

mtxvp commented 6 months ago

Validation or having access to models is not critical at all or necessary in my use case. The less memory I can use, the better.

Given my limited understanding of how to use your library, I'd love some recommendations on how to load only a minimal set of models, if any. All I am trying to do is use the API for posting by following your README, which suggests importing Client.

Importing Client in turn executes the following line of code, which seems to be one of the culprits:

atproto_client/models/__init__.py

...
...
load_models()

Also, the latest pydantic (v2.6.4) is pulled into my project as a dependency; I don't use it directly otherwise.

MarshalX commented 2 months ago

The current situation is even worse. I tried your commands and got ~170 MB on from atproto_client import models.

Even with model rebuilding (in load_models) disabled, which breaks the code because not all models are in a completed state, it still consumes around 50 MB just by importing all the models in models/__init__.py.

A little about load_models. Because the model structure is deep, nested, and sometimes recursive, the models have to be loaded in a strict order so that their dependencies on each other can be resolved. Keeping that order is hard to guarantee in our code generator, which is a fully automated process. That's why model rebuilding happens at runtime, once all the models are already defined. At the end, load_models calls model_rebuild for each existing model, but model_rebuild is smart enough to do nothing if the model is already completely loaded (all types resolved correctly). So there is no overhead.
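To make the forward-reference problem concrete, here is a minimal sketch assuming pydantic v2. The Post and Embed models are hypothetical stand-ins, not the SDK's generated models. Post refers to Embed before Embed exists, so pydantic leaves Post incomplete until model_rebuild is called after both classes are defined.

```python
from typing import Optional

from pydantic import BaseModel


class Post(BaseModel):
    text: str
    # "Embed" is not defined yet, so pydantic cannot build the full schema here.
    embed: Optional["Embed"] = None


class Embed(BaseModel):
    uri: str


# Both classes now exist, so the forward reference can be resolved.
# Per the pydantic docs, model_rebuild returns True when a rebuild succeeded
# and None when the model was already complete and nothing had to be done.
first = Post.model_rebuild()
second = Post.model_rebuild()

post = Post(text="hello", embed={"uri": "at://example"})
print(first, second, post.embed.uri)
```

Calling model_rebuild on every model once everything is defined, as load_models does, sidesteps the ordering problem, and the second call above illustrates why repeated calls on an already complete model are a no-op.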

TL;DR: I think there is no way to fix it from our side except migrating from pydantic to something else. load_models is not the problem. The problem is the 470 models that inherit from pydantic.BaseModel, each of which carries Rust-side memory overhead. IMO only pydantic's team can try to minimize that memory overhead.

upd:

Filename: atproto/packages/atproto_client/models/models_loader.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    40     47.7 MiB     47.7 MiB           1   @profile
    41                                         def __rebuild_all_models() -> None:
    42                                             # load models to the scope
    43     47.7 MiB      0.0 MiB           1       from atproto_client import models  # noqa
    44     47.7 MiB      0.0 MiB           1       from atproto_client.models.unknown_type import UnknownType, UnknownInputType
    45     47.7 MiB      0.0 MiB           1       from atproto_client.models.blob_ref import BlobRef
    46     47.7 MiB      0.0 MiB           1       from atproto_client.models import dot_dict
    47     47.7 MiB      0.0 MiB           1       from atproto_core.cid import CIDType
    48                                         
    49     47.7 MiB      0.0 MiB           1       UnknownType, UnknownInputType, CIDType, dot_dict  # noqa: B018
    50                                         
    51     47.7 MiB      0.0 MiB           1       BlobRef.model_rebuild()
    52    148.7 MiB      0.0 MiB         466       for __model in __get_models_to_rebuild_set():
    53    148.7 MiB    101.0 MiB         465           __model.model_rebuild()

pydantic's model rebuild: https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel.model_rebuild

MarshalX commented 2 months ago

I've created a separate, isolated repository that contains only the models, without additional noise, and asked the pydantic team about the amount of memory consumed here: https://github.com/pydantic/pydantic/issues/9982