Blaizzy / fastmlx

FastMLX is a high-performance, production-ready API to host MLX models.

Memory leak? #26

Open · iLoveBug opened this issue 1 month ago

iLoveBug commented 1 month ago

Thanks, guys, for sharing this great repo.

I tried to use Llama 3.1 with tools for graphrag on my MacBook Pro M3 Max 128GB. Although Ollama supports this model, I found the entity extraction results are very strange.

Fortunately, fastmlx works quite well with Llama 3.1 for graphrag 0.2.1 (the version I use), except that memory consumption grows as time goes by.

I am not sure whether it's a memory leak or not.

I downloaded a novel from a website and fed it into graphrag; the file size is less than 200 KB.

Hope to get some support here.

Blaizzy commented 1 month ago

Hey @iLoveBug

I'm happy to hear that fastmlx works well for graphrag.

Could you give me a reproducible example?

For instance, how you start the server and what requests you send.

iLoveBug commented 4 weeks ago

Hello,

Thanks for your reply.

Attached are my configuration and data files for graphrag. I use a Chinese version of the famous novel Around the World in Eighty Days for the test. You can replace it with the English version or any other text.

I use Llama 3.1 70B as the LLM. I also tested the 8B version and got the same issue.

I use two virtual environments, both with Python 3.11.9: one for graphrag (version 0.2.1) and another for mlx (version 0.16.1).

I don't remember the exact reason why I had to use two separate environments; most likely a conflict between certain packages.
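
For reference, here is a minimal sketch of that two-environment setup. The directory names are illustrative, and the package pins just follow the versions mentioned above, so adjust them to your own machine:

```shell
# Environment 1: graphrag 0.2.1
python3.11 -m venv ~/venvs/graphrag
source ~/venvs/graphrag/bin/activate
pip install graphrag==0.2.1
deactivate

# Environment 2: fastmlx with mlx 0.16.1
python3.11 -m venv ~/venvs/mlx
source ~/venvs/mlx/bin/activate
pip install fastmlx mlx==0.16.1
deactivate
```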

To reproduce the issue, just follow these steps (the full commands are sketched after the list):

1/ Ollama is used for local embedding; I use the nomic-embed-text model.
2/ In the mlx environment, run fastmlx.
3/ In the graphrag environment, run python -m graphrag.index --root ./80days-ollama-llam3.1
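
Putting those three steps together, the commands look roughly like this. The environment paths and the ollama pull step are assumptions; the rest follows the steps above:

```shell
# Terminal 1: Ollama serves the local embedding model
ollama pull nomic-embed-text   # one-time download
ollama serve                   # skip if the Ollama app is already running

# Terminal 2: start the fastmlx server (mlx environment)
source ~/venvs/mlx/bin/activate
fastmlx

# Terminal 3: run the graphrag indexing pipeline (graphrag environment)
source ~/venvs/graphrag/bin/activate
python -m graphrag.index --root ./80days-ollama-llam3.1
```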

Regards,


Blaizzy commented 3 weeks ago

Can you share the exact example of how to replicate this issue?

Please include as much detail as possible :)

Blaizzy commented 3 weeks ago

The request and response you get

iLoveBug commented 3 weeks ago

You can try with this package: 80days-ollama-llama3.1.zip

Blaizzy commented 3 weeks ago

Thanks for the example @iLoveBug!

But I'm afraid I don't understand what the error is.

Can you elaborate on what you mean by "memory consumption grows as time goes by"?

iLoveBug commented 2 weeks ago

Sorry for the confusion.

My problem is that graphrag consumes a lot of tokens via many requests to the LLM: it first requests entity and relationship extraction from the text chunks, and then writes community reports based on the extracted entities and relationships.

During this process I saw memory consumption grow, and responses got slower and slower. This is why I guess there may be a memory leak. In the normal case the system should release the memory after it finishes responding to a request, right?
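
To make the growth easier to see, here is a rough sketch of how the fastmlx server's resident memory could be watched while graphrag is indexing. The pgrep pattern is an assumption; adjust it to however the server process is named on your machine:

```shell
# Poll the fastmlx server's resident set size every 10 seconds.
PID=$(pgrep -f fastmlx | head -n 1)
while kill -0 "$PID" 2>/dev/null; do
  # ps reports RSS in kilobytes on macOS
  echo "$(date +%T)  $(ps -o rss= -p "$PID") KB"
  sleep 10
done
```

If the reported RSS keeps climbing across requests and never drops back after responses complete, that would support the memory-leak guess.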