adamamer20 / mesa-frames

Extension of mesa for performance and scalability
https://adamamer20.github.io/mesa-frames/api
MIT License

GPU integration: Dask, CUDA (cuDF), and RAPIDS (Polars) #10

Open adamamer20 opened 1 month ago

adamamer20 commented 1 month ago

https://pola.rs/posts/polars-on-gpu/

https://www.reddit.com/r/Python/comments/xjx4uo/benchmarking_pandas_cudf_modin_apache_arrow_and/
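
For context on the Polars side: the post above describes a RAPIDS/cuDF-backed GPU engine that can execute a Polars query unchanged. A minimal sketch of what a vectorized agent update could look like under it (the schema here is made up for illustration, and the GPU path assumes a Polars build with the optional cudf-polars backend installed):

```python
import polars as pl

# Toy "agent set": one row per agent (columns are illustrative, not
# mesa-frames' actual schema).
agents = pl.LazyFrame({
    "unique_id": list(range(1_000_000)),
    "age": [0] * 1_000_000,
})

# A vectorized step: every agent ages by one tick.
stepped = agents.with_columns((pl.col("age") + 1).alias("age"))

result_cpu = stepped.collect()              # default CPU engine
result_gpu = stepped.collect(engine="gpu")  # RAPIDS/cuDF-backed GPU engine;
                                            # unsupported ops fall back to CPU
```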

rht commented 2 weeks ago

This can be tested on https://en.wikipedia.org/wiki/Sugarscape:

The team of R. M. D’Souza, M. Lysenko, and K. Rahmani from Michigan Technological University used a Sugarscape model to demonstrate the power of graphics processing units (GPUs) in ABM simulations, achieving over 50 updates per second with agent populations exceeding 2 million.

@tpike3 we should update that Wikipedia page to include Mesa's modern implementation.

rht commented 2 weeks ago

@adamamer20 have you seen https://github.com/projectmesa/mesa/discussions/1561 ?

adamamer20 commented 1 week ago

@rht interesting, they use similar reasoning to mesa-frames.AgentSet.

As mentioned by @rht, one of the core differences is that there is no longer an object for each individual agent. Instead, Sampy has population objects, and users build their models from those populations.
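
To make the contrast concrete, here is a rough sketch of the population-object idea; AgentSetDF and the wealth column are hypothetical stand-ins, not Sampy's or mesa-frames' actual API. Agents live as rows of a DataFrame, and a step is a single vectorized expression rather than a method call on each agent object:

```python
import polars as pl

class AgentSetDF:
    """Hypothetical population object: agents are rows, not Python objects."""

    def __init__(self, n: int):
        self.agents = pl.DataFrame({
            "unique_id": list(range(n)),
            "wealth": [1] * n,
        })

    def step(self) -> None:
        # One vectorized update replaces n per-agent method calls.
        self.agents = self.agents.with_columns(
            (pl.col("wealth") + 1).alias("wealth")
        )

population = AgentSetDF(1_000_000)
population.step()
```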

rht commented 4 days ago

@jackiekazil @tpike3 we should be able to provide @adamamer20 a cloud NVIDIA GPU access as well, when needed.

rht commented 4 days ago

From reading https://gist.github.com/devinschumacher/87dd5b87234f2d0e5dba56503bfba533, Colab has free GPU access, and several providers have free trial credits. It looks like we won't need to resort to the GSoC funding after all. GCP should be avoided because, in my experience, it is very often out of quota and it takes a while to queue for an instance. For AWS, a g4dn.xlarge instance costs ~50 cents/hr (https://aws.amazon.com/ec2/instance-types/g4/), so a $100 credit should last for ~200 hrs of compute.

jackiekazil commented 3 days ago

@rht @adamamer20 I am making a request from AWS and using GSoC as a backup. I need to submit a cost calculation with the formal request. Can you provide an estimate via this tool: https://calculator.aws/#/estimate

adamamer20 commented 3 days ago

@jackiekazil I recently acquired a used 3090 (24GB), which should be sufficient for prototyping and running mid-size models locally. This way, we can reserve AWS resources for scaling up once the models are ready. At that point, I would use a g4ad.4xlarge (64 GB) instance, which costs approximately 86.7 cents per hour, to run larger models. Currently, I do not have an estimate of the time per step for such models; @rht might have more insight on this. Otherwise, I will provide an update once we implement the example.

jackiekazil commented 3 days ago

To be explicit - I need something in the calculator form for the formal request. Hoping @rht can put it together. I will review it.

rht commented 2 days ago

@jackiekazil I recently acquired a used 3090 (24GB), which should be sufficient for prototyping and running mid-size models locally.

Cool. Are you running self-hosted LLMs? Codestral 22B should be pretty good. My own anecdote: using https://aider.chat/ (a CLI coding assistant that can apply its own suggested changes to your repo), it was able to encrypt a message using post-quantum encryption in Rust, after several rounds of automated bug fixing guided by the error messages from cargo test.

@jackiekazil:

A 2007 simulation run of Sugarscape instantaneous growback took ~0.02 s per step for 2 million agents, with a resolution of 2560x1024. They used an NVIDIA GeForce 8800 GTX (768 MB of VRAM).

A 2023 simulation run of Sugarscape instantaneous growback via FLAME GPU 2 took ~1 s per step for 16 million agents and ~3 ms per step for 1 million agents. They used four NVIDIA V100s (32 GB of VRAM each).

If we do 100 steps, compounded with a sensitivity analysis (about 500 setups to try?), and assuming that we are ~2x slower than FLAME GPU 2 (they are very close to the metal, so at worst we could be 10x slower?), then running 8 million agents on a g4ad.4xlarge (64 GB) would take 100 x 500 x 1 s x 2 / 2 = 50k seconds, i.e. 13.9 hours (assuming the step time is halved by going down from 16 million to 8 million agents). This is a very rough estimate, because the VRAM needed probably scales quadratically with the agent count (figure 11 A in the paper), not linearly.
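
The same estimate in code, so the assumptions are explicit (every input below is an assumption from the paragraph above, not a measurement):

```python
steps = 100
setups = 500          # sensitivity-analysis configurations
flame_step_s = 1.0    # FLAME GPU 2: ~1 s/step at 16 million agents
slowdown = 2.0        # assume mesa-frames is ~2x slower than FLAME GPU 2
agent_scale = 0.5     # 8M agents instead of 16M, assuming linear scaling

total_s = steps * setups * flame_step_s * slowdown * agent_scale
print(f"{total_s:.0f} s = {total_s / 3600:.1f} h")  # 50000 s = 13.9 h
```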

adamamer20 commented 1 day ago

Cool. Are you running self-hosted LLMs? Codestral 22B should be pretty good. My own anecdote: using https://aider.chat/ (a CLI coding assistant that can apply its own suggested changes to your repo), it was able to encrypt a message using post-quantum encryption in Rust, after several rounds of automated bug fixing guided by the error messages from cargo test.

Yes, I am currently working on some LLM projects, and after seeing the Colab bill I thought it would just make more sense to run them locally, haha. I'll definitely give Codestral and Aider a try! Copilot is pretty limited in its abilities, and I would like to try a coding agent (one that can also scrape docs, search Stack Overflow, etc.).

rht commented 22 hours ago

and I would like to try a coding agent (one that can also scrape docs, search Stack Overflow, etc.).

That would be AutoGen. In one of their examples, you create multiple AI agents, each with a different role, that converse with each other. One authors Python code that uses requests to call a website's API and download data (in their example it is the free public arXiv API, but a Stack Overflow API would do as well). Then another agent executes the code, and finally another agent interprets the result.
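
A minimal sketch of that pattern with pyautogen (the model config, agent names, and task message are placeholders; AutoGen's own examples load a config_list from a JSON file):

```python
from autogen import AssistantAgent, UserProxyAgent

# The assistant authors the code; the model config is a placeholder.
assistant = AssistantAgent(
    "coder",
    llm_config={"config_list": [{"model": "gpt-4"}]},
)

# The proxy agent executes whatever code blocks the assistant produces.
executor = UserProxyAgent(
    "executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)

executor.initiate_chat(
    assistant,
    message="Write Python that uses requests to query the arXiv API "
            "and print the titles of the five newest ABM papers.",
)
```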

Document scraping and vectorization (for RAG) can be done in Open WebUI, a local ChatGPT-style UI where you can swap the AI between local models and Claude/GPT-4/Gemini.

Several people on r/LocalLlama have recommended SillyTavern (see the top comment in this post). I find it quite buggy except on Ubuntu. I haven't been able to set up multiple agents to converse about the content of a paper yet. Still a pending project for me.