SeungsuBaek opened this issue 1 year ago
Certainly. Here's a summarized paragraph:
Setting up Alpa-Serve for parallel processing on your server involves several key steps. Begin by configuring Alpa-Serve using provided or customized configuration files. Launch Alpa-Serve as a server process, considering the need for multiple instances to handle parallel tasks efficiently. Integrate Alpa-Serve with Ray, a prerequisite package, to enable distributed task execution. Develop a job distribution strategy, scripting the submission of tasks to Alpa-Serve with specified parallelism levels. Implement monitoring, scaling, error handling, and fault tolerance mechanisms, ensuring the system's reliability. Rigorous testing and benchmarking are essential before deployment, and comprehensive documentation will facilitate maintenance and troubleshooting. Seek support from the Alpa-Serve community if specific issues arise during setup, tailoring your configuration to meet your unique server requirements.
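The job-distribution step above can be sketched in plain Python. This is only an illustration of a round-robin submission strategy; the names (`ServerInstance`, `submit`, `round_robin_dispatch`) are hypothetical and are not part of Alpa-Serve's actual API:

```python
from collections import deque

class ServerInstance:
    """Hypothetical stand-in for one Alpa-Serve server process."""
    def __init__(self, name):
        self.name = name
        self.handled = []

    def submit(self, job):
        # A real instance would forward the job to the serving backend;
        # here we just record it.
        self.handled.append(job)
        return f"{self.name} accepted {job}"

def round_robin_dispatch(jobs, instances):
    """Distribute jobs across server instances in round-robin order."""
    ring = deque(instances)
    results = []
    for job in jobs:
        inst = ring[0]
        results.append(inst.submit(job))
        ring.rotate(-1)  # advance to the next instance
    return results

instances = [ServerInstance(f"server-{i}") for i in range(2)]
results = round_robin_dispatch([f"job-{i}" for i in range(4)], instances)
```

In a real deployment the `submit` call would be replaced by whatever RPC or Ray task submission mechanism the serving system exposes.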
Thanks for your answer.
Now I can get a model placement using the Alpa-Serve simulator and run it.
But I have more questions about the source code.
What is the meaning of `mem_budget`? Is it the cluster's (all groups') maximum memory, or a single group's maximum memory?
If I use bert-6.7B, `weight_mem` shows 6.7 GB. I expected `weight_mem` for bert-6.7B to be about 13 GB, as in your paper. Can you explain this memory-constraint calculation?
Thanks for reading my questions.
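For the `weight_mem` discrepancy, one plausible reading (my assumption, not confirmed from the source) is a precision mismatch: the ~13 GB figure matches fp16 weights at 2 bytes per parameter, while 6.7 GB would correspond to 1 byte per parameter, or to reporting the raw parameter count in billions:

```python
# Back-of-the-envelope weight-memory calculation for a 6.7B-parameter model.
num_params = 6.7e9           # 6.7 billion parameters

bytes_per_param_fp16 = 2     # half precision (fp16/bf16)
bytes_per_param_fp32 = 4     # single precision

weight_mem_fp16_gb = num_params * bytes_per_param_fp16 / 1e9  # fp16 weights
weight_mem_fp32_gb = num_params * bytes_per_param_fp32 / 1e9  # fp32 weights

print(weight_mem_fp16_gb)  # 13.4
print(weight_mem_fp32_gb)  # 26.8
```

So ~13.4 GB is what fp16 weights alone would occupy, before activations or KV caches are counted.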
Could you please provide a detailed explanation of how to run Alpa-Serve? Thanks a lot!
Hi.
I am interested in your nice work.
I want to get a parallel configuration for my server.
I read your code, but it is hard to find documentation or setup steps for Alpa-Serve (as opposed to Alpa).
Can you give some advice on running the Alpa-Serve system on a server? (How do I use Alpa-Serve to get a parallel configuration?)
I have already installed the prerequisite packages (Ray and other Python packages).