seasoncool opened this issue 1 month ago
You should run llama3_distributed.py (in the examples path) to run your prompt across those peers!
I appreciate your response. I configured the path to my local large model within the 'example' directory, but the startup log still pointed to the 'llama3' model.
exo currently supports the MLX and tinygrad formats for inference across multiple peer GPUs. Your chosen model doesn't appear to be in MLX or tinygrad format. Try an MLX or tinygrad format model for inference!
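For example, one quick way to sanity-check a repo is to look for the markers that MLX exports usually carry. A minimal sketch, assuming huggingface_hub is installed; the repo id is a placeholder to substitute with the actual one:

```python
# Minimal sketch: check whether a Hugging Face repo looks like an MLX export.
import json
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "your-username/your-mlx-model"  # placeholder: substitute the actual repo

# MLX exports ship weights as .safetensors
files = list_repo_files(repo_id)
print("safetensors present:", any(f.endswith(".safetensors") for f in files))

# MLX-quantized exports typically include a "quantization" block in config.json
# (non-quantized MLX conversions may not, so treat this as a hint, not proof)
config_path = hf_hub_download(repo_id, "config.json")
with open(config_path) as f:
    config = json.load(f)
print("quantization config:", config.get("quantization"))
```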
I downloaded this LLM from Hugging Face, and it should be in MLX format.
I've also set up the model path based on the error messages, but I'm still not having any success.
output:
Here is my llama3_distributed.py code:

```python
import logging

# Module paths below are assumed from exo's example layout
from exo.inference.shard import Shard
from exo.inference.mlx.sharded_utils import get_model_path, load_tokenizer

logging.basicConfig(level=logging.DEBUG)

# Map each model id to its shard configuration (layer range and total layer count)
models = {
    "sosoai/hansoldeco-llama3-8b-instruct-v0.1-mlx": Shard(
        model_id="sosoai/hansoldeco-llama3-8b-instruct-v0.1-mlx",
        start_layer=0, end_layer=0, n_layers=32,
    ),
    "mlx-community/Meta-Llama-3-70B-Instruct-4bit": Shard(
        model_id="mlx-community/Meta-Llama-3-70B-Instruct-4bit",
        start_layer=0, end_layer=0, n_layers=80,
    ),
}

path_or_hf_repo = "sosoai/hansoldeco-llama3-8b-instruct-v0.1-mlx"
model_path = get_model_path(path_or_hf_repo)
tokenizer_config = {}
tokenizer = load_tokenizer(model_path, tokenizer_config)
```
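As a side note on where the model path gets set: in mlx-lm's get_model_path, which exo's example appears to mirror, a path that already exists on disk is used as-is, and only otherwise is the repo downloaded from Hugging Face. A minimal sketch of pointing at a local copy, with a hypothetical directory:

```python
# Sketch under the assumption above: prefer a local MLX model directory if present,
# otherwise fall back to the Hugging Face repo id.
from pathlib import Path

local_dir = Path("~/models/hansoldeco-llama3-8b-instruct-v0.1-mlx").expanduser()  # hypothetical location
if local_dir.exists():
    path_or_hf_repo = str(local_dir)  # get_model_path should use this directory directly
else:
    path_or_hf_repo = "sosoai/hansoldeco-llama3-8b-instruct-v0.1-mlx"  # downloaded on demand
```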
I've also used my own MLX model with it, and there was no problem running inference with the other peers.
Can you check the llama3_distributed.py file path? I moved the file into the main exo project path and ran python3 llama3_distributed.py.
Hope this helps!
My environment seems to be installed successfully and I can open the chat web UI, but I'm unsure where to set the path for the large model.