Closed: sorenmat closed this issue 1 month ago
It looks like you're encountering a FailedPrecondition error while trying to deploy the LLaMA model with vLLM. This typically indicates that the model server failed to start up or run properly. A few things to check:
1. Check model logs: As the error message suggests, inspect the model server logs for specific error messages that indicate what went wrong.
2. Model compatibility: Ensure the model you're deploying is compatible with your environment and GPU (an NVIDIA L4 in this case), and check whether LLaMA 3 requires any specific dependencies or configuration.
3. Memory and resource allocation: Verify that the g2-standard-12 instance has enough memory and resources allocated; insufficient resources can cause the server to fail during initialization.
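For the memory point, a quick back-of-the-envelope check can tell you whether the model weights even fit on the GPU before the KV cache is accounted for. The sketch below assumes an 8B-parameter LLaMA 3 served in bf16 on a single L4 (24 GB) and uses vLLM's default `gpu_memory_utilization` of 0.9; adjust the numbers for your actual model and dtype.

```python
# Rough VRAM budget for serving an 8B model in bf16 on one NVIDIA L4.
# All figures are approximations for a sanity check, not exact vLLM accounting.

def weight_gb(n_params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (2 bytes/param for fp16/bf16)."""
    return n_params_billions * bytes_per_param

L4_VRAM_GB = 24.0
GPU_MEMORY_UTILIZATION = 0.9  # vLLM default fraction of VRAM it will claim

weights = weight_gb(8)                          # ~16 GB for an 8B model
budget = L4_VRAM_GB * GPU_MEMORY_UTILIZATION    # ~21.6 GB usable by vLLM
kv_cache = budget - weights                     # what remains for KV cache

print(f"weights ~{weights:.1f} GB, budget ~{budget:.1f} GB, "
      f"KV cache ~{kv_cache:.1f} GB")
```

If the remaining KV-cache headroom is only a few GB, long contexts or concurrent requests can still exhaust memory during initialization even though the weights fit, which matches the failure mode described above.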
Thanks for reporting the issue. Please see the suggestions above.
Expected Behavior
To be able to deploy the example notebook without modifications
Actual Behavior
All the links to logs in the output show an empty log.
Steps to Reproduce the Problem
Specifications