GoogleCloudPlatform / generative-ai

Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview
Apache License 2.0

[Bug]: 500 internal server error on fine tuning jobs #873

Closed sidoncloud closed 1 week ago

sidoncloud commented 3 months ago

File Name

https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/tuning/gemini_supervised_tuning_qa.ipynb

What happened?

The training pipeline does not work. I wonder what exactly the Google team does before publishing notebooks to the public. None of their fine-tuning jobs work. Just admit that Gemini (Pro, non-Pro, Flash, non-Flash, etc.) is not as established and mature as Google claims it to be, and work on improving the underlying model rather than trying to compete with OpenAI.

Below are the lines of code which fail.

sft_tuning_job = sft.train(
    source_model=foundation_model,
    train_dataset=TUNING_DATA_URI,
    # Optional:
    validation_dataset=VALIDATION_DATA_URI,
    epochs=3,
    learning_rate_multiplier=1.0,
)

# Get the tuning job info.
sft_tuning_job.to_dict()

Below is the error:

File ~/.local/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py:131, in _GapicCallable.__call__(self, timeout, retry, compression, *args, **kwargs)
    128 if self._compression is not None:
    129     kwargs["compression"] = compression
--> 131 return wrapped_func(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py:78, in _wrap_unary_errors.<locals>.error_remapped_callable(*args, **kwargs)
     76     return callable_(*args, **kwargs)
     77 except grpc.RpcError as exc:
---> 78     raise exceptions.from_grpc_error(exc) from exc

InternalServerError: 500 Internal error encountered.

vijaycsc27 commented 2 months ago

Any update on this? I am getting the same problem.

vijaycsc27 commented 2 months ago

@sidoncloud

I am able to run it with the following code.

# Tune a model using train method.
sft_tuning_job = sft.train(
    source_model="gemini-1.0-pro-002",
    train_dataset=f"{BUCKET_URI}/sft_train_samples.jsonl",
    # Optional:
    validation_dataset=f"{BUCKET_URI}/sft_val_samples.jsonl",
    epochs=3,
    learning_rate_multiplier=1,
)

# Get the tuning job info.
sft_tuning_job.to_dict()

gericdong commented 1 week ago

There have been some recent changes to Gemini tuning. Please try again, and reopen the issue if the problem persists.