Closed: kkrishnan90 closed this issue 1 year ago.
[Fixed] After some debugging, I found that the PyTorch version conflicted with the environment on my Vertex AI managed notebook (a pre-built container with PyTorch and transformers already installed). I created a custom container and also tried the code in a user-managed notebook, and it works fine there. Closing the issue.
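One way to spot this kind of version conflict before training is to print the package versions actually installed in the notebook kernel and compare them with what simpleT5 expects. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the package list and the helper name `installed_versions` are assumptions, so adjust them to your environment.

```python
from importlib import metadata

# Assumed list of packages relevant to a simpleT5 training run; edit as needed.
PACKAGES = ("torch", "transformers", "pytorch-lightning", "simplet5", "datasets")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:20} {version or 'not installed'}")
```

Running this in both the managed notebook and the custom container makes it easy to diff the two environments.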
Hi Shiva,

Thank you very much for such a clean and neat wrapper for training ML models. I am using `t5` (precisely `t5-small`) as the base model to train a summarization model, with a dataset loaded through the `datasets` library from Hugging Face. However, every time I start the training code, the kernel dies and restarts. Any help here is much appreciated! Following is my code.
Import dependencies
Load data using `datasets` from Hugging Face
Preparing the train and eval data
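simpleT5 trains on pandas DataFrames with two columns, `source_text` and `target_text`, and T5 summarization inputs are conventionally prefixed with `"summarize: "`. A minimal sketch of this step, not the author's code: it assumes each raw record has hypothetical `document` and `summary` fields, and `to_simplet5_rows` / `train_eval_split` are illustrative helper names.

```python
import random


def to_simplet5_rows(records, text_key="document", summary_key="summary",
                     prefix="summarize: "):
    """Map raw records (dicts) to the two columns simpleT5 trains on."""
    return [
        {"source_text": prefix + r[text_key], "target_text": r[summary_key]}
        for r in records
    ]


def train_eval_split(rows, eval_fraction=0.1, seed=42):
    """Shuffle deterministically and hold out at least one row for evaluation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_eval = max(1, int(len(rows) * eval_fraction))
    return rows[n_eval:], rows[:n_eval]
```

The resulting lists can be turned into DataFrames with `pandas.DataFrame(train_rows)` and `pandas.DataFrame(eval_rows)` before being passed to simpleT5.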
Loading `simpleT5` and `wandb_logger`, and finally loading the model and the training code

I am running this code on a Vertex AI Workbench notebook on Google Cloud: an `N1-Standard-16` machine type with 16 cores and 60 GB of memory, plus an attached P100 GPU. Any help is much appreciated! Thanks in advance!
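Before starting GPU training, it can help to confirm that the installed PyTorch build actually sees the P100; a mismatched PyTorch install is consistent with the fix described at the top of this issue. The sketch below assumes only that PyTorch may or may not be installed; `gpu_report` is a hypothetical helper name.

```python
import importlib.util


def gpu_report():
    """Summarize what the installed PyTorch (if any) can see on this machine."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    import torch

    cuda = torch.cuda.is_available()
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if cuda
        else []
    )
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        "cuda_available": cuda,
        "devices": devices,
    }


if __name__ == "__main__":
    print(gpu_report())
```

If `cuda_available` is `False` or `devices` is empty on a GPU machine, the PyTorch build and the driver likely do not match, which is worth fixing before launching a long training run.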