ruili-pml closed this issue 1 year ago
Also, I noticed that your code saves the weights of the adaptors in the checkpoint as well (https://github.com/VICO-UoE/URL/blob/56a57aaf5fbd9b8e287244a27daac89992e54d25/train_net_url.py#L207). Do you happen to still have the original checkpoint saved during training? The provided checkpoint only contains the feature extractor's weights. If so, would you mind sharing it? Thank you.
Hi,
A follow-up question: when I trained the model using the provided script, the validation loss was NaN the first time it was evaluated (after 5000 training steps). Did I miss something?
Thanks.
Hi,
On my side the code ran normally. Could you check for dependency conflicts?
Best!
Could you tell me the TensorFlow and PyTorch versions you used? Or, even better, share an environment setup file. The code runs without errors, so I suppose there are no dependency conflicts, but I'm not sure where the NaN in the training loss comes from.
I was using PyTorch 1.0 and TensorFlow 1.14. I no longer have the setup file, but the other dependencies are listed in the Meta-Dataset requirements: https://github.com/google-research/meta-dataset/blob/main/requirements.txt
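For anyone comparing environments, here is a minimal sketch (not part of the URL repository) that prints the installed versions of a few packages using only the standard library. The function name `report_versions` and the default package list are assumptions for illustration; adjust the list to whatever your setup depends on.

```python
from importlib import metadata


def report_versions(packages=("torch", "tensorflow", "numpy")):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to the string "not installed".
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print in a requirements-like form that is easy to paste into an issue.
    for name, version in report_versions().items():
        print(f"{name}=={version}")
```

Pasting this output into the thread makes it easier to compare against the versions mentioned above.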
Thanks. It turned out the cause was an issue in my data preprocessing pipeline; there is nothing wrong with your training script.
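In case it helps others who see a NaN loss, below is a minimal sketch of the kind of check that can catch bad values coming out of a preprocessing pipeline before they reach the model. It is not taken from the URL code: `find_non_finite` is a hypothetical helper, and it assumes the batch has been converted to nested Python lists of floats. With tensors, `torch.isfinite` plays the same role.

```python
import math
from collections.abc import Sequence


def find_non_finite(values, path=()):
    """Return the index paths of NaN or infinite entries in nested sequences."""
    bad = []
    for i, item in enumerate(values):
        if isinstance(item, Sequence) and not isinstance(item, (str, bytes)):
            # Recurse into nested lists/tuples, extending the index path.
            bad.extend(find_non_finite(item, path + (i,)))
        elif not math.isfinite(item):
            bad.append(path + (i,))
    return bad


# Hypothetical batch with one corrupted value.
batch = [[0.1, 0.5], [float("nan"), 1.0]]
print(find_non_finite(batch))  # [(1, 0)]
```

Running a check like this on the first few batches is cheap and separates data problems from problems in the training script.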
Hi,
I was wondering how long it took you to train URL. I'm running the script train_resnet18_url.sh and it takes a really long time. It would be good to know the expected training time so I can tell whether I did something wrong.
Thanks,
Rui