Closed — ZordoC closed this issue 3 years ago
Thanks for reporting that issue.
That was a problem introduced when updating the pytorch-lightning version: in the older version, the on_fit_end()
callback function received only 2 positional arguments. I thought I had solved that before updating the lightning dependencies... I'll fix that today!
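A version-tolerant hook can sidestep this kind of signature change. Here is a minimal sketch of the idea, not COMET's actual code: the `Callback` base class is stubbed so the example runs without pytorch-lightning installed, and `TrainingReport` is a hypothetical name.

```python
# Sketch: making a callback hook tolerant to signature changes between
# library versions. Older pytorch-lightning passed on_fit_end(trainer,
# pl_module); later versions changed hook signatures. Accepting *args
# keeps the hook working across versions.

class Callback:  # stand-in for pytorch_lightning.callbacks.Callback
    pass


class TrainingReport(Callback):  # hypothetical name, not COMET's actual class
    def on_fit_end(self, *args):
        # args may be (trainer,) or (trainer, pl_module) depending on version
        trainer = args[0] if args else None
        print(f"fit finished; received {len(args)} positional argument(s)")
        return trainer


report = TrainingReport()
report.on_fit_end("trainer")            # new-style call: one argument
report.on_fit_end("trainer", "module")  # old-style call: two arguments
```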
I released a version 0.0.6.post1 that solves that... tell me if it works!
Regards
Hey!
This time the model trained successfully according to the logs!
Epoch 2: 100%|██████████| 25000/25000 [1:16:41<00:00, 5.43it/s, loss=0.056, v_num=4-35, pearson=0.924, kendall=0.81, spearman=0.946, avg_loss=0.0621]
Training Report Experiment:
train_loss_step train_loss ... train_avg_loss train_loss_epoch
Epoch 0 0.183138 0.183138 ... 0.099132 NaN
Epoch 1 0.006920 0.006920 ... 0.101763 0.107044
Epoch 2 0.001943 0.001943 ... 0.065580 0.067810
[3 rows x 12 columns]
All looks good, but when inspecting the experiments folder:
It seems like something is missing (the metadata from the csv).
Whenever I try to load the model:
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from comet.models import load_checkpoint
>>> model = load_checkpoint("events.out.tfevents.1606298119.ip-172-31-41-58.27572.0")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/comet/lib/python3.6/site-packages/comet/models/__init__.py", line 135, in load_checkpoint
checkpoint, hparams=hparams
File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/core/saving.py", line 132, in load_from_checkpoint
checkpoint = pl_load(checkpoint_path, map_location=lambda storage, loc: storage)
File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/utilities/cloud_io.py", line 32, in load
return torch.load(f, map_location=map_location)
File "/home/ubuntu/comet/lib/python3.6/site-packages/torch/serialization.py", line 529, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/comet/lib/python3.6/site-packages/torch/serialization.py", line 692, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x18'.
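For context, the `invalid load key` error in the traceback above is the pickle module refusing to parse a file that is not a pickle at all: the first byte of the TensorBoard event file (0x18) is not a valid pickle opcode. A minimal illustration with synthetic bytes, not the actual file:

```python
import io
import pickle

# Bytes starting with 0x18, mimicking the start of a non-pickle file.
# torch.load's legacy path ultimately calls pickle_module.load, which is
# where the traceback above ends up.
blob = io.BytesIO(b"\x18\x00\x00not-a-pickle")
try:
    pickle.load(blob)
except pickle.UnpicklingError as err:
    message = str(err)
    print(message)  # e.g. invalid load key, '\x18'.
```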
I guess that's the correct way of loading the model, right? If not, could you provide an example?
Best
Jose
Actually, events.out.tfevents.1606298119.ip-172-31-41-58.27572.0 is a TensorBoard file, not the checkpoint file. The checkpoint file should end with .ckpt. From your ls output, it looks like lightning has not saved any checkpoint...
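To locate the actual checkpoint once one exists, something like the sketch below could help. Paths and file names are illustrative; the demo builds a throwaway folder so it runs anywhere.

```python
# Sketch: find the Lightning checkpoint (*.ckpt) in an experiment folder,
# instead of the TensorBoard event file (events.out.tfevents.*).
import glob
import os
import tempfile


def find_checkpoints(experiment_dir):
    """Return all .ckpt files under an experiment directory, sorted."""
    pattern = os.path.join(experiment_dir, "**", "*.ckpt")
    return sorted(glob.glob(pattern, recursive=True))


# Demo with a temporary directory that mimics an experiment folder:
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "events.out.tfevents.123.host.0"), "w").close()
    ckpt_dir = os.path.join(d, "checkpoints")
    os.makedirs(ckpt_dir)
    open(os.path.join(ckpt_dir, "epoch=1.ckpt"), "w").close()
    found = find_checkpoints(d)
    print([os.path.basename(p) for p in found])  # only the .ckpt is listed

# A real run would then load the checkpoint, e.g.:
#   from comet.models import load_checkpoint
#   model = load_checkpoint(found[0])
```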
I released another post-release version 0.0.6.post2 that should have that fixed.
The problem was that the new lightning version deprecated the filepath
parameter of ModelCheckpoint and changed the behaviour of the period
parameter. These two changes made the ModelCheckpoint callback useless.
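For reference, under the newer Lightning API the callback would be configured roughly like this. This is a sketch, not COMET's actual code: `avg_loss` and the directory path are assumed names, and an import fallback stub lets the example run where pytorch-lightning is not installed.

```python
# Sketch, assuming pytorch-lightning >= 1.0 keyword names
# (`dirpath` + `filename` replacing the removed `filepath`).
try:
    from pytorch_lightning.callbacks import ModelCheckpoint
except ImportError:
    class ModelCheckpoint:  # minimal stand-in with the same keywords
        def __init__(self, dirpath=None, filename=None, monitor=None,
                     mode="min", save_top_k=1, **kwargs):
            self.dirpath = dirpath
            self.filename = filename
            self.monitor = monitor
            self.mode = mode
            self.save_top_k = save_top_k

checkpoint_callback = ModelCheckpoint(
    dirpath="experiments/checkpoints",  # replaces the deprecated `filepath`
    filename="{epoch}-{avg_loss:.4f}",  # checkpoint file name pattern
    monitor="avg_loss",                 # hypothetical metric name
    mode="min",                         # keep the checkpoint with lowest loss
    save_top_k=1,
)
print(checkpoint_callback.filename)
```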
Thanks once again! All bug reports are welcome, especially now at the beginning 😃
No problem! I'll close the issue.
If you have anything that I can help with, I'm interested! Maybe write some examples/docs on how to train a model? Would you be up for that? I've been interested in contributing to an OSS project for a while :-)
Thank you!
Yep, that would be awesome! If for example, you write a tutorial on how to train a system we can add that to the documentation!
Okay I will do that :-) !
Best
🐛 Bug
Hello! I've tried to train a COMET model using my own data. I want to train using HTER as a metric, so I used the configuration that's present in the repo: https://github.com/Unbabel/COMET/blob/master/configs/xlmr/base/hter-estimator.yaml
To Reproduce
Python 3.6.9
Where config.yml is the configuration I mentioned above, with alterations to the training data path. It does not seem to be an issue with the data, as I have the correct column names and the model did train through the 2 epochs that were established in the configuration file.
Expected behaviour
Trained model, that could be loaded via python.
Screenshots
Here's the output from my logs.
Environment
OS: Linux
Packaging: pip
Version: latest
Thank you for your time!
Regards,
Jose :-)