Thanks for raising this issue. I tried to control for randomness by calling pl.seed_everything in the training script. Unfortunately, I think that GPU training is inherently nondeterministic for reasons that I don't understand. Are you getting results within the same ballpark (within an F1 point or two) across runs, or are you getting wildly different results?
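For reference, the seeding is done with Lightning's seed_everything; a minimal sketch of that kind of setup (not the exact code in this repo, and the Trainer arguments here are assumptions) looks like:

```python
# Minimal sketch of Lightning-style seeding; not the exact code in this
# repo, and the Trainer arguments are assumptions.
import pytorch_lightning as pl

pl.seed_everything(42, workers=True)  # 42 is just an example seed value

trainer = pl.Trainer(
    gpus=1,
    deterministic=True,  # ask PyTorch/cuDNN to prefer deterministic kernels
)
```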
For the covidfact dataset, I'm getting results within the same ballpark (within an F1 point or two), so that seems fine. For the scifact_20 dataset, though, the results are quite different, so I'm thinking of averaging over multiple runs. Thanks for the quick response:)
Hmm interesting, can you give me a sense of how different they are for scifact_20? I can try a few training runs and see if I get the same level of variation.
Yes:) For the scifact_20 dataset, the results differ quite a bit across runs, as shown in the two figures below!
Hmm so the results seem to vary by a few F1 points. Interesting. I'm traveling for the next couple weeks, but when I get back I'll try to take a look.
Unfortunately I think I'm not going to have the bandwidth to run more experiments on this. I agree it's strange that there's so much variance between runs. If you have results on this, feel free to submit a Markdown document to the doc folder describing your findings and I can merge it in; that way we'll at least have this issue recorded.
Thank you for taking a look! Perhaps running multiple experiments and averaging the results could be a reasonable workaround.
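Something along these lines is what I have in mind; the script name and the F1 values in the usage comment are just placeholders, not real results:

```python
# average_f1.py -- tiny sketch for averaging F1 over several runs.
# Usage (pass the F1 values reported by the separate runs; the numbers
# shown are placeholders, not real results):
#   python average_f1.py 70.1 72.4 68.9
import sys
from statistics import mean, stdev

scores = [float(x) for x in sys.argv[1:]]
print(f"n={len(scores)}  mean F1 = {mean(scores):.2f}  std = {stdev(scores):.2f}")
```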
Hi, David:) Thanks a lot for sharing your great work!
I am currently struggling to reproduce the results. I tried finetuning on the covidfact and scifact_20 datasets with one GPU (RTX 3090), but the results are different each time.
Looking at the metrics.csv file (in the checkpoints folder) generated during finetuning, the label_loss, rationale_loss, and loss values differ from run to run, so it seems that randomness is not controlled during training.
When I looked at the code, I don't think the dataloader is the problem, since it is fixed. I tried adding seed-setting code (in anaconda3/envs/multivers/lib/python3.8/site-packages/pytorch_lightning/utilities/seed.py) as below, but the runs are still not reproducible.
I wonder if there are any other parts that need modification, or if some parts are simply difficult to reproduce perfectly. Thank you~~
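For example, I'm not sure whether additional settings like the ones below are needed; this is just a sketch of PyTorch's general determinism switches, and I haven't verified them against this repo:

```python
# Sketch of PyTorch's general determinism switches, beyond seed_everything.
# I haven't verified that these are sufficient (or even compatible) here.
import os
import torch

# Some CUDA matmul kernels require this env var before deterministic mode works;
# it must be set before the first CUDA call.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable autotuning, which can vary between runs
torch.use_deterministic_algorithms(True)   # raise an error on known-nondeterministic ops
```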