Open thangk opened 6 days ago
Hi @thangk thanks for the progress log.
OpenNMT only reports translation metrics such as perplexity (ppl), as seen in the image.
@jamil2388 please advise
@hosseinfani, @thangk for now, I am putting a doc link here. It covers almost all of the arguments used in the onmt pipeline.
https://community.libretranslate.com/t/documentation-for-opennmt-py-parameters/927/
I think looking into this argument on that page might help us dump the prediction file: --dump_preds
I also advise Kap to learn how the translation metrics used in the current runs behave. That is crucial for understanding the model's train and test behavior, and it will eventually tell us in which direction to adjust.
Thanks!
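For reference on the ppl metric mentioned above, here is a minimal sketch of the standard definition of perplexity: the exponential of the average per-token negative log-likelihood. The function name and the token probabilities are made up for illustration and are not taken from the onmt code or from our runs.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    reference token (hypothetical input, not an onmt data structure).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative only: a model that gives every reference token probability 0.25
# is as uncertain as a uniform choice among 4 options.
uniform_log_probs = [math.log(0.25)] * 8
ppl_uniform = perplexity(uniform_log_probs)

# A model that is more confident about the reference tokens gets a lower ppl.
confident_log_probs = [math.log(0.5)] * 8
ppl_confident = perplexity(confident_log_probs)
```

Lower is better: ppl drops as the model puts more probability on the reference tokens, so a rising validation ppl during training is a sign to look at the learning rate or at overfitting.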
@jamil2388 thanks.
@thangk one more thing. When exploring hyperparameters, also see how you can use OpenNMT with different types of translators, because we need to study the effect of translation for our work. These translators should be published in a paper so that we can cite them in ours. I think the OpenNMT community keeps updating its codebase to include more and more new translators, which helps you with our task (this is like @jamil2388 using different GNN methods from PyG for team formation).
Hi @hosseinfani, I'll continue my question here if that's okay.
Continuing the conversation about whether to average all the folds' eval metrics to get one set of data for each epoch setting (i.e., 500, 1000).
I was referring to these. Each fold produces its own eval metrics. There's one more, fold2, below fold1, which isn't visible in the screenshot. I think the right approach is to average the e500 and e1000 results separately across all 3 folds and put those averages in the Excel sheet.
I saw some charts we've used in some papers, and I can see those papers use the average of the folds. I'll follow the same approach.
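A minimal sketch of that averaging, assuming each fold's metrics have already been read into a nested dict. The dict layout, the metric name "acc", and every number below are placeholders for illustration, not values from the actual runs.

```python
from statistics import mean

# Placeholder values only: fold -> epoch setting -> metric -> value.
fold_metrics = {
    "fold0": {"e500": {"ppl": 10.0, "acc": 0.20}, "e1000": {"ppl": 8.0, "acc": 0.30}},
    "fold1": {"e500": {"ppl": 12.0, "acc": 0.30}, "e1000": {"ppl": 9.0, "acc": 0.40}},
    "fold2": {"e500": {"ppl": 14.0, "acc": 0.40}, "e1000": {"ppl": 10.0, "acc": 0.50}},
}


def average_folds(per_fold):
    """Average each metric across folds, separately for each epoch setting."""
    collected = {}
    for per_epoch in per_fold.values():
        for epoch, metrics in per_epoch.items():
            for name, value in metrics.items():
                collected.setdefault(epoch, {}).setdefault(name, []).append(value)
    return {
        epoch: {name: mean(values) for name, values in metrics.items()}
        for epoch, metrics in collected.items()
    }


averaged = average_folds(fold_metrics)
for epoch, metrics in averaged.items():
    print(epoch, metrics)
```

This keeps e500 and e1000 apart and gives one row per epoch setting, which is the shape needed for the spreadsheet.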
Hi @thangk thanks for bringing the conversation here :)
Now I see. There should be another file with no fold-idx, like test.epoch*, that includes the average of the folds.
But you're right about averaging the folds.
> There should be another file with no fold-idx, like test.epoch*, that includes the average of the folds.
Yes, I see one outside the fold folders.
@thangk my preference is to keep the progress logs in an issue like this one, rather than in Teams chats or elsewhere.
Tested dataset
data/preprocessed/dblp/dblp.v12.json.filtered.mt75.ts3
Input type
Sparse matrix
Command used
python -u main.py -data ../data/preprocessed/dblp/dblp.v12.json.filtered.mt75.ts3 -domain dblp -model nmt
Observations
The script ran through all 3 folds and produced results without errors, but it produced no predictions.
Next step(s)