fani-lab / OpeNTF

Neural machine learning methods for Team Formation problem.

OpeNTF via NMT (OpenNMT) #243

Open thangk opened 6 days ago

thangk commented 6 days ago

Tested dataset

data/preprocessed/dblp/dblp.v12.json.filtered.mt75.ts3

Input type

Sparse matrix

Command used

python -u main.py -data ../data/preprocessed/dblp/dblp.v12.json.filtered.mt75.ts3 -domain dblp -model nmt

Observations

The script ran through all 3 folds and finished without errors, but it did not produce any predictions.

image

Next step(s)

hosseinfani commented 6 days ago

Hi @thangk thanks for the progress log.

OpenNMT only reports translation metrics such as perplexity (ppl), as seen in the image.
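For reference, here is a minimal sketch of what a perplexity value means, using the standard definition (the exponential of the average negative log-likelihood per token). It assumes OpenNMT's `ppl` follows this definition, which this thread does not confirm. The function name and the log-probabilities are made up for illustration.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned to each
    reference token (hypothetical input, not an OpenNMT data structure).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that gives every reference token probability 0.25 is as uncertain
# as a uniform choice among 4 options, so its perplexity is about 4.
uniform = [math.log(0.25)] * 4
print(perplexity(uniform))

# More confident (higher-probability) predictions give lower perplexity.
confident = [math.log(0.9)] * 4
print(perplexity(confident))
```

Lower is better: a perplexity near 1 means the model puts almost all probability mass on the reference tokens.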

@jamil2388 please advise

jamil2388 commented 6 days ago

@hosseinfani, @thangk for now, I am putting a doc link here. It covers almost all of the arguments used in the onmt pipeline.

https://community.libretranslate.com/t/documentation-for-opennmt-py-parameters/927/

I think looking into this argument on that page might help us with dumping the prediction file: --dump_preds

I also advise Kap to learn how the translation metrics used in the current runs behave. This is crucial for understanding the model's training and test behavior, and it will eventually tell us in which direction to make adjustments.

Thanks!

hosseinfani commented 6 days ago

@jamil2388 thanks.

@thangk one more thing. When exploring hyperparameters, also look at how you can use OpenNMT with different types of translators, because we need to study the effect of the translator on our work. Each translator should have been published in a paper so that we can cite it in ours. I think the OpenNMT community keeps updating its codebase to include more and more new translators, which helps you with our task (this is like @jamil2388 using different GNN methods from PyG for team formation).

thangk commented 2 days ago

Hi @hosseinfani, I'll continue my question here if that's okay.

Continuing the conversation on whether or not to average all the folds' eval metrics to get one set of data for each epoch setting (i.e., 500, 1000).

I was referring to these. Each fold produces its own eval metrics. There's one more, fold2, below fold1, which isn't visible in the screenshot. I think the right approach is to average the e500 and e1000 values across all 3 folds and put the averages in the Excel sheet.

image

thangk commented 2 days ago

I saw some charts we've used in some papers, and I can see those papers use the average of the folds. I'll follow the same approach.
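A minimal sketch of that averaging, assuming the per-fold metrics have already been loaded into a nested dictionary. The layout, the names `fold0`/`e500`/`ppl`, and the numbers are placeholders for illustration, not the actual OpeNTF output format.

```python
from collections import defaultdict
from statistics import mean

def average_over_folds(fold_metrics):
    """Average each metric across folds, separately per epoch setting.

    fold_metrics: {fold_name: {epoch_label: {metric_name: value}}}
    returns:      {epoch_label: {metric_name: arithmetic mean over folds}}
    """
    collected = defaultdict(lambda: defaultdict(list))
    for per_epoch in fold_metrics.values():
        for epoch, metrics in per_epoch.items():
            for name, value in metrics.items():
                collected[epoch][name].append(value)
    return {
        epoch: {name: mean(values) for name, values in metrics.items()}
        for epoch, metrics in collected.items()
    }

# Placeholder values only; real numbers come from each fold's eval files.
folds = {
    "fold0": {"e500": {"ppl": 10.0}, "e1000": {"ppl": 8.0}},
    "fold1": {"e500": {"ppl": 12.0}, "e1000": {"ppl": 9.0}},
    "fold2": {"e500": {"ppl": 14.0}, "e1000": {"ppl": 10.0}},
}
averaged = average_over_folds(folds)
print(averaged["e500"]["ppl"])   # 12.0
print(averaged["e1000"]["ppl"])  # 9.0
```

Each epoch setting ends up with one row of averaged metrics, which is the shape needed for the spreadsheet.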

hosseinfani commented 2 days ago

Hi @thangk thanks for bringing the conversation here :)

Now I see. There should be another file with no fold-idx, like test.epoch*, that includes the average of the folds.

But you're right about averaging over the folds.

thangk commented 2 days ago

There should be another file with no fold-idx, like test.epoch*, that includes the average of the folds.

Yes, I see one outside the fold folders.

image

hosseinfani commented 2 days ago

@thangk my preference is to keep the progress logs in issues like this one, rather than in Teams chats or elsewhere.