adymaharana / curriculum_learning

Problems replicating #2

Open LucWeber opened 1 year ago

LucWeber commented 1 year ago

Hey,

I am trying to reproduce your results on the CODAH dataset, and there are some loose ends in the code.

The following files are missing / never get generated:

  1. 'train_ranked_by_qap.tsv'. There is train_with_scores.csv, but its examples are not ordered by score, which is necessary for the curriculum to work, if I am not mistaken?
  2. checkpoint-best_logits.txt and checkpoint-best_train_logits.txt. Do we have to select and rename them manually?
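
For the first point, a minimal sketch of how a ranked TSV could be derived from a scored CSV, using only the standard library. The column name `qap_score`, the assumption that a higher score means an easier example, and the helper names `rank_rows` and `csv_to_ranked_tsv` are all hypothetical; the real layout of train_with_scores.csv and the ordering the curriculum expects would need to be confirmed by the author.

```python
import csv
import io


def rank_rows(rows, score_column="qap_score", easiest_first=True):
    """Return rows sorted by a numeric score column.

    Assumption: a higher score means an easier example. If the opposite
    holds, pass easiest_first=False to sort in ascending order instead.
    """
    return sorted(rows, key=lambda r: float(r[score_column]), reverse=easiest_first)


def csv_to_ranked_tsv(src, dst, score_column="qap_score", easiest_first=True):
    """Read comma-separated rows from src and write them, ranked, as TSV to dst."""
    reader = csv.DictReader(src)
    rows = rank_rows(list(reader), score_column, easiest_first)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory demo with made-up rows (not taken from the dataset).
demo_in = io.StringIO("id,qap_score\na,0.2\nb,0.9\nc,0.5\n")
demo_out = io.StringIO()
csv_to_ranked_tsv(demo_in, demo_out)
ranked_ids = [line.split("\t")[0] for line in demo_out.getvalue().splitlines()[1:]]
```

With real files, `src` and `dst` would be handles opened with `open(..., newline="")`, for example reading train_with_scores.csv and writing train_ranked_by_qap.tsv.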

Other things that cause errors:

  1. When building the dataset, the code expects 'segment-ids' to be in the features, but they are not there (and they are not necessary for RoBERTa anyway).
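
One possible workaround, sketched under assumptions: fall back to all-zero segment ids when a feature does not carry them. The dict-style feature, the key names `segment_ids` and `input_ids`, and the helper `segment_ids_or_zeros` are hypothetical and may not match how the repository stores its features.

```python
def segment_ids_or_zeros(feature, key="segment_ids"):
    """Return the feature's segment ids, or zeros of matching length if absent.

    Assumption: `feature` is a dict with an "input_ids" list. RoBERTa does not
    distinguish segments, so an all-zero vector serves as a placeholder.
    """
    ids = feature.get(key)
    if ids is None:
        return [0] * len(feature["input_ids"])
    return list(ids)


# Made-up features for illustration only.
without_segments = {"input_ids": [0, 713, 16, 2]}
with_segments = {"input_ids": [0, 713, 2], "segment_ids": [0, 0, 1]}

filled = segment_ids_or_zeros(without_segments)
kept = segment_ids_or_zeros(with_segments)
```

Alternatively, the dataset-building code could skip the segment-id tensor entirely for RoBERTa models; which option fits better depends on how the training loop consumes the batch.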

Other clarifications:

  1. The convert_examples_to_features function has a curriculum argument that adds QAP scores to the features, but the argument is not used in any of the scripts.
  2. Do checkpoint-100_xx, checkpoint-200_xx, and so on correspond to different stages of the first epoch, and checkpoint-epoch-1_xx and so on to everything that comes afterwards?
  3. For the curriculum learning condition, are only the final performances saved in codah_qap_cl.jsonl, one for every hyperparameter set from the Bayesian optimisation process? And would the corresponding baseline performance be in ./baselines/codah-roberta-large/fold_4/is_test_false_eval_results.txt?

Thanks a lot for your help!