Open Elfinwang opened 2 weeks ago
Hi,
Thanks for the feedback!
The '${train_file}_for_simcse.csv' file is generated by running 'src/prepare_CL_dataset.py'. Sorry, my training data is over 2GB, so I could not upload it to the repo. You can run 'prepare_CL_dataset.py' on some query triplets to generate the files.
As for the error you encountered, the cause is that we use a tree-based query encoding, which is different from the plain text used in SimCSE. If you have trouble running the above-mentioned code, you can also contact me by email at G220002@e.ntu.edu.sg, and I can share a small set of training data with you to see whether the error still happens.
Hope this reply clarifies your doubts!
I’m having trouble getting the run_CLTrain.sh script to execute.
examples[sent0_cname][idx] = conv_dict(ast.literal_eval(examples[sent0_cname][idx].replace('-inf', '-2e308')))
The error occurs when trying to parse the string with ast.literal_eval. I would appreciate your help!
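For anyone hitting the same error, here is a minimal sketch of what that line appears to rely on. The sample strings and the helper name `parse_encoded_query` are hypothetical, and the real tree-based encoding produced by 'prepare_CL_dataset.py' may look different. The sketch only shows that `ast.literal_eval` accepts Python literals, rejects a bare `-inf`, and fails on plain-text sentences such as the ones in the original SimCSE data.

```python
import ast

def parse_encoded_query(text):
    # Hypothetical helper mirroring the line above: '-inf' is not a Python
    # literal, so it is swapped for '-2e308', which overflows to -infinity.
    return ast.literal_eval(text.replace('-inf', '-2e308'))

# Hypothetical sample; the actual encoded queries may have another shape.
sample = "{'lower': -inf, 'upper': 3.5}"

# Without the replacement, literal_eval rejects the bare name `inf`.
try:
    ast.literal_eval(sample)
    raw_ok = True
except ValueError:
    raw_ok = False

# With the replacement, the string parses into a regular dict.
parsed = parse_encoded_query(sample)

# A plain-text sentence is not a Python literal, so parsing fails.
try:
    parse_encoded_query("select count(*) from title")
    plain_ok = True
except (ValueError, SyntaxError):
    plain_ok = False

print(raw_ok, parsed, plain_ok)
```

If the CSV passed to run_CLTrain.sh contains plain sentences instead of strings in the encoded format, a failure at this line would be expected, which matches the explanation in the reply.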