Closed longmai-pinbus closed 1 year ago
Hi @longmai-pinbus, the script is for reproducing Table 2 in the paper. Yes, it takes some time and memory.
To evaluate cross-lingual plagiarism identification between English and Vietnamese, I would recommend creating your own evaluation script.
Maybe along the lines of:
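from nmtscore import NMTScorer

scorer = NMTScorer()
num_correct = 0
# my_dataset is a placeholder: a list of (English sentence, Vietnamese sentence, label) triples,
# where the label is assumed to be a bool. my_threshold is a placeholder you choose yourself.
for sentence_en, sentence_vi, gold_label in my_dataset:
    similarity_score = scorer.score(sentence_en, sentence_vi)
    # Predict "plagiarism" when the score exceeds the threshold, then compare with the gold label
    num_correct += (similarity_score > my_threshold) == gold_label
accuracy = num_correct / len(my_dataset)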
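The snippet leaves `my_threshold` open. One possible way to pick it, sketched here as an assumption rather than as the method from the paper, is to sweep candidate thresholds over a small labelled development set and keep the one with the highest accuracy. The helper `best_threshold` and the example scores below are hypothetical; in practice the scores would come from `scorer.score(...)`.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy of the rule `score > threshold`.

    Hypothetical helper: `scores` and `labels` are non-empty, equal-length sequences.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    candidates = sorted(set(scores))
    # Candidate thresholds: one below the minimum, the midpoints between
    # adjacent distinct scores, and the maximum itself (predicts all False).
    thresholds = (
        [candidates[0] - 1.0]
        + [(lo + hi) / 2 for lo, hi in zip(candidates, candidates[1:])]
        + [candidates[-1]]
    )
    best_t, best_acc = thresholds[0], -1.0
    for t in thresholds:
        correct = sum((s > t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Made-up development scores and gold labels, for illustration only
dev_scores = [0.91, 0.15, 0.78, 0.40, 0.55, 0.22]
dev_labels = [True, False, True, False, True, False]
threshold, accuracy = best_threshold(dev_scores, dev_labels)
```

The threshold found this way should be tuned on data that is held out from the final test set, otherwise the reported accuracy will be optimistic.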
Closing this, feel free to reopen
Hi @jvamvas
I'm doing research on cross-lingual plagiarism identification and found your tool very useful. I tried to run your test script, following the steps described in README.md.
When I ran the script with "prism", I ran into this error
Then I changed the model from prism to small100 and ran the script again, but after waiting more than 12 hours nothing had happened.
Could you tell me the exact way to re-run the script, and which dataset format I should use if I want to test cross-lingual plagiarism identification between English and Vietnamese?
Thanks,