IBM / kgi-slot-filling

This is the code for our KILT leaderboard submissions (KGI + Re2G models).
Apache License 2.0

Could not generate training data #2

Open kristopher283 opened 2 years ago

kristopher283 commented 2 years ago

Hi @gaetangate, I'm at the final step, generating the training .jsonl files. When I run this command:

export CLASSPATH=jar/dprBM25.jar:./anserini/target/anserini-0.4.1-SNAPSHOT-fatjar.jar
java com.ibm.research.ai.pretraining.retrieval.DPRTrainingData -passageIndex anserini_passage_index -positivePidData zsRE_train_positive_pids.jsonl -trainingData zsRE_dpr_training_data.jsonl

I get this error:

Error: Could not find or load main class com.ibm.research.ai.pretraining.retrieval.DPRTrainingData
Caused by: java.lang.ClassNotFoundException: com.ibm.research.ai.pretraining.retrieval.DPRTrainingData

Do you have any idea why `com.ibm.research.ai.pretraining.retrieval.DPRTrainingData` cannot be found?

michaelrglass commented 2 years ago

Can you confirm that jar/dprBM25.jar is the path to the jar from this repo? Maybe try with an absolute path to be sure.
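A quick way to check the suggestion above is to confirm that every classpath entry actually exists before launching Java. The sketch below is only illustrative: `check_classpath` is a hypothetical helper that is not part of this repo, and the paths in the usage comment are the ones from the command in this issue, resolved against the current directory.

```shell
# check_classpath: hypothetical helper (not part of the repo).
# Prints one line per entry of a colon-separated classpath and
# returns non-zero if any entry does not exist on disk.
check_classpath() (
  set -f          # do not glob-expand entries while splitting
  IFS=':'         # split the argument on ':' like the JVM does on Linux/macOS
  missing=0
  for entry in $1; do
    if [ -e "$entry" ]; then
      echo "ok       $entry"
    else
      echo "MISSING  $entry"
      missing=1
    fi
  done
  return "$missing"
)

# Usage, with absolute paths built from the current directory:
# check_classpath "$PWD/jar/dprBM25.jar:$PWD/anserini/target/anserini-0.4.1-SNAPSHOT-fatjar.jar"
```

If every entry is reported as present and the error persists, listing the jar contents with `jar tf jar/dprBM25.jar` and searching for `DPRTrainingData` should show whether the class is packaged in that jar.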

kristopher283 commented 2 years ago

Oh nice, thank you very much, @michaelrglass. Now I can run the command. By the way, can you confirm that this log shows the correct behaviour?

skipping 17058091::[0,7] in positives
skipping 307157::[4,5] in positives
skipping since we found an answer: ' least concern '
skipping 12426884::[15,17] in positives
skipping 20792463::[11,13] in positives
skipping since we found an answer: ' least concern '
skipping 1475129::[0,5] in positives
skipping 12533644::[0,3] in positives
skipping 6956383::[2,4] in positives
skipping since we found an answer: ' vulnerable '
skipping since we found an answer: ' critically endangered '
skipping 12621756::[0,1] in positives
skipping since we found an answer: ' vulnerable '
skipping 38129955::[5,7] in positives
skipping 12505161::[0,10] in positives (Last warning)
skipping since we found an answer: ' critically endangered '
skipping since we found an answer: ' critically endangered '
skipping since we found an answer: ' critically endangered '
skipping since we found an answer: ' vulnerable '
skipping since we found an answer: ' critically endangered ' (Last warning)
On instance 2560
On instance 3840
On instance 5120
On instance 6400
On instance 9728
On instance 16640
On instance 19712
On instance 20992
On instance 22272
On instance 22784
On instance 25856
On instance 29440
On instance 31744
On instance 34048
On instance 36096
On instance 38400
On instance 41472
On instance 44288
On instance 45824
On instance 46592
On instance 49664
On instance 52736
On instance 55552
On instance 56832
On instance 59904
On instance 61440
On instance 62976
On instance 65024
On instance 67072
On instance 68096
On instance 69120
On instance 69632
On instance 70912
On instance 72192
On instance 72960
On instance 74240
On instance 77312
On instance 82688
On instance 84480
On instance 86016
On instance 91648
On instance 93184
Skipped 453 instances for lack of hard negatives

michaelrglass commented 2 years ago

Yes, looks good.

Best, Michael
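For anyone comparing their own run against the log in this thread, a small tally of the message types can be easier to read than the raw output. The sketch below assumes the console output was saved to a file with one message per line; `summarize_log` and the file name `dpr_training.log` are hypothetical, and the patterns are taken only from the messages quoted above.

```shell
# summarize_log: hypothetical helper (not part of the repo).
# Counts the two kinds of "skipping" messages seen in the log above and
# echoes the final summary line, assuming one log message per line.
summarize_log() {
  printf 'skipped positives: %s\n' \
    "$(grep -c '^skipping [0-9][0-9]*::\[' "$1")"
  printf 'skipped (answer found): %s\n' \
    "$(grep -c '^skipping since we found an answer' "$1")"
  # Print the closing summary line if the run got that far.
  grep 'for lack of hard negatives' "$1" || true
}

# Usage:
# summarize_log dpr_training.log
```

The console prints "(Last warning)" after a number of messages of each kind, so these counts reflect only what was printed, not necessarily every skipped item.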

