I'm trying to create preprocessed training files from my custom data. My data doesn't include any hard negatives, and when I run your script create_training_files.py, the log shows that no triplets are constructed:
2021-08-20 14:30:58,836,836 INFO [create_training_files.py:453] loading metadata: ../../data/specter/metadata.json
2021-08-20 14:30:58,907,907 INFO [create_training_files.py:457] loading data file: ../../data/specter/data.json
2021-08-20 14:30:59,040,40 INFO [create_training_files.py:466] getting instances for `data` and `train` set
2021-08-20 14:30:59,041,41 INFO [create_training_files.py:468] writing output ../../data/specter/preprocessed/data-train.p
2021-08-20 14:30:59,101,101 INFO [create_training_files.py:303] Generating triplets ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 85452/85452 [01:00<00:00, 1404.12it/s]
INFO:/home/guoao/anaconda3/envs/specter/lib/python3.7/site-packages/specter-0.0.1-py3.7.egg/specter/data_utils/triplet_sampling.py:Done generating triplets, #successful queries: 0,#skipped queries: 85452
2021-08-20 14:32:01,745,745 INFO [create_training_files.py:365] done getting triplets, success rate:0.00%,total: 0
2021-08-20 14:32:01,746,746 INFO [create_training_files.py:407] converting raw instances to allennlp instances:
0it [00:00, ?it/s]
I then dug into the script specter/data_utils/triplet_sampling.py and ran TripletGenerator directly to see what happens (since I can't use breakpoints in multiprocess programs). I found that because there are no hard negatives, the margin here becomes 0.0, which leaves candidates_pos as an empty list.
If I change the line to `if candidates[j][1] >= margin + candidates[-1][1]:`, the function works. I don't really understand what margin means, and I'm not sure whether changing this line will affect the generated triplets. Is it safe to make this change?
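To make the question concrete, here is a minimal sketch of what I think is happening. It is not the actual code from triplet_sampling.py: the function `select_positives`, the `margin_fraction` parameter, and the way the margin is derived from the score spread are my assumptions, written only to reproduce the behaviour I see when every candidate has the same score.

```python
def select_positives(candidates, margin_fraction=0.5, inclusive=False):
    """Toy stand-in for the positive-candidate filter (hypothetical, not SPECTER's code).

    candidates: list of (paper_id, score) tuples.
    Assumption: the margin is a fraction of the spread between the highest
    and lowest score, so identical scores give a margin of 0.0.
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    spread = ranked[0][1] - ranked[-1][1]
    margin = margin_fraction * spread
    threshold = margin + ranked[-1][1]
    if inclusive:
        # The modified comparison (>=) from my question.
        return [c for c in ranked if c[1] >= threshold]
    # The strict comparison (>), which I assume is the original behaviour.
    return [c for c in ranked if c[1] > threshold]


# No hard negatives: every candidate carries the same score.
uniform = [("p1", 5), ("p2", 5), ("p3", 5)]
# Mixed data: one candidate has a lower score.
mixed = [("p1", 5), ("p2", 5), ("p3", 1)]

strict_uniform = select_positives(uniform)
inclusive_uniform = select_positives(uniform, inclusive=True)
strict_mixed = select_positives(mixed)
inclusive_mixed = select_positives(mixed, inclusive=True)
```

Under these assumptions, the strict comparison selects nothing for `uniform` because no score exceeds the lowest score plus a zero margin, while `>=` selects every candidate. For `mixed` both variants select the same two candidates, but `>=` would additionally admit any candidate whose score lands exactly on the threshold, which is the side effect I'm unsure about.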