CogComp / TacoLM

Temporal Common Sense Acquisition with Minimal Supervision, ACL'20

Problem about downstream task data. #2

Open PosoSAgapo opened 3 years ago

PosoSAgapo commented 3 years ago

The project works perfectly in the BERT setting, but unfortunately it does not work with many other transformer models.

As far as I can tell, the downstream task data is provided only in a preprocessed format suited to BERT. This limits implementations with other transformer models, which use tokenization methods different from BERT's.

The pattern_extraction.py code seems to work only for generating the TacoLM pre-training data; it cannot process the downstream task data. The downstream data in the data directory is provided in preprocessed form, which means it can only be used with BERT. For example, in the augmented MC-TACO dataset, tokens like [unused7] do not appear in pattern_extraction.py, so I guess the downstream tasks used different extraction code.

This makes it a dead end to reproduce these results with other transformer models. Is there any way to convert the downstream task data into a format suitable for transformer models other than BERT? Or is there any plan to release the downstream task processing code and the original data? It would be such a pity if this code only worked for BERT :)

Slash0BZ commented 3 years ago

Thanks, we have realized this and will work on something new. Meanwhile, if you want to work with other transformers, you will need to modify pattern_extraction.py, which contains the entire process of parsing raw textual data into the BERT format.
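
For anyone attempting this in the meantime, below is a minimal sketch of one possible workaround: rebuild approximate plain text from the BERT-formatted lines so it can be re-tokenized by another model's tokenizer. It is not code from this repository. It assumes, without having verified it against the released files, that each line is a space-separated sequence of WordPiece tokens with `##` continuation pieces, and that markers follow the `[unusedN]` pattern mentioned above. The function names and the `<marker7>` replacement string are hypothetical.

```python
import re

# Assumption: reserved BERT placeholders look like "[unused7]".
MARKER_RE = re.compile(r"^\[unused\d+\]$")
# Standard BERT special tokens that carry no text content.
BERT_SPECIALS = {"[CLS]", "[SEP]", "[PAD]", "[MASK]"}


def find_markers(line):
    """Return the distinct [unusedN] markers in a line, ordered by N."""
    found = {tok for tok in line.split() if MARKER_RE.match(tok)}
    # "[unused" is 7 characters long, so tok[7:-1] is the numeric part.
    return sorted(found, key=lambda tok: int(tok[7:-1]))


def detokenize_wordpiece(line, marker_map=None):
    """Rebuild approximate plain text from space-separated WordPiece tokens.

    marker_map (hypothetical) maps a BERT marker such as "[unused7]" to
    whatever special token the target model should use instead.
    Markers without a mapping are kept unchanged.
    """
    marker_map = marker_map or {}
    words = []
    prev_is_word = False  # only join "##" pieces onto ordinary words
    for tok in line.split():
        if tok in BERT_SPECIALS:
            prev_is_word = False
            continue
        if MARKER_RE.match(tok):
            words.append(marker_map.get(tok, tok))
            prev_is_word = False
        elif tok.startswith("##") and prev_is_word:
            words[-1] += tok[2:]
        else:
            words.append(tok)
            prev_is_word = True
    return " ".join(words)


# Made-up example line, not taken from the dataset.
example = "[CLS] he play ##ed for [unused7] two hours [SEP]"
text = detokenize_wordpiece(example, {"[unused7]": "<marker7>"})
```

The recovered text is only approximate: BERT lowercasing and punctuation splitting are not undone. The replacement markers would also need to be registered as special tokens in the target tokenizer, with the model's embedding matrix resized to match, and their meaning still has to be worked out from the data, since pattern_extraction.py does not define them.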