Problem about downstream task data.

The project works perfectly with BERT, but unfortunately it does not work with many other transformer models.

As far as I can tell, the downstream task data is provided only in a processed format that suits BERT. This limits use with other transformer models, which rely on tokenization methods different from BERT's.

The pattern_extraction.py code seems to work only for generating the TacoLM pre-training data; it cannot process the downstream task data. The downstream data in the data directory is provided only in processed form, which means it can only be used with BERT. For example, the augmented MC-TACO dataset contains tokens like [unused7] that do not appear anywhere in pattern_extraction.py, so I guess the downstream tasks used different extraction code.

This makes it a dead end to reproduce these results with other transformer models. Is there any way to convert the downstream task data into a format suitable for transformer models other than BERT? Or are there any plans to release the downstream task processing code and the original data? It would be a pity if this code only worked for BERT :)
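
To make the request more concrete, here is a rough sketch of the kind of conversion I have in mind. It assumes the processed files store space-separated WordPiece tokens with the usual "##" continuation prefix, which I have not confirmed for every file. The helper names `merge_wordpieces` and `split_special` are hypothetical and not part of this repository. It only merges word pieces and pulls out bracketed special tokens such as [unused7]. It cannot restore original casing or spacing, and it does not know what each [unusedN] marker means, which is why the original data or processing code would still be needed.

```python
import re

# Assumption: special tokens look like [unused7], [CLS], [SEP], etc.
SPECIAL = re.compile(r"^\[(?:unused\d+|CLS|SEP|PAD|MASK|UNK)\]$")


def merge_wordpieces(tokens):
    """Join "##" continuation pieces onto the preceding word.

    Special tokens are kept as separate items and never merged into.
    """
    words = []
    for tok in tokens:
        if tok.startswith("##") and words and not SPECIAL.match(words[-1]):
            words[-1] += tok[2:]  # drop the "##" prefix and append
        else:
            words.append(tok)
    return words


def split_special(words):
    """Separate plain words from special tokens.

    Returns (plain_words, markers), where markers is a list of
    (position_in_words, token) pairs so they can be re-inserted later.
    """
    plain = [w for w in words if not SPECIAL.match(w)]
    markers = [(i, w) for i, w in enumerate(words) if SPECIAL.match(w)]
    return plain, markers


if __name__ == "__main__":
    # Made-up example line, not taken from the dataset.
    line = "he play ##ed [unused7] for two hours"
    words = merge_wordpieces(line.split())
    plain, markers = split_special(words)
    print(" ".join(plain))
    print(markers)
```

The plain words could then be passed to another model's tokenizer, with the recorded markers mapped to whatever special tokens that model supports.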

CogComp / TacoLM
