Control token values during training

izaskr commented 3 years ago

Hi, I'm going through the preprocessing and training code. I'm wondering where to find the values of control tokens during training, together with the source and target sides. Just like the example given in Table 1 of the paper:

Source: <NbChars 0.3><LevSim 0.4>He settled in London , devoting himself chiefly to practical teaching .
Target: He teaches in London .

Is there a way to access these values during training? Also, what dependency parser did you use for the DepTreeDepth token and what are the frequencies for WordRank based on (which corpus, WikiLarge or something else)?

louismartin commented 3 years ago

Hi, thank you for your issue! The special tokens are written to text files here. You can find the resulting text files by browsing in resources/datasets/ after executing the training code.
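As a rough illustration of inspecting those files, here is a minimal sketch that splits the leading control tokens off a preprocessed source line. It assumes the tokens look like the paper's example (`<NbChars 0.3>`) and also accepts an underscore separator. The exact on-disk format and the file names under resources/datasets/ are not stated in this thread, so `split_control_tokens` and `read_control_tokens` are hypothetical helpers, not part of the repository.

```python
import re
from pathlib import Path

# One control token such as "<NbChars 0.3>" (assumed format, taken from the
# paper's example; an underscore separator is accepted as well).
CONTROL_TOKEN = re.compile(r"<([A-Za-z]+)[ _]([0-9]*\.?[0-9]+)>")


def split_control_tokens(line):
    """Return ({token_name: value}, remaining_text) for one source line."""
    values = {}
    pos = 0
    while True:
        match = CONTROL_TOKEN.match(line, pos)
        if match is None:
            break
        values[match.group(1)] = float(match.group(2))
        pos = match.end()
        # Skip any whitespace between consecutive control tokens.
        while pos < len(line) and line[pos].isspace():
            pos += 1
    return values, line[pos:].strip()


def read_control_tokens(path, limit=5):
    """Parse the first `limit` lines of a preprocessed text file.

    `path` is whichever file you find under resources/datasets/ after
    running the training code; no particular file name is assumed here.
    """
    results = []
    with Path(path).open(encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            if index >= limit:
                break
            results.append(split_control_tokens(line.rstrip("\n")))
    return results
```

Calling `split_control_tokens` on the source line from Table 1 should give the two token values as a dictionary and the sentence without its prefix.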

At test time they are generated on-the-fly by preprocessors defined here.

For dependency parsing, we use spaCy (see this method).
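For readers who want to see roughly what a tree-depth computation on a spaCy parse looks like, here is a minimal sketch. It is not the repository's method: the function names, the convention that a leaf has depth 0, and the default model name `en_core_web_sm` are all assumptions made for illustration.

```python
def subtree_depth(token):
    """Depth of the dependency subtree rooted at `token` (a leaf has depth 0).

    Works on any object exposing an iterable `children` attribute,
    which spaCy tokens do.
    """
    children = list(token.children)
    if not children:
        return 0
    return 1 + max(subtree_depth(child) for child in children)


def dependency_tree_depth(text, model_name="en_core_web_sm"):
    """Maximum dependency tree depth over the sentences of `text`.

    The model name is an assumption; use whichever spaCy model you have
    installed that includes a dependency parser.
    """
    import spacy  # imported lazily so subtree_depth works without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    # Each sentence span exposes the root token of its dependency tree.
    return max((subtree_depth(sent.root) for sent in doc.sents), default=0)
```

The recursion is kept separate from the spaCy call so the depth logic can be checked on a hand-built tree without downloading a model.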

izaskr commented 3 years ago

Great, thank you!

facebookresearch / access

Control token values during training #33