In the article, the authors say: "The same procedure was used to build the edits vocabulary on USPTO-full dataset and the difference is that the edits Attach LG must appear at least 50 times in the training set of USPTO-full before it will be collected into the vocabulary. This edits vocabulary include 6 bond edits, 336 atom edits (8 Change Atom and 328 Attach LG), and a termination symbol."
Beyond the processing of the training data, are test-set reactions whose leaving group is not in the vocabulary also removed?
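
To make the question concrete, here is a minimal sketch of the situation I have in mind. It is not the repository's code: the function names `build_lg_vocab` and `split_by_coverage` and the leaving-group strings are hypothetical, and I am assuming each reaction can be reduced to a list of its Attach LG labels. Only the threshold of 50 comes from the article.

```python
from collections import Counter

def build_lg_vocab(train_reactions, min_count=50):
    """Keep the leaving groups seen at least `min_count` times in training."""
    counts = Counter(lg for rxn in train_reactions for lg in rxn)
    return {lg for lg, n in counts.items() if n >= min_count}

def split_by_coverage(reactions, vocab):
    """Separate reactions whose leaving groups are all in `vocab` from the rest."""
    covered, uncovered = [], []
    for rxn in reactions:
        if all(lg in vocab for lg in rxn):
            covered.append(rxn)
        else:
            uncovered.append(rxn)
    return covered, uncovered

# Hypothetical toy data: each reaction is a list of leaving-group labels.
train = [["Cl"]] * 50 + [["Br"]] * 49
test = [["Cl"], ["Br"], []]

vocab = build_lg_vocab(train)
covered, uncovered = split_by_coverage(test, vocab)
print(len(covered), len(uncovered))
```

In this toy case the `["Br"]` test reaction falls outside the vocabulary. My question is whether such reactions are dropped from the test set or kept and counted as failures when the accuracy is reported.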