Jamson-Zhong / Graph2Edits

MIT License

Questions about USPTO-full preparation #7

Open jiachengxiong opened 8 months ago

jiachengxiong commented 8 months ago

In the article, the authors state: "The same procedure was used to build the edits vocabulary on USPTO-full dataset and the difference is that the edits Attach LG must appear at least 50 times in the training set of USPTO-full before it will be collected into the vocabulary. This edits vocabulary include 6 bond edits, 336 atom edits (8 Change Atom and 328 Attach LG), and a termination symbol."

Apart from the processing applied to the training data, are test examples whose leaving group is not in the vocabulary also deleted?

Jamson-Zhong commented 8 months ago

In the USPTO-full dataset, test examples whose leaving group is not in the vocabulary were retained.
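
For readers following this thread, here is a minimal sketch of the policy as described above. It is not the repository's actual preprocessing code. The function names `build_lg_vocab` and `tag_test_examples`, the `(reaction_id, leaving_groups)` tuple layout, and the leaving-group strings are all hypothetical placeholders. Only the threshold of 50 and the fact that test examples are retained come from the discussion.

```python
from collections import Counter

def build_lg_vocab(train_leaving_groups, min_count=50):
    """Collect leaving groups that occur at least `min_count` times in training.

    `min_count=50` mirrors the Attach LG threshold quoted for USPTO-full.
    """
    counts = Counter(train_leaving_groups)
    return {lg for lg, n in counts.items() if n >= min_count}

def tag_test_examples(test_examples, lg_vocab):
    """Keep every test example; only record whether its leaving groups are covered.

    Nothing is dropped here, matching the statement that out-of-vocabulary
    test examples were retained.
    """
    return [
        (rxn_id, all(lg in lg_vocab for lg in lgs))
        for rxn_id, lgs in test_examples
    ]

# Toy data with placeholder leaving-group labels (hypothetical, not from USPTO-full).
train_lgs = ["LG_A"] * 60 + ["LG_B"] * 3
vocab = build_lg_vocab(train_lgs)

test = [("rxn_a", ["LG_A"]), ("rxn_b", ["LG_B"])]
tagged = tag_test_examples(test, vocab)
```

In this toy run only `LG_A` reaches the threshold, so `rxn_b` stays in the test set but is flagged as not covered by the vocabulary. Under this reading, such examples would presumably count as test cases the model cannot predict exactly rather than being excluded from evaluation.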