liberty-1776 closed this issue 1 year ago.
Hi, thanks for raising the issue. I checked the data file Egc.csv and there are no '[]' around '*'. If you mean the tokenization results (similar to your previous issue), I printed out the tokens and do not see '[]' in them either. Let me know if you have any other questions.
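A quick way to repeat this check on a local copy is to count how many SMILES strings contain a bracketed '[*]' versus only a bare '*'. The sketch below is illustrative only: `count_wildcard_forms` and `read_smiles_column` are hypothetical helpers, and the column name "smiles" is an assumption, so check the actual header of Egc.csv before using it.

```python
import csv
from typing import Dict, Iterable, List


def count_wildcard_forms(smiles_list: Iterable[str]) -> Dict[str, int]:
    """Classify each SMILES string by how it writes the '*' wildcard atom."""
    counts = {"bracketed": 0, "bare": 0, "none": 0}
    for s in smiles_list:
        if "[*]" in s:
            # At least one wildcard is written as a bracket atom.
            counts["bracketed"] += 1
        elif "*" in s:
            # Wildcards appear, but only in the bare form.
            counts["bare"] += 1
        else:
            counts["none"] += 1
    return counts


def read_smiles_column(path: str, column: str = "smiles") -> List[str]:
    # ASSUMPTION: the SMILES column is named "smiles"; adjust to the real header.
    with open(path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]
```

For example, `count_wildcard_forms(read_smiles_column("Egc.csv"))` should report zero "bracketed" rows if the file matches what is described above.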
OK, but when I looked earlier, I'm pretty sure the square brackets were there. I might have made a mistake reading the data! Thanks.
Hi there! The Egc dataset you have provided seems to differ from the other related datasets (e.g., Egb, Ei, Xc). The SMILES strings in the Egc dataset have square brackets ('[]') around the '*' symbol, while in the other datasets the symbol is not wrapped in brackets. Also, was the result you reported in the paper for Egc obtained with this same Egc dataset, or with a version without square brackets? If it was this one, I was wondering why square brackets were used only for the Egc dataset, since the RoBERTa model is trained on the PI1M dataset, which does not contain square brackets around the '*' symbol in its SMILES.
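If a copy of the data does turn out to use '[*]', one option (not something the maintainers say they did) is to rewrite it to the bare '*' form used by the other datasets before tokenizing. This is a minimal string-level sketch: `normalize_wildcards` is a hypothetical helper, and it assumes only the plain '[*]' form should be changed, leaving bracket atoms with extra information such as an atom map ('[*:1]') untouched.

```python
import re
from typing import List

# Matches exactly "[*]": a bracketed wildcard with no isotope, charge, H count, or atom map.
_BRACKETED_WILDCARD = re.compile(r"\[\*\]")


def normalize_wildcards(smiles: str) -> str:
    """Return `smiles` with every plain '[*]' rewritten as a bare '*'."""
    return _BRACKETED_WILDCARD.sub("*", smiles)


def normalize_column(smiles_list: List[str]) -> List[str]:
    # Apply the rewrite to every SMILES string in a dataset column.
    return [normalize_wildcards(s) for s in smiles_list]
```

Keeping the rewrite at the string level avoids assuming anything about how a particular cheminformatics toolkit or tokenizer interprets the two forms.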