Open kmehant opened 2 weeks ago
For pretokenized datasets, we should provide a feature that computes the attention mask on the fly when it is not part of the provided dataset. This may be opinionated and is up for discussion.
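To make the proposal concrete, here is a minimal sketch of what computing the mask on the fly could look like. The function name `add_attention_mask`, the `pad_token_id` parameter, and the `input_ids` / `attention_mask` column names are assumptions for illustration, not an existing API in this project. It also assumes that, when a pad id is supplied, every occurrence of that id is padding, which does not hold if the pad token doubles as the EOS token.

```python
from typing import Dict, List, Optional


def add_attention_mask(
    example: Dict[str, List[int]],
    pad_token_id: Optional[int] = None,
) -> Dict[str, List[int]]:
    """Return the example with an attention mask, computing one if absent.

    Hypothetical helper: the column names and the pad-id handling are
    assumptions, not part of any existing API.
    """
    # Respect a mask that the dataset already provides.
    if "attention_mask" in example:
        return example

    input_ids = example["input_ids"]
    if pad_token_id is None:
        # No padding information: attend to every token.
        mask = [1] * len(input_ids)
    else:
        # Assumption: every token equal to pad_token_id is padding.
        mask = [int(token != pad_token_id) for token in input_ids]

    # Build a new dict so the caller's example is not mutated.
    return {**example, "attention_mask": mask}
```

If the dataset is a Hugging Face `datasets.Dataset`, a function of this shape could presumably be applied per example with `Dataset.map`. Whether the default should be an all-ones mask or a pad-aware one is the opinionated part that is up for discussion.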