I realize that punctuation marks are supposed to be removed from the input text before running the model. But what should we do with tokens like "24/7", "R&D", "9-11", etc. in the input text? There are potentially many such tokens, and it is hard to catch all of them in preprocessing. Is it possible to have OOV tokens appear in the output verbatim, as they appear in the input, instead of <ukn>?
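
To make the request concrete, here is a minimal sketch of the kind of post-processing I have in mind. It is only an illustration under assumptions I have not verified against the model: that the output contains the input words in the same order, that OOV words are replaced by a single marker, and that the only extra tokens are inserted punctuation. The names `restore_oov`, `UNK`, and `PUNCT` are hypothetical, not part of the model's API.

```python
# Hypothetical marker and punctuation set; adjust both to whatever the model really emits.
UNK = "<ukn>"
PUNCT = {",", ".", "?", "!", ":", ";"}


def restore_oov(input_tokens, output_tokens, unk=UNK, punct=PUNCT):
    """Replace each unknown marker in the output with the input token at the same word position.

    Assumes the output is the input word sequence, in order, with OOV words
    swapped for `unk` and punctuation tokens from `punct` inserted in between.
    """
    restored = []
    i = 0  # index of the next unconsumed input word
    for tok in output_tokens:
        if tok in punct:
            # Inserted punctuation has no counterpart in the input.
            restored.append(tok)
            continue
        if i >= len(input_tokens):
            raise ValueError("output has more word tokens than the input")
        restored.append(input_tokens[i] if tok == unk else tok)
        i += 1
    if i != len(input_tokens):
        raise ValueError("output has fewer word tokens than the input")
    return restored


if __name__ == "__main__":
    words = "we do r&d 24/7".split()
    model_output = ["we", "do", "<ukn>", "<ukn>", "."]
    print(" ".join(restore_oov(words, model_output)))
```

If the model does not keep a one-to-one correspondence between input and output words, this positional alignment breaks, which is why built-in support would be preferable.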