When a paper contains special tokens verbatim (e.g., [SEP] or [BLK]), the current code cannot handle them correctly since #29. For example, as reported in #31, parsing our own VILA paper fails on page 2, where the text [BLK] occurs multiple times. This PR proposes a simple fix: remove the square brackets [ and ] from the text.
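As a rough illustration of the idea, the bracket removal could look like the sketch below. The function name `strip_square_brackets` and the sample string are hypothetical and are not taken from this repository; the actual change in this PR may be applied at a different point in the pipeline.

```python
def strip_square_brackets(text: str) -> str:
    """Return `text` with every '[' and ']' character removed.

    Hypothetical helper for illustration only: after stripping, a literal
    "[BLK]" in the paper becomes "BLK" and no longer looks like a special token.
    """
    # str.maketrans with a third argument maps each listed character to None,
    # so translate() deletes them.
    return text.translate(str.maketrans("", "", "[]"))


# Made-up sample text containing verbatim special tokens.
page_text = "We insert [BLK] between blocks and [SEP] between sentences."
print(strip_square_brackets(page_text))
# -> We insert BLK between blocks and SEP between sentences.
```

One trade-off of this approach is that it also strips brackets that are not part of special tokens, such as citation markers like [12].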