It is possible to sample a token first and then check whether the formatter accepts it. The mask over the whole vocabulary then only needs to be computed when the sampled token is rejected. However, this approach does not integrate well with existing pipelines, and it probably requires some PRs to be merged first.
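The idea can be sketched as follows. This is only an illustration under assumptions, not the formatter's actual API: `accepts` and `compute_mask` are hypothetical callables standing in for "check a single token" and "compute the mask for the whole vocabulary", and the logits are a plain Python list rather than a tensor.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(logits)
    if m == -math.inf:
        raise ValueError("no token has a finite logit")
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_then_check(
    logits: Sequence[float],
    accepts: Callable[[int], bool],            # hypothetical: cheap single-token check
    compute_mask: Callable[[], Sequence[bool]],  # hypothetical: full-vocabulary mask
    rng: random.Random,
) -> Tuple[int, bool]:
    """Return (token_id, mask_was_computed)."""
    # Fast path: sample from the unconstrained distribution and test one token.
    probs = softmax(logits)
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    if accepts(token):
        return token, False

    # Slow path: the sampled token was rejected, so compute the full mask
    # and resample among the allowed tokens only.
    mask = compute_mask()
    masked_logits = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    masked_probs = softmax(masked_logits)
    token = rng.choices(range(len(masked_probs)), weights=masked_probs, k=1)[0]
    return token, True
```

In this sketch the result follows the same distribution as always masking first: an allowed token is returned either directly with its unconstrained probability, or after a rejection with its renormalized probability, and the two terms sum to the renormalized probability. The saving is that `compute_mask` runs only on rejection, which is rare when the model already tends to produce well-formed output.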