allenai / bff

Apache License 2.0
37 stars 8 forks source link

Adds the option to run on a whole document at a time #4

Closed dirkgr closed 1 year ago

dirkgr commented 1 year ago

Other changes:

dirkgr commented 1 year ago

To be sure, I don't know what the point of this is. It would be insane to run deduplication this way. It would be borderline insane to even count ngrams this way (i.e., with --annotate-only). And it's not implementing the full GPT2 spec from #3 either, which is already insane all by itself.

So I'm just writing the code and I trust that you know what you're doing with it.