delph-in / repp

Regular Expression Preprocessor
https://github.com/delph-in/docs/wiki/ReppTop
GNU Lesser General Public License v2.1

REPP versions #3

Open arademaker opened 2 years ago

arademaker commented 2 years ago

What is the relation between this code and the code from Woodley?

https://github.com/delph-in/homebrew-delphin/blob/HEAD/Formula/repp.rb#L4

I know from https://github.com/delph-in/docs/wiki/ReppTop that Woodley's code and @goodmami's implementation at https://pydelphin.readthedocs.io/en/latest/api/delphin.repp.html are alternative implementations. Are all of these 100% compatible?

goodmami commented 2 years ago

Woodley's version is what's used in ACE, and I believe it predates this implementation a little. This repo is the code used for PET and for the standalone repp command (which is currently used by NLTK's nltk.tokenize.repp module). Two other implementations are PyDelphin and the LKB's (which probably should be listed in the ReppTop wiki's "Implementations" section, even though it's mentioned elsewhere in the doc).

They are mostly compatible. The main differences are masking support and characterization (start/stop indices of tokens). This repo and Woodley's repp-0.2.2 release do not include masking, but Woodley has an unreleased version of his implementation with masking support that is used in recent versions of ACE. The LKB and PyDelphin both have masking support. As for characterization, PyDelphin follows this repo's behavior exactly, while Woodley's code, last I checked, outputs different characterization in some cases. I don't recall what the LKB does for characterization.
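
To make "characterization" concrete, here is a minimal Python sketch of the general idea: rewrite rules change the string, but each token should still report start/stop indices into the original input. This is not taken from any of the implementations mentioned above. The function names (`rewrite_with_spans`, `tokenize_with_spans`), the two toy rules, and the convention that every character produced by a replacement is attributed to the whole matched span are all assumptions made for illustration. How replaced text gets attributed is exactly the kind of choice on which implementations could plausibly disagree.

```python
import re


def rewrite_with_spans(text, spans, pattern, replacement):
    """Apply one regex rewrite rule while tracking original offsets.

    spans[i] is the (start, end) range in the ORIGINAL input that
    character i of `text` is attributed to.
    """
    out_chars, out_spans = [], []
    pos = 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # ignore empty matches in this sketch
        # Copy the unmatched text before this match unchanged.
        out_chars.extend(text[pos:m.start()])
        out_spans.extend(spans[pos:m.start()])
        # Assumed convention: every replacement character is attributed
        # to the full original extent of the matched text.
        covered = (spans[m.start()][0], spans[m.end() - 1][1])
        new = m.expand(replacement)
        out_chars.extend(new)
        out_spans.extend([covered] * len(new))
        pos = m.end()
    # Copy whatever follows the last match.
    out_chars.extend(text[pos:])
    out_spans.extend(spans[pos:])
    return "".join(out_chars), out_spans


def tokenize_with_spans(text, spans):
    """Split on whitespace; report (form, start, end) in original offsets."""
    return [
        (m.group(), spans[m.start()][0], spans[m.end() - 1][1])
        for m in re.finditer(r"\S+", text)
    ]


original = "Don't stop."
text = original
spans = [(i, i + 1) for i in range(len(original))]

# Two toy rules (hypothetical, not from any real REPP module):
# split off the contraction, then split off sentence-final punctuation.
for pattern, replacement in [(r"n't", r" n't"), (r"([.!?])", r" \1")]:
    text, spans = rewrite_with_spans(text, spans, pattern, replacement)

tokens = tokenize_with_spans(text, spans)
for form, start, end in tokens:
    print(f"<{start}:{end}> {form}")
```

Under these assumptions the rewritten string is `Do n't stop .`, and the token `n't` reports the span 2 to 5 of the original input even though it sits at a different position in the rewritten string. A different attribution convention for inserted or replaced characters would yield different indices for the same rules.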