mssammon closed 6 years ago
Ok I will handle it
For the ACE reader: I think I have to keep this version unless 1) we merge the true-cased ACEReader into corpusreader, or 2) we discard true-casing in both MD and RE. @danyaljj We don't want the true-cased ACEReader in corpusreader, right?
We don't want the true-cased ACEReader in corpusreader
True. Is it possible to make it a class that extends the ACEReader of corpusreader and overrides the method that uses the true-caser?
Also, changing the name would reduce the confusion (for example, ACEReaderWithTrueCaseFixer).
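A rough sketch of the subclass idea, in case it helps. Everything here is a stand-in: `BaseAceReaderSketch` and its `normalizeCase` hook are hypothetical and do not reflect the real ACEReader API, and the all-caps rule is a toy replacement for the actual true-caser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the corpusreaders ACEReader; the real class differs.
class BaseAceReaderSketch {
    // Hook that subclasses may override; the base reader leaves text untouched.
    protected String normalizeCase(String sentence) {
        return sentence;
    }

    // "Reads" raw sentences and applies the casing hook to each one.
    public List<String> read(List<String> rawSentences) {
        List<String> out = new ArrayList<>();
        for (String s : rawSentences) {
            out.add(normalizeCase(s));
        }
        return out;
    }
}

// Only the casing step is overridden; all other reading logic is inherited.
class ACEReaderWithTrueCaseFixer extends BaseAceReaderSketch {
    @Override
    protected String normalizeCase(String sentence) {
        // Toy rule (assumption): only rewrite sentences that are entirely upper case.
        if (sentence.isEmpty() || !sentence.equals(sentence.toUpperCase(Locale.ROOT))) {
            return sentence;
        }
        String lower = sentence.toLowerCase(Locale.ROOT);
        return Character.toUpperCase(lower.charAt(0)) + lower.substring(1);
    }
}
```

With this shape, the MD module would keep only the small subclass, and the duplicated reader file could be dropped, provided the real ACEReader exposes (or can be refactored to expose) an overridable step like this.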
An alternative is to write a function that, given a TextAnnotation with messed-up casing, fixes the casing and creates a new TextAnnotation. This way you can reuse the ACEReader of corpusreaders.
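Something along these lines, as a minimal sketch. `TokenDoc` is a hypothetical stand-in for TextAnnotation (just an immutable token list), `CaseFixer.fixCasing` is an illustrative name, and the casing rule is a toy one rather than the Stanford true-caser; it would, for instance, wrongly lowercase acronyms such as "ACE".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical, minimal stand-in for TextAnnotation: an immutable list of tokens.
final class TokenDoc {
    private final List<String> tokens;

    TokenDoc(List<String> tokens) {
        this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
    }

    List<String> getTokens() {
        return tokens;
    }
}

final class CaseFixer {
    private CaseFixer() {}

    // Returns a NEW TokenDoc with repaired casing; the input is left unchanged.
    static TokenDoc fixCasing(TokenDoc input) {
        List<String> fixed = new ArrayList<>();
        for (String tok : input.getTokens()) {
            fixed.add(fixToken(tok, fixed.isEmpty()));
        }
        return new TokenDoc(fixed);
    }

    // Toy rule (assumption): lowercase all-caps tokens longer than one character,
    // then capitalise the first token of the document.
    private static String fixToken(String tok, boolean first) {
        String out = tok;
        boolean allCaps = tok.length() > 1
                && tok.equals(tok.toUpperCase(Locale.ROOT))
                && !tok.equals(tok.toLowerCase(Locale.ROOT));
        if (allCaps) {
            out = tok.toLowerCase(Locale.ROOT);
        }
        if (first && !out.isEmpty()) {
            out = Character.toUpperCase(out.charAt(0)) + out.substring(1);
        }
        return out;
    }
}
```

Because the function builds a new object instead of mutating its input, the reader stays untouched and the casing fix can be applied (or skipped) by whichever module needs it.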
I really like the idea of a truecaser function...
I will first try to extend the class. I think it should work.
The files linked below are apparently duplicates or replications of files in other modules ('external', 'corpusreaders', and 'tokenizer').
If they are duplicates, replace them. If they are modified versions, unify them.
https://github.com/CogComp/cogcomp-nlp/blob/master/md/src/main/java/edu/illinois/cs/cogcomp/pipeline/handlers/StanfordTrueCaseHandler.java
https://github.com/CogComp/cogcomp-nlp/blob/master/md/src/main/java/edu/illinois/cs/cogcomp/nlp/corpusreaders/ACEReader.java
https://github.com/CogComp/cogcomp-nlp/blob/master/md/src/main/java/edu/illinois/cs/cogcomp/nlp/tokenizer/TokenizerStateMachine.java