Closed: oxinabox closed this 5 years ago
I can help out with this after next week. However, should I wait until your 'fresh' branch is merged?
Yes, I'll try to merge it before next week then. It's vaguely waiting for me to port more stuff from the old version, though I can look up the old version from its tag.
@ksteimel that took longer than expected, but the fresh branch is now merged.
I would like to work on this issue.
Feel free. I will review any PRs.
I am receiving the following error while parsing one of the files in the Senseval-2 corpus.
Error parsing "<wf cmd=done id=d00.s09.t01 pos=NNS lemma=other wnsn=0 lexsn=U>others</wf>". ErrorException("type Void has no field captures")
This error has been traced back to a function similar to the one here: https://github.com/JuliaText/CorpusLoaders.jl/blob/58c824dbff95cbb3c3107377750a54d909944932/src/SemCor.jl#L33
The error is caused by lexsn being matched as empty.
What might be the best way around this? One way could be to add exception handling and change the match expression for it. Is there a better way to do this?
I would not add exception handling. Julia code prefers to be written to avoid exceptions rather than handle them. (Unlike in Python, exception handling in Julia is pretty slow.)
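As a rough sketch of what avoiding the exception could look like: `match` returns `nothing` when the pattern does not match, so the result can be checked before touching `captures`. This is written for current Julia (where the old `Void` is now `Nothing`). The name `strict_lexsn` and the cut-down pattern are made up for illustration and are not the code in SemCor.jl.

```julia
# Hypothetical, cut-down pattern: only the lexsn part of the real regex.
const LEXSN_STRICT = r"lexsn=(\d.*:\d*)"

# Return the captured lexsn text, or `nothing` if the attribute does not
# have the expected shape. No try/catch is needed.
function strict_lexsn(tag::AbstractString)
    m = match(LEXSN_STRICT, tag)
    m === nothing && return nothing   # pattern did not match, so there are no captures
    return String(m.captures[1])
end
```

With this check, the line from the error report gives `nothing` instead of throwing, and the caller can decide what to do with it.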
I think the regex can probably be relaxed somewhat, specifically this bit:
lexsn=(\d.*:\d*)
so that U is also acceptable.
Some of the logic after that will also want adjusting.
But in SenseAnnotatedWord the type of the lexsn field is still String.
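One possible relaxation, as a sketch only: add U as an alternative inside the capture group, so the captured value is still a plain string. The pattern below covers just the lexsn attribute, and `relaxed_lexsn` is a hypothetical helper name; the real regex in SemCor.jl matches the other attributes too and may need a different change.

```julia
# Assumed relaxation: accept either a sense key like 1:03:00:: or the literal U.
const LEXSN_RELAXED = r"lexsn=(\d.*:\d*|U)"

# Return the lexsn value as a String, or `nothing` if neither form is present.
function relaxed_lexsn(tag::AbstractString)
    m = match(LEXSN_RELAXED, tag)
    m === nothing && return nothing
    return String(m.captures[1])
end

# The line from the error report now yields a capture instead of no match.
relaxed_lexsn("<wf cmd=done id=d00.s09.t01 pos=NNS lemma=other wnsn=0 lexsn=U>others</wf>")
```

Since the capture is a string either way, it fits a String field without a type change, though any later logic that splits the value on `:` would still need to handle U.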
Rada Mihalcea provides the Senseval-2 and Senseval-3 corpora in SemCor format: http://web.eecs.umich.edu/~mihalcea/downloads.html#sensevalsemcor
Since we already have a SemCor parser, we basically support them already. It is more a matter of writing the DataDeps registration than any real parsing.
This would be a good and easy PR to make.