NLP-CISUC / NLPyPort

MIT License

Expected clitics behavior #3

Open tattoodobem opened 5 years ago

tattoodobem commented 5 years ago

Hello,

In this document: http://drops.dagstuhl.de/opus/volltexte/2019/10885/pdf/OASIcs-SLATE-2019-18.pdf you have an example for contractions, but not one for clitics. For instance, should "levem-no" become "levar em o"?

I forked the repo and am making some changes. A major one is maintaining a relation between the original text and the processed one: I am returning an array of token objects, where each token has its original line, position, and original text. I don't know what you think about this; it's a bit different from what you have, and it may add some overhead. Maybe there should be an option to return either that or just a simple array.
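A minimal sketch of the token-object idea, for illustration only. The names `Token`, `tokenize_with_offsets`, and `plain_tokens` are hypothetical and are not taken from NLPyPort or from the fork, and the whitespace split stands in for the real tokenizer.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical token record that keeps a link back to the source text."""
    text: str      # processed form (identical to the original in this sketch)
    original: str  # surface text exactly as it appeared in the input
    line: int      # 0-based index of the input line
    position: int  # 0-based character offset within that line


def tokenize_with_offsets(document: str) -> List[Token]:
    """Split on whitespace and record where each token came from."""
    tokens = []
    for line_no, line in enumerate(document.splitlines()):
        for match in re.finditer(r"\S+", line):
            word = match.group()
            tokens.append(Token(word, word, line_no, match.start()))
    return tokens


def plain_tokens(tokens: List[Token]) -> List[str]:
    """Fallback for callers that only want a simple list of strings."""
    return [token.text for token in tokens]
```

Keeping `plain_tokens` as a separate helper is one way to offer both outputs: callers that need the mapping use the objects, and everyone else pays only for a list comprehension.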

I also changed the clitics and contractions replacement functions to classes. The functions currently read a document each time you call them; you only need to read it once and can then call the function as many times as you want.
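A rough sketch of the load-once pattern, under stated assumptions: the class name `ContractionReplacer`, its methods, and the tab-separated rule format are all hypothetical and do not reflect the actual NLPyPort resource files or the fork's code.

```python
from typing import Dict, Iterable, List


class ContractionReplacer:
    """Parses the replacement table once and reuses it on every call."""

    def __init__(self, rule_lines: Iterable[str]):
        # Assumed format: one "surface<TAB>expansion" pair per line.
        self.rules: Dict[str, List[str]] = {}
        for raw in rule_lines:
            raw = raw.strip()
            if not raw or "\t" not in raw:
                continue  # skip blank or malformed lines
            surface, expansion = raw.split("\t", 1)
            self.rules[surface.lower()] = expansion.split()

    @classmethod
    def from_file(cls, path: str, encoding: str = "utf-8") -> "ContractionReplacer":
        # The file is read here, once, instead of on every replace() call.
        with open(path, encoding=encoding) as handle:
            return cls(handle.readlines())

    def replace(self, tokens: List[str]) -> List[str]:
        """Expand known contractions; unknown tokens pass through unchanged."""
        out: List[str] = []
        for token in tokens:
            # Lookup is case-insensitive; the expansion comes back lowercased.
            out.extend(self.rules.get(token.lower(), [token]))
        return out
```

With this shape, the cost of reading the rules is paid in the constructor, and `replace` can be called on as many sentences as needed without touching the disk again.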

I'll upload when it's working ok.

jdportugal commented 5 years ago

Hi,

I think what you are adding seems very useful, especially in cases where you want to return part of the sentence (for example, in relation extraction). Let me know when it's finished!