Closed morethanbooks closed 7 years ago
That is not a bug, it is intentional.
FreeLing assumes that punctuation marks are encoded as ASCII characters. For dashes, that is "-", or "--", or even "---".
There are many other Unicode symbols for dashes, quotes, etc. (see, e.g., http://www.fileformat.info/info/unicode/category/Pd/list.htm) and it would be a nightmare to try to recognize them all.
You can customize your FreeLing installation by adding the required symbols to the punctuation definition file. It is located in data/common/punct.dat (in the source tarball) or in /usr/local/share/freeling/common/punct.dat after installation. See the user manual section on the "punctuation" module to find out more about the format of the file (though it is quite straightforward, and you probably only need to copy the line for the ASCII dash and replace it with your own).
If that does not work, you can always preprocess your texts, replacing Unicode dashes with ASCII dashes.
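For the preprocessing route, here is a minimal sketch in Python. It is not part of FreeLing: the function name normalize_dashes is made up for illustration, and it assumes that every character in the Unicode "Pd" (dash punctuation) category, the one listed at the link above, should become a single ASCII hyphen.

```python
import unicodedata


def normalize_dashes(text: str, replacement: str = "-") -> str:
    """Replace every Unicode dash (general category 'Pd') with an ASCII hyphen.

    Hypothetical helper, not a FreeLing function. The ASCII hyphen itself is
    in category 'Pd', so already-clean text passes through unchanged.
    """
    return "".join(
        replacement if unicodedata.category(ch) == "Pd" else ch
        for ch in text
    )


if __name__ == "__main__":
    # \u2014 is the long dash used for dialogue in Spanish novels.
    line = "\u2014Estamos desorientados \u2014murmur\u00f3 el hombre."
    print(normalize_dashes(line))
```

Run this over the texts before passing them to the analyzer. Characters outside the "Pd" category, such as the mathematical minus sign (U+2212), are left untouched and would need their own rule.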
Hi, perfect. I have edited the file (it was exactly there), tested it, and it works perfectly. Many thanks for the answer. It was a great decision to work with FreeLing: great tool. Best regards, José Calvo
Hi, we are using FreeLing to annotate Spanish novels and we have found a bug. The POS analyser correctly analyses a sentence like: "-Estamos desorientados -murmuró el hombre tranquilamente-; nos hemos debido de perder."
In this case FreeLing says that "Estamos" is a verb. But if, instead of hyphens, the sentence uses any kind of dash, it says that "Estamos" is a proper name (when using the NEC, it says that it is a person):
—Estamos desorientados —murmuró el hombre tranquilamente—; nos hemos debido de perder.