LanguageMachines / frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
https://languagemachines.github.io/frog
GNU General Public License v3.0

Frog NER fails with folia::NoSuchAnnotation #57

Closed: proycon closed this issue 6 years ago.

proycon commented 6 years ago

When running on mlp09:

```
frog -c /vol/tensusers/proycon/nederlab/nederlab-linguistic-enrichment/resources/frog-crmgys-cgn/frog.cfg --override tokenizer.rulesFile=tokconfig-nld-historical --threads 1 --nostdout -x /scratch/proycon/Corpus-Middelnederlands-1-1_08f16391-7a88-4b37-b967-4f398e604972.folia.xml
```

NER fails with the following:

```
frog-:Tue Jul 17 14:35:28 2018 process 721 sentences
frog-:Tue Jul 17 14:35:28 2018 done with sentence[1]
terminate called after throwing an instance of 'folia::NoSuchAnnotation'
  what():  no such annotation: entities for set='http://ilk.uvt.nl/folia/sets/frog-ner-nl'

Program received signal SIGABRT, Aborted.
0x00002aaaaca29428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00002aaaaca29428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00002aaaaca2b02a in __GI_abort () at abort.c:89
#2  0x00002aaaac2ce84d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00002aaaac2cc6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00002aaaac2cc701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00002aaaaadc738c in folia::EntitiesLayer::EntitiesLayer (d=<optimized out>, a=std::map with 2 elements = {...}, this=0x3f32e480, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /vol/customopt/lamachine16.dev/include/libfolia/folia_impl.h:2441
#6  addEntity (sent=sent@entry=0x4127f810, tagset="http://ilk.uvt.nl/folia/sets/frog-ner-nl", words=std::vector of length 1, capacity 1 = {...}, confs=std::vector of length 1, capacity 1 = {...}, NER="per", textclass="current") at ner_tagger_mod.cxx:257
#7  0x00002aaaaadc7989 in NERTagger::addNERTags (this=this@entry=0x274902b0, words=std::vector of length 47, capacity 64 = {...}, tags=std::vector of length 47, capacity 64 = {...}, confs=std::vector of length 47, capacity 64 = {...})
    at ner_tagger_mod.cxx:314
#8  0x00002aaaaadc87b3 in NERTagger::post_process (this=this@entry=0x274902b0, swords=std::vector of length 47, capacity 64 = {...}, override=std::vector of length 47, capacity 47 = {...}) at ner_tagger_mod.cxx:519
#9  0x00002aaaaadca393 in NERTagger::Classify (this=0x274902b0, swords=std::vector of length 47, capacity 64 = {...}) at ner_tagger_mod.cxx:410
#10 0x00002aaaaad477f1 in FrogAPI::TestSentence () at FrogAPI.cxx:577
#11 0x00002aaaad5b212f in GOMP_parallel_sections () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#12 0x00002aaaaad47ba5 in FrogAPI::TestSentence (this=this@entry=0x7fffffffcb10, sent=<optimized out>, timers=...) at FrogAPI.cxx:567
#13 0x00002aaaaad48161 in FrogAPI::FrogDoc (this=this@entry=0x7fffffffcb10, doc=..., hidetimers=hidetimers@entry=false) at FrogAPI.cxx:1383
#14 0x00002aaaaad56dab in FrogAPI::FrogFile (this=this@entry=0x7fffffffcb10, infilename="input/Corpus-Middelnederlands-1-1_08f16391-7a88-4b37-b967-4f398e604972.folia.xml", os=..., xmlOutF="") at FrogAPI.cxx:1500
#15 0x00000000004057a7 in main (argc=<optimized out>, argv=<optimized out>) at Frog.cxx:631
```

Frog with the regular model also fails, but not in addNERTags; this time it aborts in the MWU chunker (mwuAna::addEntity):

```
frog-:Tue Jul 17 14:42:04 2018 process 721 sentences
frog-:Tue Jul 17 14:42:04 2018 done with sentence[1]
terminate called without an active exception

Thread 1 "frog" received signal SIGABRT, Aborted.
0x00002aaaaca29428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00002aaaaca29428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00002aaaaca2b02a in __GI_abort () at abort.c:89
#2  0x00002aaaac2ce84d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00002aaaac2cc6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00002aaaac2cc701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00002aaaaad9a3c4 in folia::AbstractAnnotationLayer::~AbstractAnnotationLayer (__vtt_parm=<optimized out>, this=<optimized out>, __in_chrg=<optimized out>) at /vol/customopt/lamachine16.dev/include/libfolia/folia_impl.h:2088
#6  folia::EntitiesLayer::EntitiesLayer (d=<optimized out>, a=std::map with 2 elements = {...}, this=0x16d27320, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at /vol/customopt/lamachine16.dev/include/libfolia/folia_impl.h:2441
#7  mwuAna::addEntity (this=0x16d16390, tagset="http://ilk.uvt.nl/folia/sets/frog-mwu-nl", textclass="current", sent=sent@entry=0x17857880, el=<optimized out>) at mwu_chunker_mod.cxx:78
#8  0x00002aaaaad9a555 in Mwu::Classify (this=0x2aaad00008c0, words=...) at mwu_chunker_mod.cxx:248
#9  0x00002aaaaad47851 in FrogAPI::TestSentence () at FrogAPI.cxx:605
#10 0x00002aaaad5b212f in GOMP_parallel_sections () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#11 0x00002aaaaad47ba5 in FrogAPI::TestSentence (this=this@entry=0x7fffffffcbb0, sent=<optimized out>, timers=...) at FrogAPI.cxx:567
#12 0x00002aaaaad48161 in FrogAPI::FrogDoc (this=this@entry=0x7fffffffcbb0, doc=..., hidetimers=hidetimers@entry=false) at FrogAPI.cxx:1383
#13 0x00002aaaaad56dab in FrogAPI::FrogFile (this=this@entry=0x7fffffffcbb0, infilename="inp
```

Something fishy is going on...
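Both backtraces pass through `GOMP_parallel_sections`, so the modules are called from an OpenMP `sections` region, and the process aborts instead of reporting which sentence failed. OpenMP requires an exception thrown inside a region to be caught within that same region by the same thread, so one plausible reading of the first trace is that the exception from the `EntitiesLayer` constructor hit that boundary and ended in `std::terminate`. The sketch below shows a common pattern for this situation: catch inside each section, park the exception in a `std::exception_ptr`, and rethrow it after the region. The names `tag_entities`, `tag_mwus` and `run_sections` are hypothetical stand-ins, not Frog's code.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical stand-ins for two per-sentence modules that run side by side
// (for example an NER tagger and an MWU chunker). Not Frog's actual code.
void tag_entities(bool fail) {
  if (fail) throw std::runtime_error("no such annotation: entities");
}
void tag_mwus() {}

// Runs both modules in OpenMP sections. No exception may leave a section, so
// each section parks any exception in its own exception_ptr slot, and the
// first stored one is rethrown on the calling thread after the region ends.
void run_sections(bool fail) {
  std::exception_ptr errors[2];  // shared; each section writes only its own slot
#pragma omp parallel sections
  {
#pragma omp section
    {
      try { tag_entities(fail); } catch (...) { errors[0] = std::current_exception(); }
    }
#pragma omp section
    {
      try { tag_mwus(); } catch (...) { errors[1] = std::current_exception(); }
    }
  }  // implicit barrier: both sections have finished here
  for (const std::exception_ptr& e : errors) {
    if (e) std::rethrow_exception(e);  // now an ordinary, catchable error
  }
}
```

When compiled without `-fopenmp` the pragmas are ignored and the two sections simply run one after the other; the error handling behaves the same either way.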

kosloot commented 6 years ago

OK, I am on holiday, but...

The error suggests that the entities declaration is missing from the FoLiA document. That is odd, as it is added (or should be) whenever the NER is involved.

Without knowing which input files are used, debugging will be hard, so please provide more details and I can look into this after the holidays...
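The error message reflects a FoLiA rule: an annotation type and set must be declared in the document before elements of that type can be added, and, per the comment above, Frog's NER is expected to add the `entities` declaration itself. The following sketch models that rule with simplified, hypothetical types (`Document`, `EntitiesLayer`, `NoSuchAnnotation`, `add_entities_layer`). They are not the libfolia API, whose declaration calls look different; the sketch only shows why constructing the layer throws when the declaration is absent, and one defensive option of declaring the set just before the layer is created.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, heavily simplified model of the declaration rule.
// These types are illustrative only and are NOT the libfolia API.
struct NoSuchAnnotation : std::runtime_error {
  using std::runtime_error::runtime_error;
};

class Document {
 public:
  void declare(const std::string& type, const std::string& set) {
    declared_.emplace(type, set);
  }
  bool is_declared(const std::string& type, const std::string& set) const {
    return declared_.count(std::make_pair(type, set)) > 0;
  }

 private:
  std::set<std::pair<std::string, std::string>> declared_;  // (type, set) pairs
};

// Mirrors the failing step: creating the layer checks the declaration first.
struct EntitiesLayer {
  EntitiesLayer(const Document& doc, const std::string& set) {
    if (!doc.is_declared("entities", set))
      throw NoSuchAnnotation("no such annotation: entities for set='" + set + "'");
  }
};

// Hypothetical defensive helper: declare the set on demand, then add the layer.
EntitiesLayer add_entities_layer(Document& doc, const std::string& set) {
  if (!doc.is_declared("entities", set)) doc.declare("entities", set);
  return EntitiesLayer(doc, set);
}
```

Declaring on demand can mask the real cause (here the suspicion is that the declaration step itself was skipped), so it is best read as an illustration of the rule rather than as the fix.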

kosloot commented 6 years ago

I managed to find your test file :) It was a chain of problems, as usual:

I have fixed it in Git now (by not generating an id when that is impossible).
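The fix is only described as not generating an id when that is impossible. Assuming that a generated id is derived from the parent's `xml:id` (for example `<parent-id>.entities.1`, a common FoLiA convention), a parent without an id leaves nothing to derive from. The sketch below is a hypothetical illustration of such a guard, not the code committed to Frog: `Element`, `generate_child_id` and the `parent.tag.N` scheme are all assumptions.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical element: only what is needed to derive child ids.
struct Element {
  std::string id;                  // empty when the input carried no xml:id
  std::map<std::string, int> seq;  // per-tag counters for generated child ids
};

// Assumed scheme: child id = parent id + "." + tag + "." + running number.
// Returns std::nullopt instead of failing when the parent has no id.
std::optional<std::string> generate_child_id(Element& parent, const std::string& tag) {
  if (parent.id.empty()) return std::nullopt;  // nothing to derive an id from
  int n = ++parent.seq[tag];
  return parent.id + "." + tag + "." + std::to_string(n);
}
```

Returning `std::nullopt` lets the caller add the element without an id instead of aborting halfway through a sentence.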

NOTE: the FoLiA file looks very fishy to me:

kosloot commented 6 years ago

Well, closing this: the bug is fixed, and the messy FoLiA is not my problem :)