divvun / libdivvun

lib for running gramcheck and other pipelines + cli; modules for CG→spelling, CG→feedback, tagging blanks
https://giellalt.github.io/proof/gramcheck/GrammarCheckerDocumentation.html
GNU General Public License v3.0

Crashes on sma input 'beapmoeh (' #66

Closed: unhammer closed this 8 months ago

unhammer commented 11 months ago

In giella-sma at commit 447d8e53db382f0a42a90f6aede37031867e326f:

$ cd tools/grammarcheckers

$ make dev

$ sed 's,divvun-suggest,& -j,' modes/smagram.mode > modes/smagram-j.mode

$ echo 'beapmoeh .('| modes/smagram-j.mode
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 8) > this->size() (which is 3)
Aborted
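The `what()` message comes from `std::basic_string::substr`, which throws `std::out_of_range` whenever its start position is greater than the string's length (a position equal to the length is allowed and yields an empty string). The valgrind trace further down shows the string is a `basic_string<char16_t>`, so this minimal standalone sketch uses `std::u16string` with a 3-character string and position 8, the values from the message. The helper name `substr_throws` is hypothetical and not part of libdivvun.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if s.substr(pos) throws std::out_of_range.
// substr only throws when pos > s.size(); pos == s.size() returns "".
bool substr_throws(const std::u16string& s, std::size_t pos) {
    try {
        (void)s.substr(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Calling `substr_throws(u"abc", 8)` returns true, matching the `__pos (which is 8) > this->size() (which is 3)` condition; since nothing in divvun-suggest catches the exception, `std::terminate` aborts the process.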

Originally reported as a core dump / abort trap from gramcheck_comparator.py (so it also happens when libdivvun is used as a Python library):

$ LD_LIBRARY_PATH="$HOME/libdivvun/src/.libs" PYTHONPATH="$HOME/libdivvun/python/build/lib.linux-x86_64-3.10" ../giella-core/scripts/gramcheck_comparator.py ../lang-sma/tools/grammarcheckers/sma.zcheck goldstandard/converted/
Aborted.
unhammer commented 11 months ago

Input to divvun-suggest:

"<beapmoeh>"
        "beapmoe" N Sem/Food Pl Nom <W:0.0> <sma> &LINK &space-before-punct-mark ID:1
: 
"<.>"
        "." CLB <W:0.0> <SpaceBeforePunctMark> <NoSpaceAfterPunctMark> <NoSpaceAfterPunctMark> <SpaceBeforePunctMark> &space-before-punct-mark &no-space-after-punct-mark ID:2 R:RIGHT:4 R:LEFT:1
        "." CLB <W:0.0> <SpaceBeforePunctMark> <NoSpaceAfterPunctMark> <NoSpaceAfterPunctMark> <SpaceBeforePunctMark> "<. (>" &no-space-after-punct-mark &SUGGESTWF ID:2 R:RIGHT:4 R:LEFT:1
        "." CLB <W:0.0> <SpaceBeforePunctMark> <NoSpaceAfterPunctMark> <NoSpaceAfterPunctMark> <SpaceBeforePunctMark> "<beapmoeh.>" &space-before-punct-mark &SUGGESTWF ID:2 R:RIGHT:4 R:LEFT:1

"<(>"
        "(" PUNCT LEFT <W:0.0> <NoSpaceBeforeParenBeg> <sma> &LINK &no-space-after-punct-mark ID:4
:\n
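In this input the `"<.>"` cohort carries two `&SUGGESTWF` readings: `"<. (>"` (tagged `&no-space-after-punct-mark`) and `"<beapmoeh.>"` (tagged `&space-before-punct-mark`). Assuming, as the tag names and the `R:LEFT:1` / `R:RIGHT:4` relations suggest, that the first rewrites cohorts 2 to 4 and the second cohorts 1 to 2, the two suggestions share cohort 2. A small C++ sketch of that overlap test on inclusive cohort-ID ranges; `Span` and `overlaps` are hypothetical names, not libdivvun code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inclusive range of cohort IDs that one suggestion rewrites.
struct Span {
    std::size_t first;
    std::size_t last;
};

// Two inclusive ranges overlap when neither one ends before the other starts.
bool overlaps(const Span& a, const Span& b) {
    return a.first <= b.last && b.first <= a.last;
}
```

Under that assumption, `overlaps(Span{2, 4}, Span{1, 2})` is true, so both suggestions try to rewrite the `"."` cohort.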
flammie commented 11 months ago

Heh, I was just trying to figure out how to pull the suspect sentence out of gramcheck comparator. Here's some valgrind output; it doesn't seem to be code I have worked on, but I'm not sure:

echo 'beapmoeh .('| hfst-tokenise -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/tokeniser-gramcheck-gt-desc.pmhfst' \
 | src/divvun-blanktag '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/analyser-gt-whitespace.hfst' \
 | vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/valency.bin' \
 | vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/mwe-dis.bin' \
 | cg-mwesplit \
 | src/divvun-blanktag '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/analyser-gt-errorwhitespace.hfst' \
 | src/divvun-cgspell -n 10 -b 15.000000 -w 5000.000000 -u 0.400000 -l '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/acceptor.default.hfst' -m '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/errmodel.default.hfst' \
 | vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/valency-postspell.bin' \
 | vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/grc-disambiguator.bin' \
 | vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/spellchecker.bin' \
 | vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/grammarchecker.bin' \
 | valgrind src/divvun-suggest -j -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/generator-gramcheck-gt-norm.hfstol' -m '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/errors.xml' -l sma
==427429== Memcheck, a memory error detector
==427429== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==427429== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==427429== Command: src/divvun-suggest -j -g /home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/generator-gramcheck-gt-norm.hfstol -m /home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/errors.xml -l sma
==427429== 
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 8) > this->size() (which is 3)
==427429== 
==427429== Process terminating with default action of signal 6 (SIGABRT): dumping core
==427429==    at 0x9290D4C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==427429==    by 0x9243571: raise (in /usr/lib64/libc.so.6)
==427429==    by 0x922D4B1: abort (in /usr/lib64/libc.so.6)
==427429==    by 0x8F45C47: __gnu_cxx::__verbose_terminate_handler() [clone .cold] (vterminate.cc:95)
==427429==    by 0x8F58185: __cxxabiv1::__terminate(void (*)()) (eh_terminate.cc:48)
==427429==    by 0x8F581F0: std::terminate() (eh_terminate.cc:58)
==427429==    by 0x8F58431: __cxa_throw (eh_throw.cc:98)
==427429==    by 0x8F49283: std::__throw_out_of_range_fmt(char const*, ...) [clone .cold] (functexcept.cc:101)
==427429==    by 0x169BE0: std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >::_M_check(unsigned long, char const*) const (basic_string.h:390)
==427429==    by 0x161F9B: std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >::substr(unsigned long, unsigned long) const (basic_string.h:3132)
==427429==    by 0x1548F2: divvun::mk_repform(std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> > const&, unsigned long, std::map<std::pair<unsigned long, unsigned long>, std::pair<std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >, divvun::Reading>, std::less<std::pair<unsigned long, unsigned long> >, std::allocator<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >, divvun::Reading> > > >&) (suggest.cpp:441)
==427429==    by 0x155222: divvun::proc_LEFT_RIGHT(divvun::Reading const&, std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> > const&, unsigned long, divvun::Sentence const&, std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> > const&, unsigned long, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, divvun::Cohort const&) (suggest.cpp:512)
==427429== 
==427429== HEAP SUMMARY:
==427429==     in use at exit: 36,668,026 bytes in 5,031 blocks
==427429==   total heap usage: 22,876 allocs, 17,845 frees, 80,903,618 bytes allocated
==427429== 
==427429== LEAK SUMMARY:
==427429==    definitely lost: 0 bytes in 0 blocks
==427429==    indirectly lost: 0 bytes in 0 blocks
==427429==      possibly lost: 2,704 bytes in 2 blocks
==427429==    still reachable: 36,665,322 bytes in 5,029 blocks
==427429==                       of which reachable via heuristic:
==427429==                         stdstring          : 93 bytes in 1 blocks
==427429==         suppressed: 0 bytes in 0 blocks
==427429== Rerun with --leak-check=full to see details of leaked memory
==427429== 
==427429== For lists of detected and suppressed errors, rerun with: -s
==427429== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (SIGABRT) (core dumped)
unhammer commented 11 months ago

Yeah, it's an issue in how I combine multiple overlapping suggestions. That code is complicated :-/

(I think I know how to make a real fix, just won't have time to focus until after holidays.)
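To illustrate how overlapping suggestions can push an offset past the end of a string, here is a small defensive sketch. It assumes, hypothetically, that suggestions are keyed by `[begin, end)` character offsets into the sentence text; the `mk_repform` frame in the trace does take a map with `pair<unsigned long, unsigned long>` keys, but what those keys mean is not stated in this thread. `apply_replacements` is a hypothetical name: this is not libdivvun's `mk_repform` and not the real fix mentioned above, only a way to show the bounds checks that keep `substr` from throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical: keys are [begin, end) offsets into `text`, values are the
// replacement strings. std::map iterates spans in ascending (begin, end) order.
std::u16string apply_replacements(
    const std::u16string& text,
    const std::map<std::pair<std::size_t, std::size_t>, std::u16string>& reps) {
    std::u16string out;
    std::size_t pos = 0;  // offset just past the text consumed so far
    for (const auto& [span, rep] : reps) {
        const auto [begin_off, end_off] = span;
        // Skip spans that are inverted, run past the end of the text, or start
        // inside a replacement that was already applied (i.e. overlap it).
        if (begin_off > end_off || end_off > text.size() || begin_off < pos) {
            continue;
        }
        out += text.substr(pos, begin_off - pos);  // unchanged text before span
        out += rep;
        pos = end_off;
    }
    out += text.substr(pos);  // safe: pos <= text.size() holds here
    return out;
}
```

With the sentence from this issue and two overlapping spans, the second one is simply dropped instead of crashing. Whether an overlapping suggestion should be dropped, merged, or preferred over the first is the design question a real fix has to answer; this sketch just picks the first one.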