Closed by unhammer 8 months ago
input to divvun-suggest:
"<beapmoeh>"
"beapmoe" N Sem/Food Pl Nom <W:0.0> <sma> &LINK &space-before-punct-mark ID:1
:
"<.>"
"." CLB <W:0.0> <SpaceBeforePunctMark> <NoSpaceAfterPunctMark> <NoSpaceAfterPunctMark> <SpaceBeforePunctMark> &space-before-punct-mark &no-space-after-punct-mark ID:2 R:RIGHT:4 R:LEFT:1
"." CLB <W:0.0> <SpaceBeforePunctMark> <NoSpaceAfterPunctMark> <NoSpaceAfterPunctMark> <SpaceBeforePunctMark> "<. (>" &no-space-after-punct-mark &SUGGESTWF ID:2 R:RIGHT:4 R:LEFT:1
"." CLB <W:0.0> <SpaceBeforePunctMark> <NoSpaceAfterPunctMark> <NoSpaceAfterPunctMark> <SpaceBeforePunctMark> "<beapmoeh.>" &space-before-punct-mark &SUGGESTWF ID:2 R:RIGHT:4 R:LEFT:1
"<(>"
"(" PUNCT LEFT <W:0.0> <NoSpaceBeforeParenBeg> <sma> &LINK &no-space-after-punct-mark ID:4
:\n
Heh, I was just trying to figure out how to pull the suspect sentence out of gramcheck comparator. Here's some valgrind output; it doesn't seem to be code I have worked on, but I'm not sure:
echo 'beapmoeh .('| hfst-tokenise -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/tokeniser-gramcheck-gt-desc.pmhfst' \
| src/divvun-blanktag '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/analyser-gt-whitespace.hfst' \
| vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/valency.bin' \
| vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/mwe-dis.bin' \
| cg-mwesplit \
| src/divvun-blanktag '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/analyser-gt-errorwhitespace.hfst' \
| src/divvun-cgspell -n 10 -b 15.000000 -w 5000.000000 -u 0.400000 -l '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/acceptor.default.hfst' -m '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/errmodel.default.hfst' \
| vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/valency-postspell.bin' \
| vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/grc-disambiguator.bin' \
| vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/spellchecker.bin' \
| vislcg3 -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/grammarchecker.bin' \
| valgrind src/divvun-suggest -j -g '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/generator-gramcheck-gt-norm.hfstol' -m '/home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/errors.xml' -l sma
==427429== Memcheck, a memory error detector
==427429== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==427429== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==427429== Command: src/divvun-suggest -j -g /home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/generator-gramcheck-gt-norm.hfstol -m /home/flammie/github/giellalt/lang-sma/tools/grammarcheckers/errors.xml -l sma
==427429==
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 8) > this->size() (which is 3)
==427429==
==427429== Process terminating with default action of signal 6 (SIGABRT): dumping core
==427429== at 0x9290D4C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==427429== by 0x9243571: raise (in /usr/lib64/libc.so.6)
==427429== by 0x922D4B1: abort (in /usr/lib64/libc.so.6)
==427429== by 0x8F45C47: __gnu_cxx::__verbose_terminate_handler() [clone .cold] (vterminate.cc:95)
==427429== by 0x8F58185: __cxxabiv1::__terminate(void (*)()) (eh_terminate.cc:48)
==427429== by 0x8F581F0: std::terminate() (eh_terminate.cc:58)
==427429== by 0x8F58431: __cxa_throw (eh_throw.cc:98)
==427429== by 0x8F49283: std::__throw_out_of_range_fmt(char const*, ...) [clone .cold] (functexcept.cc:101)
==427429== by 0x169BE0: std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >::_M_check(unsigned long, char const*) const (basic_string.h:390)
==427429== by 0x161F9B: std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >::substr(unsigned long, unsigned long) const (basic_string.h:3132)
==427429== by 0x1548F2: divvun::mk_repform(std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> > const&, unsigned long, std::map<std::pair<unsigned long, unsigned long>, std::pair<std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >, divvun::Reading>, std::less<std::pair<unsigned long, unsigned long> >, std::allocator<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >, divvun::Reading> > > >&) (suggest.cpp:441)
==427429== by 0x155222: divvun::proc_LEFT_RIGHT(divvun::Reading const&, std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> > const&, unsigned long, divvun::Sentence const&, std::__cxx11::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> > const&, unsigned long, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, divvun::Cohort const&) (suggest.cpp:512)
==427429==
==427429== HEAP SUMMARY:
==427429== in use at exit: 36,668,026 bytes in 5,031 blocks
==427429== total heap usage: 22,876 allocs, 17,845 frees, 80,903,618 bytes allocated
==427429==
==427429== LEAK SUMMARY:
==427429== definitely lost: 0 bytes in 0 blocks
==427429== indirectly lost: 0 bytes in 0 blocks
==427429== possibly lost: 2,704 bytes in 2 blocks
==427429== still reachable: 36,665,322 bytes in 5,029 blocks
==427429== of which reachable via heuristic:
==427429== stdstring : 93 bytes in 1 blocks
==427429== suppressed: 0 bytes in 0 blocks
==427429== Rerun with --leak-check=full to see details of leaked memory
==427429==
==427429== For lists of detected and suppressed errors, rerun with: -s
==427429== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (SIGABRT) (core dumped)
Yeah, it's an issue in how I combine multiple overlapping suggestions. That code is complicated :-/
(I think I know how to make a real fix, I just won't have time to focus on it until after the holidays.)
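For anyone reading the trace: the abort is an uncaught std::out_of_range thrown by substr, with offset 8 into a string of size 3 (the numbers in the what() message). Below is a minimal sketch of that failure and of one defensive guard, assuming the offsets are UTF-16 code-unit positions. `substr_throws` and `replace_clamped` are hypothetical helpers made up for illustration, not functions from suggest.cpp, and clamping only avoids the crash; it does not fix how overlapping suggestions get combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether s.substr(pos) throws, which it does
// exactly when pos > s.size(). Left uncaught, that exception ends in
// std::terminate and SIGABRT, as in the valgrind log.
bool substr_throws(const std::u16string& s, std::size_t pos) {
    try {
        (void)s.substr(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Hypothetical helper (assumption, not the divvun-suggest implementation):
// replace the half-open range [beg, end) of `form` with `rep`, clamping both
// offsets to the string so that substr can never throw.
std::u16string replace_clamped(const std::u16string& form,
                               std::size_t beg,
                               std::size_t end,
                               const std::u16string& rep) {
    beg = std::min(beg, form.size());
    end = std::min(std::max(end, beg), form.size());
    // Invariant after clamping: 0 <= beg <= end <= form.size().
    assert(beg <= end && end <= form.size());
    return form.substr(0, beg) + rep + form.substr(end);
}
```

With these, `substr_throws(u". (", 8)` is true (offset 8 into a 3-unit string), while `replace_clamped(u"beapmoeh .(", 8, 10, u".")` yields `u"beapmoeh.("`.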
in giella-sma 447d8e53db382f0a42a90f6aede37031867e326f
originally reported as core dump / abort trap from gramcheck_comparator.py (so it happens when used as python lib too):