divvun / libdivvun

lib for running gramcheck and other pipelines + cli; modules for CG→spelling, CG→feedback, tagging blanks
https://giellalt.github.io/proof/gramcheck/GrammarCheckerDocumentation.html
GNU General Public License v3.0

Valgrind some memory issue #43

Closed: unhammer closed this issue 3 years ago

unhammer commented 3 years ago

valgrind-issue.txt http://codepad.org/gug9gJzb https://apertium.projectjj.com/apt/logs/libdivvun/hirsute-amd64.log

unhammer commented 3 years ago

The issue actually seems to be in locatefy in HFST:

Conditional jump or move depends on uninitialised value(s)
   at 0x53726C3: hfst_ol::PmatchAlphabet::locatefy(unsigned int, hfst_ol::WeightedDoubleTape const&) (in /usr/lib/x86_64-linux-gnu/libhfst.so.53.0.0)
   by 0x53798CE: hfst_ol::PmatchContainer::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libhfst.so.53.0.0)
   by 0x537A084: hfst_ol::PmatchContainer::locate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, double, float) (in /usr/lib/x86_64-linux-gnu/libhfst.so.53.0.0)
   by 0x538F599: hfst_ol_tokenize::match_and_print(hfst_ol::PmatchContainer&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, hfst_ol_tokenize::TokenizeSettings const&) (in /usr/lib/x86_64-linux-gnu/libhfst.so.53.0.0)
   by 0x538FCE2: hfst_ol_tokenize::process_input(hfst_ol::PmatchContainer&, std::istream&, std::ostream&, hfst_ol_tokenize::TokenizeSettings const&) (in /usr/lib/x86_64-linux-gnu/libhfst.so.53.0.0)
   by 0x4895BA6: divvun::Pipeline::proc(std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&) (pipeline.cpp:345)
   by 0x11CF4D: run(divvun::Pipeline&) (main_checker.cpp:82)
   by 0x11D40E: main::{lambda(divvun::Pipeline&)#2}::operator()(divvun::Pipeline&) const [clone .isra.0] (main_checker.cpp:190)
   by 0x11DFCB: _ZN6mapbox4util7variantIJiN6divvun8PipelineEEE5matchIJZ4mainEUliE_Z4mainEUlRS3_E0_EEEDTcldtdefpTsrS4_5visitdefpTcl12make_visitorspcl7forwardIT_Efp_EEEEDpOS9_.isra.0 (variant.hpp:916)
   by 0x11AF3E: main (main_checker.cpp:182)

TinoDidriksen commented 3 years ago

With debug symbols:

Conditional jump or move depends on uninitialised value(s)
   at 0x52B735E: hfst_ol::PmatchAlphabet::locatefy(unsigned int, hfst_ol::WeightedDoubleTape const&) (pmatch.cc:1164)
   by 0x52BAC61: hfst_ol::PmatchContainer::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (pmatch.cc:938)
   by 0x52BB256: hfst_ol::PmatchContainer::locate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, double, float) (pmatch.cc:997)
   by 0x52D259D: hfst_ol_tokenize::match_and_print(hfst_ol::PmatchContainer&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, hfst_ol_tokenize::TokenizeSettings const&) (pmatch_tokenize.cc:822)
   by 0x52D2CBA: hfst_ol_tokenize::process_input(hfst_ol::PmatchContainer&, std::istream&, std::ostream&, hfst_ol_tokenize::TokenizeSettings const&) (pmatch_tokenize.cc:859)
   by 0x488ACF8: divvun::TokenizeCmd::run(std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&) const (pipeline.cpp:53)
   by 0x488DEAA: divvun::Pipeline::proc(std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&) (pipeline.cpp:401)
   by 0x11A55D: run(divvun::Pipeline&) (main_checker.cpp:82)
   by 0x11B44A: operator() (main_checker.cpp:215)
   by 0x11B44A: mapbox::util::detail::dispatcher<mapbox::util::visitor<main::{lambda(int)#3}, main::{lambda(divvun::Pipeline&)#4}>, mapbox::util::variant<int, divvun>, int, divvun>::apply(mapbox::util::visitor<main::{lambda(int)#3}, main::{lambda(divvun::Pipeline&)#4}>&, {lambda(divvun::Pipeline&)#4}&&) (variant.hpp:358)
   by 0x11B482: mapbox::util::detail::dispatcher<mapbox::util::visitor<main::{lambda(int)#3}, main::{lambda(divvun::Pipeline&)#4}>, mapbox::util::variant<int, divvun>, int, int, divvun>::apply(mapbox::util::visitor<main::{lambda(int)#3}, main::{lambda(divvun::Pipeline&)#4}>&, {lambda(divvun::Pipeline&)#4}&&) (variant.hpp:343)
   by 0x11B48D: decltype (mapbox::util::detail::dispatcher<mapbox::util::visitor<main::{lambda(int)#3}, main::{lambda(divvun::Pipeline&)#4}>, mapbox::util::variant<int, divvun::Pipeline>, int, int, divvun::Pipeline>::apply({parm#1}, (forward<mapbox::util::detail>)({parm#2}))) mapbox::util::variant<int, divvun::Pipeline>::visit<mapbox::util::visitor<main::{lambda(int)#3}, main::{lambda(divvun::Pipeline&)#4}>, mapbox::util::variant<int, divvun::Pipeline>, int>(mapbox::util::detail::dispatcher&, mapbox::util::detail&&) (variant.hpp:884)
   by 0x11B4BF: _ZN6mapbox4util7variantIJiN6divvun8PipelineEEE5matchIJZ4mainEUliE1_Z4mainEUlRS3_E2_EEEDTcldtdefpTsrS4_5visitdefpTcl12make_visitorspcl7forwardIT_Efp_EEEEDpOS9_ (variant.hpp:916)

TinoDidriksen commented 3 years ago

"Fixed" in HFST. Because of the way libdivvun uses HFST, the code never reaches https://github.com/hfst/hfst/blob/master/libhfst/src/implementations/optimized-lookup/pmatch.cc#L35-L37, so there is no input_mark_symbol to compare against at https://github.com/hfst/hfst/blob/master/libhfst/src/implementations/optimized-lookup/pmatch.cc#L1164, which is the line Valgrind flags.
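
To illustrate the kind of problem described above, here is a minimal sketch. It is not the HFST code and not the actual fix. The names Alphabet, NO_SYMBOL, add_symbol, is_input_mark and the symbol string "@HYPOTHETICAL_INPUT_MARK@" are all made up for the example. The idea is that a special symbol number which is only assigned when the transducer contains that symbol should default to a sentinel, so that a later comparison never reads an uninitialised value.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT the HFST types.
using SymbolNumber = unsigned short;
constexpr SymbolNumber NO_SYMBOL = 0xFFFF;  // assumed sentinel: "no such symbol"

struct Alphabet {
    // Defaults to the sentinel. Without this initialiser the member would be
    // indeterminate whenever the transducer has no input mark, and comparing
    // against it is what a tool like Valgrind reports as
    // "Conditional jump or move depends on uninitialised value(s)".
    SymbolNumber input_mark = NO_SYMBOL;

    // Called once per symbol in the transducer's symbol table.
    // Only a transducer that contains the input mark ever sets the member.
    void add_symbol(const std::string& text, SymbolNumber number) {
        assert(number != NO_SYMBOL);  // the sentinel is reserved
        if (text == "@HYPOTHETICAL_INPUT_MARK@") {
            input_mark = number;
        }
    }

    // Well-defined for every transducer: false when no input mark exists.
    bool is_input_mark(SymbolNumber number) const {
        return input_mark != NO_SYMBOL && number == input_mark;
    }
};
```

With this layout, a transducer without input marks simply never matches in is_input_mark, and one that has them behaves as before.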

Does this mean libdivvun generates incorrect FSTs?

unhammer commented 3 years ago

I think some FSTs simply don't have input marks (they're only used when we have ambiguous tokenisation). I'll close this then. Thanks for looking into it :)