coli-saar / alto

Alto, the Algebraic Language Toolkit
Other
16 stars 2 forks source link

EM yields incorrect results #65

Closed alexanderkoller closed 3 years ago

alexanderkoller commented 3 years ago

Three tests regarding EM fail with the current version of Alto. @jgroschwitz could you check if this is caused by your recent refactoring of EM and fix it?

  <testcase name="testEmCD" classname="de.up.ling.irtg.InterpretedTreeAutomatonTest" time="0.018">
    <failure message="java.lang.AssertionError: expected 0.4285714286, got 0.5" type="java.lang.AssertionError">java.lang.AssertionError: expected 0.4285714286, got 0.5

  <testcase name="testEmEF" classname="de.up.ling.irtg.InterpretedTreeAutomatonTest" time="0.022">
    <failure message="java.lang.AssertionError: expected 0.3, got 0.25" type="java.lang.AssertionError">java.lang.AssertionError: expected 0.3, got 0.25

  <testcase name="testEmAbab" classname="de.up.ling.irtg.InterpretedTreeAutomatonTest" time="0.005">
    <failure message="java.lang.AssertionError: expected 0.5, got 0.3333333333333333" type="java.lang.AssertionError">java.lang.AssertionError: expected 0.5, got 0.3333333333333333
jgroschwitz commented 3 years ago

I'll have a look

jgroschwitz commented 3 years ago

I just fixed it, was a silly thing. The EM implementation stops after a fixed amount of iterations or after likelihood improvements are under a threshold. The default implementation specifies 0 iterations with the intention that only the threshold should count. I had changed the underlying function such that the number of iterations must be negative to have that effect; i.e. it is now possible to run actually 0 iterations of EM. But I hadn't changed the default implementation, so all the tests ran literally 0 iterations of EM. I now changed the default implementation to use iterations=-1 and clarified the documentation. Sorry I left this around so long!