emanjavacas / pie

A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.
MIT License

Rounding of threshold can be a bit weird #39

Closed PonteIneptique closed 4 years ago

PonteIneptique commented 4 years ago

Hi there, I have just seen a weird situation, which probably comes down to rounding:

Epoch 20

pos

|                  | accuracy | precision | recall | support |
|------------------|----------|-----------|--------|---------|
| all              | 0.9769   | 0.9287    | 0.9086 | 4147    |
| unknown-tokens   | 0.9198   | 0.8135    | 0.8348 | 187     |
| ambiguous-tokens | 0.9376   | 0.9008    | 0.8695 | 930     |
<TaskScheduler patience="5" factor="0.5" threshold="0" min_weight="0">
    <Task name="pos" steps="0" patience="6" threshold="0.001" target="True" 
            mode="max" weight="1.0" best="0.9769"/>
</TaskScheduler>
<LrScheduler lr="0.00056" lr_steps="0" lr_patience="2"/>

Epoch 22

|                  | accuracy | precision | recall | support |
|------------------|----------|-----------|--------|---------|
| all              | 0.9776   | 0.929     | 0.9231 | 4147    |
| unknown-tokens   | 0.9144   | 0.7081    | 0.7366 | 187     |
| ambiguous-tokens | 0.9409   | 0.9097    | 0.8829 | 930     |
<TaskScheduler patience="5" factor="0.5" threshold="0" min_weight="0">
    <Task name="pos" steps="2" patience="6" threshold="0.001" target="True" mode="max" 
            weight="1.0" best="0.9769"/>
</TaskScheduler>
<LrScheduler lr="0.00042" lr_steps="2" lr_patience="2"/>

Bug?

I'm posting it here for later review, but if this comes down to rounding, 0.9776 should still beat 0.9769 (0.978 vs. 0.977). But who knows :)

emanjavacas commented 4 years ago

This must be the effect of threshold. We check whether the new metric beats the previous best by at least the given threshold; if not, the new score is ignored (and thus doesn't register as an improvement). Perhaps a default of 0.001 is too strict.
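
For illustration, here is a minimal sketch of that kind of threshold comparison (the `is_improvement` helper and its signature are hypothetical, not pie's actual code), showing why 0.9776 does not register against a best of 0.9769 under a threshold of 0.001:

```python
def is_improvement(new, best, threshold=0.001, mode="max"):
    """Return True only if `new` beats `best` by at least `threshold`.

    Minimal sketch of a scheduler-style comparison, not pie's actual
    implementation. With mode="max", higher scores are better.
    """
    if mode == "max":
        return new - best >= threshold
    return best - new >= threshold

# The situation from the logs above: the gain is 0.0007 < 0.001,
# so the new score is ignored and `best` stays at 0.9769.
print(is_improvement(0.9776, 0.9769))  # False
print(is_improvement(0.9786, 0.9769))  # True (gain 0.0017 >= 0.001)
```

Under a check like this, the `best="0.9769"` still shown in the Epoch 22 scheduler state would be expected behaviour rather than a rounding bug.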

PonteIneptique commented 4 years ago

Thanks! I think you are right.