LxMLS / lxmls-toolkit

Machine Learning applied to Natural Language Processing Toolkit used in the Lisbon Machine Learning Summer School
Other
222 stars 216 forks source link

incorrect number of mistakes in structured perceptron code #25

Closed estambolieva closed 2 months ago

estambolieva commented 10 years ago

not quite sure, but should we not increment the num_mistakes variable at line 56 in the sequences/structured_perceptron.py code by adding the line num_minstakes +=1?

2 minutes later, never mind, I figured it out :). closed.

miguelbalmeida commented 10 years ago

I am reopening this because what comes next is related. The number of mistakes is correctly computed, but the code is very confusing. Me, Luis Sarmento and Zita also had trouble figuring this out.

In day 3, the code for structured perceptron deals with the initial position at two separate places: first it updates the initial weights, then it processes it in the loop over all positions, where mistakes are counted and transition and emission weights are updated.

This led at least four monitors to confusion, so let's make sure each position is processed only in one place. Use one of the following two structures:

Process position 0, update initial and emission weights Process positions 1 ... N-1 (FOR loop), update emission and transition weights Process position N, update final, emission and transition weights

or

Process positions 0 ... N (FOR loop) if position is 0, process initial and emission weights if position is N, process final, emission and transition weights else, process emission and transition weights

Thanks to Katia and Luis Sarmento for bringing up the issue.

andre-martins commented 10 years ago

Not sure I agree the proposed solutions are better. There also basically two kinds of features here:

The main for loop takes care of the emissions (and counting the mistakes), hence its range.

Perhaps the most clear way is to separate the transition and emission updates into two for loops (albeit this would be a bit slower). The code would become:

1) Make emission updates.

for i in xrange(N) ...

2) Make transition updates.

... # Initial updates. for i in xrange(1,N) ... ... # Final updates.

2014-07-25 14:43 GMT+01:00 Miguel Almeida notifications@github.com:

I am reopening this because what comes next is related. The number of mistakes is correctly computed, but the code is very confusing. Me, Luis Sarmento and Zita also had trouble figuring this out.

In day 3, the code for structured perceptron deals with the initial position at two separate places: first it updates the initial weights, then it processes it in the loop over all positions, where mistakes are counted and transition and emission weights are updated.

This led at least four monitors to confusion, so let's make sure each position is processed only in one place. Use one of the following two structures: Process position 0, update initial and emission weights Process positions 1 ... N-1 (FOR loop), update emission and transition weights Process position N, update final, emission and transition weights

or Process positions 0 ... N (FOR loop) if position is 0, process initial and emission weights if position is N, process final, emission and transition weights else, process emission and transition weights

Thanks to Katia and Luis Sarmento for bringing up the issue.

— Reply to this email directly or view it on GitHub https://github.com/gracaninja/lxmls-toolkit/issues/25#issuecomment-50150625 .

miguelbalmeida commented 10 years ago

Fine by me, all three possibilities seem better than the current state.