bootphon / wordseg

A Python toolbox for text-based word segmentation
https://docs.cognitive-ml.fr/wordseg
GNU General Public License v3.0

evaluation: type scores #11

Closed: cainesap closed this issue 6 years ago

cainesap commented 6 years ago

Hello,

We've noticed that type and token scores are always the same. It seems to me that evaluate.py calculates the two differently, with the type score based on these lines --

def _stringpos_typepos(stringpos):
    return [{pos for pos in line} for line in stringpos]

-- which is effectively the same as token matching, but by indices instead (correct me if I'm wrong).
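A minimal sketch of why the quoted function cannot change the scores. It assumes each word is represented by its (start, end) offsets in the utterance with spaces removed. Because every token in a line has a distinct span, turning a line's list of spans into a set merges nothing, so the "type" counts equal the token counts. `token_spans` and `precision_recall` are hypothetical helpers, not wordseg's actual code.

```python
from collections import Counter


def token_spans(utterance):
    """Hypothetical helper: list the (start, end) span of each word token,
    with offsets counted over the utterance with spaces removed."""
    spans, pos = [], 0
    for word in utterance.split():
        spans.append((pos, pos + len(word)))
        pos += len(word)
    return spans


def stringpos_typepos(stringpos):
    # Same logic as the function quoted above: one set of spans per line.
    return [{pos for pos in line} for line in stringpos]


def precision_recall(gold, hyp):
    """Count matching spans line by line (multiset intersection); accepts
    lists or sets of spans."""
    correct = sum(sum((Counter(g) & Counter(h)).values())
                  for g, h in zip(gold, hyp))
    return correct / sum(len(h) for h in hyp), correct / sum(len(g) for g in gold)


gold = ['the dog bites the dog']
hyp = ['the dog bit es the dog']  # hypothetical segmentation

gold_pos = [token_spans(u) for u in gold]
hyp_pos = [token_spans(u) for u in hyp]

token_pr = precision_recall(gold_pos, hyp_pos)
type_pr = precision_recall(stringpos_typepos(gold_pos), stringpos_typepos(hyp_pos))
print(token_pr, type_pr)  # identical: 4 of 6 hypothesised spans, 4 of 5 gold spans
```

The repeated words "the" and "dog" sit at different offsets, so they stay distinct after the set conversion. Only a comparison of word strings, not positions, can collapse them into types.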

I'd be very happy to implement a type scoring function as a pull request, if I can, but first I wanted to check the intuition behind it (sorry, I did try googling for more info about it, but in vain). Firstly, we're talking about word types (as opposed to tokens), right? So is it supposed to be a comparison of the gold and hypothesised lexicons? For example, of the hypothesised types {a, b, c, d}, how many are also found in the gold lexicon {a, b, e}? That would give p = 2/4, r = 2/3.
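A minimal sketch of the lexicon comparison proposed above. It assumes utterances are space-separated strings and that the F-score is the usual harmonic mean of precision and recall. `type_scores` is a hypothetical name and does not reflect the code that was eventually merged.

```python
def type_scores(gold_utterances, hyp_utterances):
    """Hypothetical type scoring: compare the set of distinct words (the
    lexicon) of the gold text with that of the hypothesised segmentation."""
    gold_lex = {w for u in gold_utterances for w in u.split()}
    hyp_lex = {w for u in hyp_utterances for w in u.split()}
    correct = len(gold_lex & hyp_lex)  # types found in both lexicons
    precision = correct / len(hyp_lex) if hyp_lex else 0.0
    recall = correct / len(gold_lex) if gold_lex else 0.0
    fscore = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, fscore


# The example from the question: hypotheses {a, b, c, d}, gold {a, b, e}.
p, r, f = type_scores(['a b e'], ['a b c d'])
print(p, r, f)  # p = 2/4, r = 2/3, f = 4/7
```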

cheers, Andrew

cainesap commented 6 years ago

I've edited 'evaluate.py' with type scoring, as I understand it.

It works, as far as I can tell. But please tell me if I've misunderstood type scoring, or if this edit is in fact unwanted.

https://www.dropbox.com/s/59kooi8bvlod8pg/evaluate.py?dl=0

I've marked every line I added with a comment prefixed "AC". I haven't deleted any pre-existing code; I've only commented it out where necessary.

Andrew

mmmaat commented 6 years ago

Thanks Andrew, can you make a pull request please, so that we can discuss it, test it, etc.?

cainesap commented 6 years ago

Yes, sure. I wanted to do this, but AFAIK I have to be made a collaborator to make a pull request? Or can I fork a copy of the repo and push my changes to that? (Sorry, I haven't done pull requests before.)

mmmaat commented 6 years ago

Yes, the idea is that you work in your own fork, or better, in a dedicated branch of your fork (no need to be a collaborator). It is a bit scary the first time, but it is a very convenient way to collaborate on a project!

https://help.github.com/articles/creating-a-pull-request/ https://help.github.com/articles/about-pull-requests/

If you can't manage it, I'll do it tomorrow.

cainesap commented 6 years ago

OK, thank you Mathieu. I've done that, and I think I've created a pull request. Hopefully you were notified.

mmmaat commented 6 years ago

See discussion in PR #14

cainesap commented 6 years ago

OK, thank you Mathieu.

mmmaat commented 6 years ago

Hi all, before merging the changes proposed by @cainesap, I want to be sure the evaluation code is correct. Here are some simple examples; please give your feedback on the results.

I also welcome other toy tests if you have better ideas, thanks!

gold = 'the dog bites the dog'
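A toy test in the same spirit, showing how token and type scores can now diverge. It assumes a hypothetical segmentation `'thedog bites the dog'`, which is not one of the examples from this thread, and uses hypothetical helpers rather than wordseg's API. A token is counted as correct only if both of its boundaries match the gold token. A type is counted as correct if the word string appears in the gold lexicon.

```python
def token_spans(utterance):
    """Hypothetical helper: (start, end) span of each word token, with
    offsets counted over the utterance with spaces removed."""
    spans, pos = [], 0
    for word in utterance.split():
        spans.append((pos, pos + len(word)))
        pos += len(word)
    return spans


def prf(n_correct, n_hyp, n_gold):
    # Precision, recall and their harmonic mean (F-score).
    p = n_correct / n_hyp
    r = n_correct / n_gold
    f = 2 * p * r / (p + r) if n_correct else 0.0
    return p, r, f


gold = 'the dog bites the dog'
hyp = 'thedog bites the dog'  # hypothetical segmentation

# Token scores: spans are unique within one utterance, so sets are safe here.
gold_tok, hyp_tok = set(token_spans(gold)), set(token_spans(hyp))
token = prf(len(gold_tok & hyp_tok), len(hyp_tok), len(gold_tok))

# Type scores: compare the lexicons (sets of distinct word strings).
gold_lex, hyp_lex = set(gold.split()), set(hyp.split())
types = prf(len(gold_lex & hyp_lex), len(hyp_lex), len(gold_lex))

print('token', token)  # p = 3/4, r = 3/5
print('type', types)   # p = 3/4, r = 3/3
```

Here the type recall is perfect because "the" and "dog" still appear elsewhere in the hypothesis, even though the first "the dog" was wrongly merged.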

mmmaat commented 6 years ago

A few more tests (all seem OK; I'm merging the changes).