emanjavacas / pie

A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.
MIT License
22 stars · 10 forks

Confusion Matrix #10

Closed. PonteIneptique closed this issue 5 years ago.

PonteIneptique commented 5 years ago

Hey, I propose to implement some sort of confusion matrix in Pie. I think for us, as end users, it would actually be pretty useful to know where our data might be problematic or lacking information :)

emanjavacas commented 5 years ago

Hey, there is actually already stuff like that. When you call the evaluate method of a model, it gives you back a dict of {task-name: scorer}. Each scorer registers all predictions, and if you pass the trainset to evaluate, it will also gather statistics from the trainset on unknown and ambiguous tokens. Then you can use the method get_classification_summary and, for lemmatization, get_transduction_summary as well. The first gives you a summary of the most common errors in a confusion-matrix fashion; the second gives you some kind of visual clue in terms of diffs. I will add methods to return the underlying data as a Python dict or so, so that it can be serialized.
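For reference, a minimal sketch of that workflow, assuming an already-trained pie model and loaded datasets; the `testset`/`trainset` variables and the 'lemma' task name are placeholders, and whether the summary methods return or directly print their output is an assumption here:

```python
# Sketch of the evaluation workflow described above; `model`, `testset`
# and `trainset` are assumed to be a trained pie model and loaded datasets.
scorers = model.evaluate(testset, trainset)  # {task-name: scorer}

for task, scorer in scorers.items():
    print(task)
    # confusion-matrix style overview of the most common errors
    # (assumed here to return a printable summary)
    print(scorer.get_classification_summary())

# for lemmatization, a diff-style view of the errors
print(scorers['lemma'].get_transduction_summary())
```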

Let me know if you have questions/suggestions...

PonteIneptique commented 5 years ago

Wow, that's some good news. I think, when training, it would be nice to have it printed out in a table format, with an option such as --confusion to export it to CSV or whatever. Are you OK with me looking into it, or do you want to do it?

emanjavacas commented 5 years ago

The problem with monitoring something like this is that you easily clutter up your terminal, and I don't actually see what useful info you would get there to better diagnose training. I think confusion matrices are important during error analysis of a model, but not really during optimization of the model. You could add the confusion output to the evaluate.py script.

PonteIneptique commented 5 years ago

I think you're right about the optimization of the model, but you could think about optimizing the training data: e.g., if your model struggles at analyzing subjunctives, you can try to find texts which are loaded with them, because you might be lacking training examples. You could also get a better understanding of what seems to be hard to recognize :)

emanjavacas commented 5 years ago

Yes, but that's something you want to check after the model has converged.

PonteIneptique commented 5 years ago

Agreed. Maybe I should actually make it another script rather than add it to train, then.

emanjavacas commented 5 years ago

Yes, I think it's best if you add it to evaluate.py.
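For illustration, a rough sketch of what a --confusion CSV export in evaluate.py could look like. The get_confusion_matrix method and the {(gold, predicted): count} shape are hypothetical, standing in for the serializable dict mentioned earlier in the thread:

```python
import csv

def dump_confusion(scorers, path):
    """Write per-task confusion counts to a CSV file.

    Hypothetical sketch: assumes each scorer gains a method that returns
    its raw confusion counts as a {(gold, predicted): count} dict, as
    suggested earlier in this thread.
    """
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['task', 'gold', 'predicted', 'count'])
        for task, scorer in scorers.items():
            for (gold, pred), count in scorer.get_confusion_matrix().items():
                writer.writerow([task, gold, pred, count])
```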

emanjavacas commented 5 years ago

I am closing this.