Closed by keighrim 1 year ago
Some questions for @JinnyViboonlarp and @Angela-Lam regarding the progress so far:
- How do you handle differences between the label sets in the gold data and the spaCy output (if any)?
- Which format do you assume for the "gold" input? Currently there are MMIF files in the "gold" directory (https://github.com/clamsproject/clams-aapb-annotations/tree/82c56a4c8a03ab62e7674600bfdf006092dfa932/golds/ner/2022-jun-namedentity/annotations), but I heard @marcverhagen and @JinnyViboonlarp were discussing the formats last week.
Regarding label differences, the current mapping is {'PERSON':'person', 'ORG':'organization', 'FAC':'location', 'GPE':'location', 'LOC':'location'}. Each spaCy label maps to a single gold label without much ambiguity, except GPE (e.g., countries' names), which semantically could map to either location or organization depending on context.
The current code works with the ".ann" format for both the gold data and the model's output, but my plan is to make it accept input in either .ann or .mmif format (or a .json file representing a MMIF file).
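For reference, the label mapping quoted above could be applied along these lines. This is a minimal sketch rather than the code in the repository: the function names and the `(start, end, label)` triple layout are assumptions for illustration, and dropping spaCy labels that have no gold counterpart is one possible policy, not a confirmed design decision.

```python
# Mapping quoted in the thread: spaCy NER labels -> gold-standard labels.
SPACY_TO_GOLD = {
    "PERSON": "person",
    "ORG": "organization",
    "FAC": "location",
    "GPE": "location",
    "LOC": "location",
}


def normalize_label(spacy_label):
    """Return the gold label for a spaCy label, or None if it is unmapped."""
    return SPACY_TO_GOLD.get(spacy_label)


def normalize_entities(entities):
    """Convert (start, end, spacy_label) triples to gold labels.

    Entities whose label is not in the mapping (e.g. DATE) are dropped.
    That policy is an assumption made for this sketch.
    """
    normalized = []
    for start, end, label in entities:
        gold_label = normalize_label(label)
        if gold_label is not None:
            normalized.append((start, end, gold_label))
    return normalized
```

With this layout, a GPE prediction is always scored as a location, so the location/organization ambiguity mentioned above would need extra handling if it matters for the evaluation.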
To add to Jinny’s email, for the MMIF format files, I also plan on having a 1:1 mapping between the predictions and the spaCy output. The gold input for this particular evaluation task is assumed to be in MMIF format, and I think a lot of the code that Jinny wrote can be reused here. I haven't been able to dig deep into it this past week because I was a little swamped by finals T-T
Dear Angela,
I just wanted to give you a heads-up that I will most likely edit the evaluation code in a pretty significant way to make it easier to check that the implementation is correct (once I'm also less swamped by finals), so now might not be the best time to dig deep into it.
Can you both upload whatever you have done so far to this repo? I need to see what has been holding you up. See this comment (https://github.com/clamsproject/consumer-evaluation/issues/2#issuecomment-1496963867) to find out how to properly do a push.
Hi Keigh,
I just pushed my progress so far, sorry for the delay! I have one more assignment due tonight, and 3 more next Wednesday. After that, my schedule will open up a lot more. I can also meet earlier next week if that works better for you.
For this code, I still need to change the labels from the MMIF hierarchy to the entity labels used by spaCy. I was able to run it before, but ran into a different strange bug each of the past couple of times I tried to run it. For the next steps, I'll write the label_choice based on the spaCy NER labels, and convert the MMIF format to an entity list.
Please let me know what you think.
Best regards, Angela
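To make the "convert the MMIF format to an entity list" step concrete, here is a rough sketch that uses only the standard `json` module. It assumes a simplified MMIF layout in which each view has an `annotations` list and each named-entity annotation carries `start`, `end`, and `category` in its `properties`. Real MMIF files may anchor entities differently (for example through targets), so the property names and the `mmif_to_entities` helper are assumptions, not the project's actual code.

```python
import json


def mmif_to_entities(mmif_json, type_suffix="NamedEntity"):
    """Collect (start, end, category) triples from an MMIF-like JSON string.

    Assumes named-entity annotations store character offsets and a label
    directly in their properties. That layout is an assumption for this sketch.
    """
    data = json.loads(mmif_json)
    entities = []
    for view in data.get("views", []):
        for annotation in view.get("annotations", []):
            # Keep only annotations whose type URI ends with the given suffix.
            if not annotation.get("@type", "").endswith(type_suffix):
                continue
            props = annotation.get("properties", {})
            # Skip annotations that lack any of the assumed properties.
            if all(key in props for key in ("start", "end", "category")):
                entities.append((props["start"], props["end"], props["category"]))
    return sorted(entities)
```

The resulting triples use the same shape as entities read from other formats, so the label conversion can be applied afterwards in one place.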
Dear Angela,
In case it is relevant, I just uploaded a repo for converting between the MMIF format and the .ann format: https://github.com/JinnyViboonlarp/ann-mmif-conversion
My plan for the next steps and yours seem to overlap somewhat, so if I make any updates, I'll let you know as soon as possible to avoid redundant work.
Also, could you send me the link to your updated repo?
Thank you, Jinny
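As a point of reference for the .ann side, entity lines in brat standoff files are tab-separated: an ID starting with `T`, then the label with character offsets, then the covered text. A minimal reader could look like the sketch below. It is not taken from the conversion repo linked above; `parse_ann` is a hypothetical helper, and skipping discontinuous spans is a simplification made for illustration.

```python
def parse_ann(ann_text):
    """Parse brat standoff text into (start, end, label) triples.

    Only contiguous text-bound annotations (IDs starting with "T") are read.
    Relations, notes, and discontinuous spans are skipped in this sketch.
    """
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        type_and_offsets = fields[1]
        if ";" in type_and_offsets:
            # A ";" marks a discontinuous span such as "0 5;16 23".
            continue
        label, start, end = type_and_offsets.split()
        entities.append((int(start), int(end), label))
    return entities
```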
Hi Jinny,
That's awesome, thank you so much for the helpful information! Here's the link to my code: https://github.com/clamsproject/consumer-evaluation/pull/5/files
Best regards, Angela
@JinnyViboonlarp Hi, we talked about making the evaluation code work with multiple gold-prediction pairs and return an "aggregated" result. I took the latest code from your personal repo and merged it into main via #5; it is now at https://github.com/clamsproject/aapb-evaluations/blob/main/ner_eval/evaluate.py . Can you continue updating the code in this repository? Thanks!
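One way to read "aggregated" here is a micro-average: pool the true positives, false positives, and false negatives over all gold-prediction pairs, then compute precision, recall, and F1 once. The sketch below assumes exact matching on `(start, end, label)` triples. Both the matching criterion and the function names are assumptions for illustration, and the merged evaluate.py may aggregate differently.

```python
def count_matches(gold, predicted):
    """Return (tp, fp, fn) for one gold-prediction pair using exact matching."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    return tp, len(pred_set) - tp, len(gold_set) - tp


def aggregate_scores(pairs):
    """Micro-average precision, recall, and F1 over (gold, predicted) pairs."""
    tp = fp = fn = 0
    for gold, predicted in pairs:
        pair_tp, pair_fp, pair_fn = count_matches(gold, predicted)
        tp += pair_tp
        fp += pair_fp
        fn += pair_fn
    # Guard against division by zero when there are no predictions or golds.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Micro-averaging weights every entity equally, so long documents dominate the result; averaging per-file scores instead would weight every file equally.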
done via #12 for now.
A thread to report and discuss progress on the NER evaluation pipeline development.