vprecup closed 2 years ago
Hi & thanks for this!
I agree the change makes sense, to make the metric more fairly comparable between datasets where the proportion of zero-entity examples is different.
Any div/0 risk would seem to require a user to provide a validation dataset with 0 examples of any entities - in which case things breaking would hopefully be obvious, and maybe even beneficial to help user notice the issue.
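One caveat on how "obvious" the breakage would be, as a minimal sketch rather than a statement about the actual code in `ner.py`: if the numerator is a NumPy scalar (for example the result of an array `.sum()`), dividing by zero yields `nan` with a `RuntimeWarning` instead of raising, whereas plain Python floats raise `ZeroDivisionError`. The variable `acc_sum` below is a hypothetical stand-in for the summed per-example accuracies.

```python
import numpy as np

# Hypothetical stand-ins: a validation set with no focus examples at all.
acc_sum = np.float64(0.0)   # e.g. what an array .sum() would return
n_focus_examples = 0

# NumPy scalar division by zero does not raise; 0.0 / 0 evaluates to nan
# (a RuntimeWarning is emitted by default, silenced here for the demo).
with np.errstate(invalid="ignore"):
    numpy_result = acc_sum / n_focus_examples

# Plain Python float division by zero raises instead.
try:
    0.0 / n_focus_examples
    python_raised = False
except ZeroDivisionError:
    python_raised = True

print(numpy_result, python_raised)
```

So under this assumption the symptom would be a `nan` metric value rather than a crash, which should still be noticeable in the logged metrics.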
I also like surfacing the number of "focus" examples to the user via the extra metric, which may be useful in some cases for people trying to understand how accuracy is interacting with the propensity of the model to predict all-"other", to produce the final score.
Was going to ask why not `n_focus_examples = (n_focus_tokens_by_example != 0).sum()`, but from a quick `timeit` test on a small dummy array, seems like your slice-and-shape method is a fair bit faster? 🤯
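For anyone wanting to reproduce that comparison, here is a rough sketch. The exact slice-and-shape expression in the PR is not quoted in this thread, so `count_with_slice` is an assumption about what it looks like, and the dummy array is made up. Timings will vary with array size, dtype, and NumPy version.

```python
import timeit

import numpy as np

# Small dummy array of per-example focus token counts (values are made up).
rng = np.random.default_rng(0)
n_focus_tokens_by_example = rng.integers(0, 5, size=1000)


def count_with_sum(arr):
    # Boolean mask, then sum the True values.
    return (arr != 0).sum()


def count_with_slice(arr):
    # Assumed form of the "slice-and-shape" approach: select the non-zero
    # elements with a boolean mask, then read the length of the result.
    return arr[arr != 0].shape[0]


t_sum = timeit.timeit(lambda: count_with_sum(n_focus_tokens_by_example), number=10_000)
t_slice = timeit.timeit(lambda: count_with_slice(n_focus_tokens_by_example), number=10_000)
print(f"mask + sum:    {t_sum:.4f}s")
print(f"slice + shape: {t_slice:.4f}s")
```

Both functions return the same count, so the choice only affects speed. `np.count_nonzero(arr)` is another option that could be added to the comparison.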
So all looks good to me & happy to merge 😁
Excellent! Thanks for the feedback, @athewsey! Also, thanks for providing this super nice solution. I've learned a lot about the SageMaker & HuggingFace ecosystems thanks to it.
Description of changes: While working with the solution and delving into the details of how the `focus_acc` metric is computed for model validation, I realised that there are situations when examples do not contain any focus tokens, but are still "captured" in this metric.

Although these examples are excluded from the focus token sums (i.e. `focus_acc_by_example` - line 451 from `ner.py`), when the `focus_acc_by_example` is averaged into `focus_acc`, the total number of validation examples is used (`n_examples = probs_raw.shape[0]`) instead of the number of `n_focus_tokens_by_example` elements that are not 0.

By nature of the `focus_acc`, I understand that this metric only accounts for non-default labelled or predicted tokens, hence I concluded that it should not take into consideration examples where there are no such tokens. So I am proposing this change. @athewsey, I look forward to finding out your POV on this.

Additionally, the PR also introduces the `n_focus_examples` metric that will be captured in CloudWatch.

Testing done: Solution execution with and without the change - analysis of the metrics via CloudWatch.
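To illustrate the effect of the denominator change on a toy case, here is a minimal sketch. It is not the code from `ner.py`: the helper `focus_accuracy` is hypothetical, the arrays are made-up data, and it assumes that examples without focus tokens contribute 0 to `focus_acc_by_example`.

```python
import numpy as np


def focus_accuracy(focus_acc_by_example, n_focus_tokens_by_example):
    """Hypothetical helper comparing the old and proposed averaging."""
    # Old denominator: every validation example, with or without focus tokens.
    n_examples = focus_acc_by_example.shape[0]
    # Proposed denominator: only examples that contain at least one focus token.
    n_focus_examples = n_focus_tokens_by_example[n_focus_tokens_by_example != 0].shape[0]

    acc_sum = focus_acc_by_example.sum()
    old_focus_acc = acc_sum / n_examples
    new_focus_acc = acc_sum / n_focus_examples
    return old_focus_acc, new_focus_acc, n_focus_examples


# Four validation examples, two of which have no focus tokens at all.
n_focus_tokens_by_example = np.array([4, 0, 2, 0])
focus_acc_by_example = np.array([0.5, 0.0, 1.0, 0.0])

old_acc, new_acc, n_focus = focus_accuracy(focus_acc_by_example, n_focus_tokens_by_example)
print(old_acc, new_acc, n_focus)  # 0.375 0.75 2
```

In this toy case the two zero-entity examples halve the metric under the old averaging, even though the model was never evaluated on any focus tokens for them.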
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.