Closed justusschock closed 4 years ago
Hi! Thanks for your contribution! Great first issue!
I like the structure...
Dividing only into research areas would mean duplication of some metrics; for example, accuracy is used more or less within all fields. I think it would be better to mainly divide into a regression and a classification subpackage, depending on the targets being continuous or discrete. Specific metrics (like BLEU in NLP) could be in research-specific subpackages.
I agree, and these metrics (like accuracy) would not fall in any of these but remain in the base package.
I don't want to divide them into regression and classification and also have subpackages for all the research areas, as it may become non-trivial to find the desired metric.
Another thing we could think of is not having subpackages at all, but just one metrics package containing them all (just like torch.nn).
I would rather avoid deep metric structures; one level is enough... So we can have general-purpose metrics like accuracy and the domain-specific ones =) And then have them all imported from the root metrics init...
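The flat, one-level layout with root re-exports could look something like this (a hypothetical sketch; the module and metric names are illustrative, not the actual package):

```
metrics/
    __init__.py        # re-exports everything, e.g. from .nlp import BLEU
    classification.py  # general purpose: Accuracy, F1, ...
    regression.py      # MSE, MAE, ...
    nlp.py             # domain specific: BLEU, ROUGE, ...
    vision.py          # IoU, panoptic quality, ...
```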
CV: panoptic quality and IoU
Augmentation: affinity and diversity
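For reference, IoU on axis-aligned boxes reduces to a few lines (a minimal stdlib sketch; the function name and the (x1, y1, x2, y2) box convention are my own choices, not from this thread):

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle: overlap of the two coordinate ranges.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```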
Some metrics may even be dataset-specific, e.g., the F1 score for SQuAD (there is some preprocessing and there are special rules involved). For these kinds of less general metrics, I think there should be a base Metric class for people to inherit from and create their own.
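Such a base class could be a thin accumulate-then-compute interface (a plain-Python sketch under my own assumptions; in the actual package it would presumably subclass torch.nn.Module, and all names here are illustrative):

```python
from abc import ABC, abstractmethod

class Metric(ABC):
    """Hypothetical base class: accumulate over batches, then compute."""
    @abstractmethod
    def update(self, preds, targets):
        """Fold one batch of predictions/targets into internal state."""
    @abstractmethod
    def compute(self):
        """Return the metric value over everything seen so far."""
    def reset(self):
        """Clear internal state (optional for subclasses)."""

class Accuracy(Metric):
    """Example subclass: running correct/total counts."""
    def __init__(self):
        self.correct = 0
        self.total = 0
    def update(self, preds, targets):
        self.correct += sum(p == t for p, t in zip(preds, targets))
        self.total += len(targets)
    def compute(self):
        return self.correct / self.total
    def reset(self):
        self.correct = self.total = 0
```

A dataset-specific metric like SQuAD F1 would then live in user code as just another subclass with its own preprocessing inside `update`.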
For some reference, this is how I implement mine, and this is from PyTorch Ignite.
Also, should losses be considered as some type of metrics?
@haotongye
I would say that we shouldn't include dataset-specific metrics here.
But I agree, we should have a base Metric class (probably just a torch.nn.Module with some extras). This will, however, be hard for the functional interface.
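One way to reconcile the two is to keep the functional form primary and make the class a thin wrapper over it (a sketch; the names are illustrative and the eventual interface may well differ):

```python
def mean_squared_error(preds, targets):
    """Functional form: a pure computation on the full inputs."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

class MSE:
    """Class form (in the real package this would subclass the base
    Metric / torch.nn.Module); it simply forwards to the functional
    implementation, so both interfaces stay in sync."""
    def __call__(self, preds, targets):
        return mean_squared_error(preds, targets)
```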
For now, I wouldn't include losses, as this would really broaden the scope. Maybe we can do this afterwards in a separate effort.
@seandatasci Can you link a paper or reference implementation for the affinity and diversity part? AFAIK there are several ways to calculate these...
some requested:
metrics for continuous output:
would be nice to have:
However, as far as I know, the last three require access to the full list of targets and predictions at once, so they can only be used for smaller datasets.
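That distinction matters for the interface: running-aggregate metrics need only a few accumulators, while others must buffer every prediction before computing. A sketch of the two patterns (metric choices and names are my own, picked only to illustrate the memory difference):

```python
import statistics

class RunningMSE:
    """Constant memory: only running sums are kept across batches."""
    def __init__(self):
        self.sq_err = 0.0
        self.n = 0
    def update(self, preds, targets):
        self.sq_err += sum((p - t) ** 2 for p, t in zip(preds, targets))
        self.n += len(targets)
    def compute(self):
        return self.sq_err / self.n

class BufferedMedianAE:
    """Must store everything: a median can't be updated incrementally,
    so memory grows with the dataset size."""
    def __init__(self):
        self.errors = []
    def update(self, preds, targets):
        self.errors += [abs(p - t) for p, t in zip(preds, targets)]
    def compute(self):
        return statistics.median(self.errors)
```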
As I mentioned in the tweet, for NLG this repo could be integrated directly (?)
If planning to also include support for Vision & Language tasks such as VQA/VisDial etc., which are mostly posed as discriminative tasks, R@{1,5,10} / MRR / NDCG can also be used. One nice implementation by Pythia here.
Let me know if I can help! Thanks.
@shubhamagarwal92 We will probably have to adjust the metrics for NLG according to our upcoming metrics interface, but other than that it should be fine. If you want to, you can take this once we have our interface running (probably tomorrow).
@justusschock https://arxiv.org/abs/2002.08973
Let me know if I can help! Thanks.
Help is always welcome =)
As discussed in #973 , we will probably start by implementing metrics as standalones.
This issue aims to discuss which metrics we need and how we can implement them in a package structure.
Suggestions welcome.
My initial thought was to have a metrics package with subpackages for each research area like vision, text, audio etc.
CC @srush @Borda @williamFalcon @Darktex
As a start: For vision I'd like to have the following: