Robustness issues on commands: ngram_stats, coreset, qualitative_sample, clip_recall & content_recall

bardout commented 1 year ago

I found VDTK quite useful as an implementation of all the semantic metrics at the dataset level, and used it to compare my Remote Sensing dataset with previous ones such as RSICD and RSITMD. However, VDTK has a few robustness issues. Often a simple check would allow computation to continue with partial results. I have only attempted fixes for some of these. Has anyone else encountered these issues and modified the code?

Environment

$uname -a
Linux minds01.irtse-pf.ext 5.11.0-41-generic #45~20.04.1-Ubuntu SMP Wed Nov 10 10:20:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# conda environment file used for installation (vdtk_min.yaml):

name: geo2
channels:
  - conda-forge
  - defaults

dependencies:
- pip=21.2.4
- pip:
  - flake8-black>=0.3.6
  - tensorflow==2.12.0
  - sentence-transformers==2.2.2
  - bert-score==0.3.13
  - spacy==3.5.2
  - vdtk==0.3.0
  - rich==13.3.4
  - python-levenshtein

Observations

I wrote a simple converter to COCO format from the format used by RSICD, RSITMD and my own dataset. The converted datasets are attached as a zip and can be used directly with vdtk.
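The converter itself is not included in the thread. As a rough illustration only, here is a sketch that turns a Karpathy-style caption file into COCO caption annotations. It assumes the input looks like {"images": [{"imgid", "filename", "split", "sentences": [{"raw"}]}]}, which is an assumption about the RSICD/RSITMD files rather than something stated here, and the function name karpathy_to_coco is hypothetical. Each image's split is kept as an extra, non-standard field because several of the failures reported below depend on which split is populated.

```python
from typing import Any, Dict, List


def karpathy_to_coco(dataset: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Convert a Karpathy-style caption dataset into COCO caption format.

    Assumed (hypothetical) input layout:
    {"images": [{"imgid": 0, "filename": "...", "split": "train",
                 "sentences": [{"raw": "..."}, ...]}, ...]}
    Output: {"images": [...], "annotations": [...]} in COCO caption style;
    each image keeps its split as an extra, non-standard field.
    """
    images: List[Dict[str, Any]] = []
    annotations: List[Dict[str, Any]] = []
    for img in dataset["images"]:
        image_id = img["imgid"]
        images.append({"id": image_id, "file_name": img["filename"], "split": img.get("split")})
        for sent in img["sentences"]:
            # Annotation ids are assigned sequentially across the whole dataset.
            annotations.append(
                {"id": len(annotations), "image_id": image_id, "caption": sent["raw"].strip()}
            )
    return {"images": images, "annotations": annotations}
```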

[ ] ngram_stats fails on RSITMD in env geo2: ngram_stats() got an unexpected keyword argument 'referencekey' (click/core.py, line 760, in invoke: return __callback(*args, **kwargs))
[ ] coreset fails on all sets: division by zero (core_set.py, line 114, in coreset: table.add_row(f"{f:.2f}", str(s), f"{s * 100 /len(test_data):.2f}%"))
[ ] sample fails on geotruth (OK on RSICD, RSITMD): max() arg is an empty sequence (qualitative_sample.py, line 86, in qualitative_sample: best_bleu_mean_caption = max(mabs.items(), key=lambda x: x[1]))
[ ] clip_recall on RSICD: No such file or directory: 'airport_1.jpg' (clip_recall.py, line 36, in _get_feature: Image.open(media_path)). Same error on RSITMD for 'baseballfield_452.tif' and on geotruth for 'tile_535000-6245200.tif'.
  - Workaround: running the command from the image directory of the set works.
[ ] clip_recall on RSICD: zero-size array to reduction operation maximum which has no identity, raised by NumPy in np.amax(i) (clip_recall.py, line 163).

[ ] Content Recall: all values are nan. Why?

                                        Content Recall
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Dataset                    ┃ Noun Recall ┃ Verb Recall ┃ Noun Recall (Fuzzy) ┃ Verb Recall (Fuzzy) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ dataset_rsicd_v2_vdtk.json │   nan ± nan │   nan ± nan │           nan ± nan │           nan ± nan │
└────────────────────────────┴─────────────┴─────────────┴─────────────────────┴─────────────────────┘

References:

RSICD Optimal
RSICD & model metrics.zip

DavidMChan commented 1 year ago

Thanks! Sorry for not noticing these issues, since I haven't been monitoring the repo (I guess my watch settings were off). I'll try to take a look at some of this after the EMNLP deadline on Friday. That being said, this repo is definitely in need of a bit of TLC, since it's been a while since I've touched it.

DavidMChan commented 1 year ago

I've just pushed a new commit that resolves several of these issues.

ngram_stats fails on RSITMD in env geo2
This was caused by a missing command-line argument. Fixed.
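The traceback in the original report ends in click's invoke, which calls the command function as __callback(*args, **kwargs). A minimal plain-Python sketch of that failure mode, with click left out: every parsed option is passed to the command function as a keyword argument, so an option the function signature does not accept raises this kind of TypeError. The functions and the reference_key option below are hypothetical illustrations, not vdtk's code.

```python
from typing import Any, Callable, Dict, Tuple


def invoke(callback: Callable[..., Any], parsed_options: Dict[str, Any]) -> Any:
    # Simplified stand-in for click's invoke step: parsed options become keyword arguments.
    return callback(**parsed_options)


def ngram_stats_broken(dataset_paths: Tuple[str, ...], split: str = "all") -> str:
    # Hypothetical command whose signature lacks a parameter for one declared option.
    return f"{len(dataset_paths)} dataset(s), split={split}"


def ngram_stats_fixed(
    dataset_paths: Tuple[str, ...], split: str = "all", reference_key: str = "references"
) -> str:
    # Same command with a parameter for every declared option.
    return f"{len(dataset_paths)} dataset(s), split={split}, key={reference_key}"


parsed = {"dataset_paths": ("rsitmd.json",), "split": "all", "reference_key": "references"}

try:
    invoke(ngram_stats_broken, parsed)
except TypeError as err:
    print(f"TypeError: {err}")  # ... got an unexpected keyword argument 'reference_key'

print(invoke(ngram_stats_fixed, parsed))
```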

coreset fails all sets: division by zero (coreset.py)
This happens when the test split is empty, as it is in this dataset. You can still get statistics by explicitly choosing the split the metric runs on (for example, the train split). I've also fixed the bug, so instead of raising an error the command now reports 0s.
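A minimal sketch of this kind of guard, assuming the table row is built as in the traceback quoted in the report (table.add_row(f"{f:.2f}", str(s), f"{s * 100 /len(test_data):.2f}%")). The helper name coreset_row is hypothetical; it reports 0.00% for an empty test split instead of dividing by zero.

```python
from typing import Sequence, Tuple


def coreset_row(fraction: float, size: int, test_data: Sequence) -> Tuple[str, str, str]:
    """Build the three cells of one coreset table row.

    Mirrors the f-strings from the traceback, but reports 0.00% when the
    test split is empty instead of raising ZeroDivisionError."""
    n = len(test_data)
    percent = size * 100 / n if n > 0 else 0.0
    return f"{fraction:.2f}", str(size), f"{percent:.2f}%"


print(coreset_row(0.5, 3, []))            # ('0.50', '3', '0.00%')
print(coreset_row(0.25, 1, [0, 0, 0, 0]))  # ('0.25', '1', '25.00%')
```

The guard only keeps the command from crashing; selecting a non-empty split is still what produces meaningful percentages.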

sample: on geotruth (OK on RSICD, RSITMD) max() arg is an empty sequence
This happens when there is no data to sample from. I've added checks that raise an error earlier in this case.
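A sketch of an early check placed before the max() call from the traceback (best_bleu_mean_caption = max(mabs.items(), key=lambda x: x[1])). The function name best_caption and its error message are hypothetical, and reading mabs as a mapping from captions to mean BLEU scores is an assumption based on the variable names.

```python
from typing import Dict, Tuple


def best_caption(mean_bleu_by_caption: Dict[str, float]) -> Tuple[str, float]:
    """Return the (caption, score) pair with the highest mean BLEU.

    Raises a descriptive ValueError up front instead of letting max() fail
    with "max() arg is an empty sequence" when there is nothing to sample."""
    if not mean_bleu_by_caption:
        raise ValueError("no captions to sample from: check that the selected split is not empty")
    return max(mean_bleu_by_caption.items(), key=lambda item: item[1])
```

If an empty sample should be tolerated rather than reported, Python's built-in max() also accepts a default= keyword (for example max(items, key=..., default=None)), which returns the default for an empty iterable.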

clip_recall on RSICD: No such file or directory: 'airport_1.jpg'
This happens when the image paths in the dataset aren't relative to the current working directory. You can use the command-line argument --media-root to set the root directory for the images.
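A sketch of what a media-root option does conceptually: relative media paths stored in the dataset JSON are joined to a root directory instead of being opened relative to the current working directory (which is also why running from the image directory works). The helpers resolve_media_path and check_media_exists are hypothetical, not vdtk's implementation.

```python
import os
from typing import Optional


def resolve_media_path(media_path: str, media_root: Optional[str] = None) -> str:
    """Resolve a media path from the dataset against an optional root directory.

    Absolute paths are returned unchanged; relative paths are joined to
    media_root when it is given, otherwise they stay relative to the CWD."""
    if media_root is None or os.path.isabs(media_path):
        return media_path
    return os.path.join(media_root, media_path)


def check_media_exists(media_path: str, media_root: Optional[str] = None) -> str:
    # Report the resolved path in the error, which makes a wrong root easy to spot.
    resolved = resolve_media_path(media_path, media_root)
    if not os.path.isfile(resolved):
        raise FileNotFoundError(f"image not found: {resolved!r} (media_root={media_root!r})")
    return resolved
```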

clip_recall on RSICD: zero-size array to reduction operation maximum
This happens because your datasets have no "candidates", i.e. predictions for the references.
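The NumPy error can be reproduced in isolation: np.amax on an empty array has no identity element to return, so it raises ValueError. A minimal guarded version is sketched below. safe_max is a hypothetical helper, and returning nan when there are no candidates is one possible choice, not necessarily what vdtk does. NumPy's initial= argument (for example np.amax(arr, initial=-np.inf)) is another way to make the reduction well defined.

```python
import numpy as np


def safe_max(values) -> float:
    """Maximum of values, or nan when there is nothing to reduce.

    np.amax(np.array([])) raises ValueError ("zero-size array to reduction
    operation maximum which has no identity"), the error reported when a
    dataset has references but no candidate captions to score."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        return float("nan")
    return float(np.amax(arr))


try:
    np.amax(np.array([]))
except ValueError as err:
    print(f"ValueError: {err}")

print(safe_max([0.1, 0.9, 0.4]))  # 0.9
print(safe_max([]))               # nan
```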

Content Recall: all nan. Why?
This happens because your datasets have no "candidates", i.e. predictions for the references.
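A plausible mechanism for the "nan ± nan" cells, sketched with NumPy rather than taken from vdtk's code: with no candidates there are no per-sample recall scores, and the mean and standard deviation of an empty array are both nan (NumPy also emits a RuntimeWarning). The helper mean_pm_std is hypothetical and only mimics the table's "mean ± std" layout.

```python
import warnings

import numpy as np


def mean_pm_std(scores) -> str:
    """Format per-sample scores as "mean ± std" with two decimals.

    With no candidates there are no per-sample scores; NumPy's mean and std
    of an empty array are both nan, so the result is "nan ± nan"."""
    arr = np.asarray(scores, dtype=float)
    with warnings.catch_warnings():
        # np.mean/np.std warn about empty input; silenced here for a clean demo.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mean = np.mean(arr)
        std = np.std(arr)
    return f"{mean:.2f} ± {std:.2f}"


print(mean_pm_std([]))          # nan ± nan
print(mean_pm_std([0.5, 1.0]))  # 0.75 ± 0.25
```

Adding candidate captions to the dataset is what makes these columns meaningful; checking len(scores) before aggregating only makes the empty case explicit.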

bardout commented 1 year ago

Hello David,

Thank you for your reply and for your interest in these issues (and also in issue 3 about zombie processes). This is the only place where I have found semantic metrics applied at the dataset level, so the tool is very useful. I have not yet checked how the fixes and recommendations apply to our data; I will check them as soon as the metrics are complete.

Best regards,
Yves

From: David Chan @.>
Sent: Tuesday, 27 June 2023 22:38
To: CannyLab/vdtk @.>
Cc: BARDOUT Yves @.>; Author @.>
Subject: Re: [CannyLab/vdtk] robustness issues on commands: ngram_stats , coreset , qualitative_sample , clip_recall & content_recall (Issue #4)
