fathomnet / community-feedback


Constrain annotations to avoid typos #24

Open hohonuuli opened 2 years ago

hohonuuli commented 2 years ago

À la VARS style: when folks enter a term, it needs to be constrained to valid input. This requires a naming service. Now that we have both MBARI's VARS KB and a fast WoRMS name server, we can constrain the terms using whichever provider the user selects. We still need a way to allow a user to enter unconstrained terms (this was Karen Osborn's suggestion).
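A minimal Python sketch of constraining entered terms against a user-selected provider, with an explicit escape hatch for unconstrained terms. `NameProvider`, `ConceptEntry`, and `enter_concept` are hypothetical names; a real provider would query the VARS KB or the WoRMS name server instead of checking an in-memory set.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConceptEntry:
    name: str
    constrained: bool  # False when the user deliberately bypassed the naming service


class NameProvider:
    """Hypothetical stand-in for a naming service (VARS KB, WoRMS name server, ...)."""

    def __init__(self, names: Iterable[str]):
        self._names = set(names)

    def is_valid(self, name: str) -> bool:
        return name in self._names


def enter_concept(term: str, provider: NameProvider,
                  allow_unconstrained: bool = False) -> ConceptEntry:
    """Accept `term` only if the selected provider knows it, unless the user opts out."""
    term = term.strip()
    if provider.is_valid(term):
        return ConceptEntry(term, constrained=True)
    if allow_unconstrained:
        # Escape hatch: keep the free-text term, but flag it for later review.
        return ConceptEntry(term, constrained=False)
    raise ValueError(f"{term!r} is not a valid concept for the selected provider")
```

Flagging unconstrained entries (rather than silently accepting them) keeps them easy to find and review later.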

ermbutler commented 7 months ago

Relates to solution for #128

kevinsbarnard commented 7 months ago

Important to tackle before more data contribution; let's discuss more

hohonuuli commented 2 months ago

(Pasting a bit of relevant discussion with @ermbutler from Slack)

To do a VARS-style auto-complete, we do NOT combine the scientific and common names. We let the user use the common name, but once they commit/save it, the app automatically changes it to the scientific name. To emulate that, the best road may be to require the user to type a few characters (maybe at least 3), then use the query/contains endpoint to get a list of potential matches. Once the user selects one, use the synonyms endpoint to get the accepted/scientific name. So for “starfish” the calls are:

  1. https://fathomnet.org/worms/query/contains/starfish
  2. https://fathomnet.org/worms/synonyms/starfish

The first result in the synonym list is the accepted name.
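A minimal Python sketch of the two-step lookup described above. It assumes both endpoints return a JSON array of name strings and relies on the first synonym being the accepted name; `suggest`, `accepted_name`, `MIN_CHARS`, and the injectable `fetch` parameter are hypothetical names for illustration. Passing `fetch` in makes the flow testable without hitting the network.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List

BASE_URL = "https://fathomnet.org/worms"
MIN_CHARS = 3  # require a few typed characters before querying


def fetch_json(url: str) -> list:
    """Default fetcher: GET the URL and decode the JSON body (assumed to be a list)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def suggest(text: str, fetch: Callable[[str], list] = fetch_json) -> List[str]:
    """Return candidate names containing `text`, or [] if too few characters were typed."""
    text = text.strip()
    if len(text) < MIN_CHARS:
        return []
    return fetch(f"{BASE_URL}/query/contains/{urllib.parse.quote(text)}")


def accepted_name(selected: str, fetch: Callable[[str], list] = fetch_json) -> str:
    """Resolve a selected (possibly common) name to the accepted scientific name."""
    synonyms = fetch(f"{BASE_URL}/synonyms/{urllib.parse.quote(selected)}")
    if not synonyms:
        raise LookupError(f"No synonyms found for {selected!r}")
    return synonyms[0]  # the first synonym is taken to be the accepted name
```

In a UI, `suggest` would run as the user types and `accepted_name` only when they commit the selection, so the stored concept is always the accepted scientific name even though the user typed a common name.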

Anyway, I’m happy to chat with you about it, but I think combining the terms into Asteroidea (starfish) is confusing/cluttering for a user.

Also, I wouldn’t use the taxa/info endpoint to resolve scientific names. WoRMS isn’t 1:many; it’s actually (sort of) many-to-many. For example, if you use https://fathomnet.org/worms/taxa/info/Loligo%20opalescens, it returns info about a former scientific name. But if you use https://fathomnet.org/worms/synonyms/Loligo%20opalescens, it correctly resolves the accepted name to Doryteuthis opalescens.

hohonuuli commented 1 month ago

@lauravchrobak is collecting notes here

lauravchrobak commented 1 month ago

Summary

FathomNet concepts require data quality checks to ensure proper formatting for ML training. Below is a summary of data quality checks for images and annotations; a more detailed document can be found here. Ideally, these checks should run both during data upload and periodically after ingestion to maintain data integrity.

Top priority

  1. Constrain concept format to the WoRMS/MBARI knowledgebase
  2. Bounding box validity
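One way the bounding box validity check could look, as a Python sketch. It assumes boxes are stored as pixel `x`, `y` (top-left corner), `width`, and `height`, checked against the image dimensions; `BoundingBox` and `box_problems` are hypothetical names, and the real rules belong to the more detailed document mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundingBox:
    # Hypothetical record: top-left corner plus size, all in pixels.
    x: int
    y: int
    width: int
    height: int


def box_problems(box: BoundingBox, image_width: int, image_height: int) -> List[str]:
    """Return human-readable problems with `box`; an empty list means it passed."""
    problems = []
    if box.width <= 0 or box.height <= 0:
        problems.append("width and height must be positive")
    if box.x < 0 or box.y < 0:
        problems.append("top-left corner lies outside the image")
    if box.x + box.width > image_width or box.y + box.height > image_height:
        problems.append("box extends past the image edge")
    return problems
```

Returning a list of problems rather than a single boolean lets the same function serve both at upload time (reject with a message) and in periodic post-ingestion audits (report what is wrong).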