biigle / core

:large_blue_circle: Application core of BIIGLE
https://biigle.de
GNU General Public License v3.0

FathomNet interface #443

Open mzur opened 2 years ago

mzur commented 2 years ago

We should explore the implementation of an interface to FathomNet. This interface should allow an easy export of (image) annotations from BIIGLE to FathomNet. As far as I can see, this would only work for remote volumes, as FathomNet requires public URLs to the images. Otherwise, an export should be rather straightforward. It just needs a convenient UI.

mzur commented 2 years ago

If/when FathomNet supports iFDOs, this should be even easier.

mzur commented 2 years ago

This article describes a CSV file format that can be used to upload annotations to FathomNet. We could provide this CSV as a new report type. Thoughts:

mzur commented 2 years ago

This issue could also handle the FathomNet->BIIGLE data transfer. I think the easiest way to do this could be iFDO. As mentioned before, the BIIGLE->FathomNet data transfer could be easy as well if FathomNet supported an iFDO import.

hohonuuli commented 2 years ago

I'm inlining your ifdo sample here so I can find it again:

# General information about the dataset.
image-set-header:
    # UUID, name, handle are mandatory fields of each iFDO.
    image-set-uuid: 2a2360e9-a5ec-4ad2-be04-0ea0b4cbdc58
    image-set-name: SO268-1_21-1_GMR_CAM-23
    # Handles are a superset of DOIs and can be obtained with a Handle server.
    # See: http://handle.net/
    image-set-handle: 20.500.12085/2a2360e9-a5ec-4ad2-be04-0ea0b4cbdc58@data
    # Version must be specified but defaults to v1.0.0 if not present.
    image-set-ifdo-version: v1.0.0
    # List of labels (not annotations) that are used in this dataset.
    image-annotation-labels:
        # Label IDs should be universally unique if possible. It must be unique in this file.
        # The info field can be used for a URL, too.
        - id: urn:lsid:marinespecies.org:taxname:124731
          name: Kolga hyalina
          info: http://www.marinespecies.org/aphia.php?p=taxdetails&id=124731
    # List of persons who created annotations in this dataset.
    image-annotation-creators:
        # ORCID ID can be used as annotator ID, if available.
        - id: 0000-0002-7122-2343
          name: Martin Zurowietz
# List of images of this dataset (list keys are image filenames).
image-set-items:
    SO268-1_21-1_GMR_CAM-23_20190513_131416.jpg:
        # List of annotations on this image.
        image-annotations:
            # Bounding box annotation: x1,y1,x2,y2,x3,y3,x4,y4
            - coordinates: [10,10,10,20,20,20,20,10]
              shape: rectangle
              # An annotation can have one or more labels. Label and annotator are referenced
              # by their ID.
              labels:
                  - label: urn:lsid:marinespecies.org:taxname:124731
                    annotator: 0000-0002-7122-2343
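
As a rough illustration of how a consumer might read the rectangle annotation in the sample above, here is a minimal sketch that reduces the eight corner values to an axis-aligned box. The function name `rectangleToBox` and the `{x, y, width, height}` output shape are assumptions made for this example; they are not taken from the iFDO spec or from FathomNet.

```javascript
// Hypothetical helper (not part of iFDO or FathomNet): reduce rectangle
// corners [x1,y1,x2,y2,x3,y3,x4,y4] to an axis-aligned bounding box.
function rectangleToBox(coordinates) {
  if (coordinates.length !== 8) {
    throw new Error('expected 8 values (4 corner points)');
  }
  // Even indices hold x values, odd indices hold y values.
  const xs = coordinates.filter((_, i) => i % 2 === 0);
  const ys = coordinates.filter((_, i) => i % 2 === 1);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

// Coordinates copied from the sample annotation above.
console.log(rectangleToBox([10, 10, 10, 20, 20, 20, 20, 10]));
// prints { x: 10, y: 10, width: 10, height: 10 }
```

For a rotated rectangle this returns the enclosing box rather than the rectangle itself, which may or may not be what a target format wants.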

hohonuuli commented 2 years ago

Just a note that I tested URLs as a key instead of a file name using http://www.yamllint.com and it seems to be valid input. The above after yaml linting with a URL as the image-set-item key:

--- 
image-set-header: 
  image-annotation-creators: 
    - 
      id: 0000-0002-7122-2343
      name: "Martin Zurowietz"
  image-annotation-labels: 
    - 
      id: "urn:lsid:marinespecies.org:taxname:124731"
      info: "http://www.marinespecies.org/aphia.php?p=taxdetails&id=124731"
      name: "Kolga hyalina"
  image-set-handle: 20.500.12085/2a2360e9-a5ec-4ad2-be04-0ea0b4cbdc58@data
  image-set-ifdo-version: v1.0.0
  image-set-name: SO268-1_21-1_GMR_CAM-23
  image-set-uuid: 2a2360e9-a5ec-4ad2-be04-0ea0b4cbdc58
image-set-items: 
  ? "http://foo/bar/SO268-1_21-1_GMR_CAM-23_20190513_131416.jpg"
  : 
    image-annotations: 
      - 
        coordinates: 
          - 10
          - 10
          - 10
          - 20
          - 20
          - 20
          - 20
          - 10
        labels: 
          - 
            annotator: 0000-0002-7122-2343
            label: "urn:lsid:marinespecies.org:taxname:124731"
        shape: rectangle

hohonuuli commented 2 years ago

Note that iFDO seems to expect a one-to-many relation between an image-set and its contained images. FathomNet has a many-to-many relation (e.g., an image can be referenced by several collections in FathomNet). We need to be aware of this for FathomNet >> BIIGLE interchange.

hohonuuli commented 2 years ago

I wrote a proof-of-concept transformer to convert ifdo yaml to FathomNet csv at https://gist.github.com/hohonuuli/cd2204c5900b8c7e95559c35fdd3c0e7 . I haven't written a DarwinCore converter yet.

Just a thought about ifdo, I'm not a fan of using variables as keys (e.g. SO268-1_21-1_GMR_CAM-23_20190513_131416.jpg: <stuff>). I think a better practice is to use fixed keys: (e.g. image-name: SO268-1_21-1_GMR_CAM-23_20190513_131416.jpg) as this makes parsing and working with the data generally simpler. Is it too late to put this recommendation forward to Timm Schoening?

mzur commented 2 years ago

Just a note that I tested URLs as a key instead of a file name [...]

While this works and may be valid according to the (current) iFDO spec, BIIGLE expects all images of a (remote/public) collection to have a common URL prefix. The way we planned it is to use a (not yet documented) iFDO image-set-data-handle field to store the prefix and just use the image filename as the key for image-set-items. Example:

image-set-header:
    # ...
    image-set-data-handle: http://foo/bar
    # ...
image-set-items:
    SO268-1_21-1_GMR_CAM-23_20190513_131416.jpg:
        # ...
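
A minimal sketch of how a consumer could rebuild full image URLs from this layout, assuming the iFDO file has already been parsed into a plain object. The function name `imageUrls` is hypothetical, and `image-set-data-handle` is the not-yet-documented field described above, so its final name and semantics may differ.

```javascript
// Sketch only: join the common prefix with each filename key.
// Assumes `ifdo` is an already-parsed iFDO-like object.
function imageUrls(ifdo) {
  const prefix = ifdo['image-set-header']['image-set-data-handle'];
  // Drop trailing slashes so the join never produces "//".
  const base = prefix.replace(/\/+$/, '');
  return Object.keys(ifdo['image-set-items']).map(
    (filename) => `${base}/${filename}`
  );
}

const ifdo = {
  'image-set-header': { 'image-set-data-handle': 'http://foo/bar' },
  'image-set-items': { 'SO268-1_21-1_GMR_CAM-23_20190513_131416.jpg': {} },
};

console.log(imageUrls(ifdo));
// prints [ 'http://foo/bar/SO268-1_21-1_GMR_CAM-23_20190513_131416.jpg' ]
```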

Note that iFDO seems to expect a one-to-many relation between an image-set and its contained images. FathomNet has a many-to-many relation [...]

One iFDO file is meant to store the information of one image collection. One image could be part of different collections but you would have different iFDO files for each of these. If required, an image can be identified by its UUID across collections. BIIGLE doesn't care about this. It would just duplicate the image for each collection. We planned to do this differently at some point but this change was never completed because of lack of time...
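
To make the "one iFDO file per collection" idea concrete, here is a hedged sketch that splits many-to-many (collection, image) pairs into one items map per collection, repeating shared images. The input shape, the function name `splitByCollection`, and the `uuid` property are all assumptions for illustration; neither BIIGLE nor FathomNet is known to expose data in this form.

```javascript
// Sketch only: group (collection, image) pairs into one filename-keyed
// items map per collection. An image that belongs to several collections
// is repeated in each map and stays identifiable by its UUID.
function splitByCollection(memberships) {
  const perCollection = new Map();
  for (const { collection, image } of memberships) {
    if (!perCollection.has(collection)) {
      perCollection.set(collection, {});
    }
    perCollection.get(collection)[image.filename] = { uuid: image.uuid };
  }
  return perCollection;
}

// Made-up example data: img1.png is part of both collections.
const memberships = [
  { collection: 'A', image: { filename: 'img1.png', uuid: 'u1' } },
  { collection: 'B', image: { filename: 'img1.png', uuid: 'u1' } },
  { collection: 'B', image: { filename: 'img2.png', uuid: 'u2' } },
];

for (const [name, items] of splitByCollection(memberships)) {
  console.log(name, Object.keys(items));
}
```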

Just a thought about ifdo, I'm not a fan of using variables as keys [...]. I think a better practice is to use fixed keys [...]

Maybe it's just personal preference but IMHO the image filenames as keys prevent you from accidentally duplicating an image in the collection (which shouldn't be allowed) and save a little space.
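
For comparison, a sketch of the explicit check a consumer would probably need with a fixed-key list, where nothing in the structure itself rules out a repeated image name. The function name `duplicateNames` and the `name` property are assumptions based on the fixed-key proposal quoted above. With filenames as keys this check is unnecessary once the file is parsed, although how a repeated key in the YAML source is handled depends on the parser.

```javascript
// Sketch only: report names that occur more than once in a
// list-style items array such as [{ name: 'img1.png', ... }, ...].
function duplicateNames(items) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { name } of items) {
    if (seen.has(name)) {
      duplicates.add(name);
    }
    seen.add(name);
  }
  return [...duplicates];
}

console.log(duplicateNames([
  { name: 'img1.png' },
  { name: 'img2.png' },
  { name: 'img1.png' },
]));
// prints [ 'img1.png' ]
```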

Is it too late to put this recommendation forward to Timm Schoening?

No, the spec is discussed here. Please let me know in case you can't get access to the GitLab instance.

hohonuuli commented 2 years ago

@mzur Thanks for the clarifications!

iFDO image-set-data-handle field to store the prefix and just use the image filename as the key for image-set-items

That may create problems for data interchange as it expects all images in a collection to be in the same directory. That assumption isn't true for collections in FathomNet. That seems to be an arbitrary requirement in the ifdo standard that should probably be fixed/addressed.
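
One way to see whether a given collection fits the common-prefix assumption is to check if all image URLs share a directory. This is only a sketch: `sharedDirectory` is a hypothetical helper and the URLs are placeholders.

```javascript
// Sketch only: return the directory shared by all URLs, or null if the
// images live in different directories (or the list is empty).
function sharedDirectory(urls) {
  const dirs = new Set(urls.map((u) => u.slice(0, u.lastIndexOf('/'))));
  return dirs.size === 1 ? [...dirs][0] : null;
}

// All images under one prefix: a single data handle would work.
console.log(sharedDirectory(['http://foo/bar/a.jpg', 'http://foo/bar/b.jpg']));
// prints http://foo/bar

// Images spread over different locations: no single prefix exists.
console.log(sharedDirectory(['http://foo/bar/a.jpg', 'http://baz/b.jpg']));
// prints null
```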

the image filenames as keys prevent you from accidentally duplicating an image in the collection

That's true, but there are tradeoffs in complexity. Here are trivial examples of parsing iFDO in JavaScript, showing that parsing/readability suffers when filenames are used as keys:

Current iFDO style:

// Items keyed by filename: the name is only available as the object key.
const j = {
  'img1.png': { label: 'ItZF8GgzFMAs8WRk5', coordinates: 'tMnb1z1ooSUOuqEMS' },
  'img2.png': { label: 'HOhk142Q6brYmOMWC', coordinates: '7FQv9TVj6nOjxU2Ri' },
  'img3.png': { label: 'hJYbY33KOsMbJN2o6', coordinates: 'uW5zEkoosux9QsC44' }
}

for (const k of Object.keys(j)) { console.log(j[k].label) }

Simpler style:

This doesn't make ifdo consumers work as hard and plays better in statically typed languages.

// Items as a list with fixed keys: the name is a regular property.
const j = [
  {name: 'img1.png', label: 'ItZF8GgzFMAs8WRk5', coordinates: 'tMnb1z1ooSUOuqEMS' },
  {name: 'img2.png', label: 'HOhk142Q6brYmOMWC', coordinates: '7FQv9TVj6nOjxU2Ri' },
  {name: 'img3.png', label: 'hJYbY33KOsMbJN2o6', coordinates: 'uW5zEkoosux9QsC44' }
]

for (const a of j) { console.log(a.label) }

This style would also better support using the full URL in the name:

image-set-header:
    # ...
image-set-items:
    - name: http://foo/bar/SO268-1_21-1_GMR_CAM-23_20190513_131416.jpg
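
Moving between the two styles is mechanical, which the following sketch tries to show. It assumes the filename-keyed items are already parsed into a plain object; `toListStyle` and its optional `prefix` parameter are hypothetical names, not part of any iFDO tooling.

```javascript
// Sketch only: convert filename-keyed items into a fixed-key list.
// If a prefix is given, the name becomes a full URL.
function toListStyle(items, prefix = '') {
  return Object.entries(items).map(([filename, data]) => ({
    ...data,
    name: prefix ? `${prefix}/${filename}` : filename,
  }));
}

const keyed = {
  'img1.png': { label: 'ItZF8GgzFMAs8WRk5' },
  'img2.png': { label: 'HOhk142Q6brYmOMWC' },
};

console.log(toListStyle(keyed, 'http://foo/bar'));
```

The reverse direction needs a decision on what to do with repeated names, since object keys cannot hold duplicates.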

hohonuuli commented 2 years ago

Please let me know in case you can't get access to the GitLab instance.

@mzur I can read the issues but can't make any comments without a login. Is it possible to get one? Thanks!

mzur commented 2 years ago

That may create problems for data interchange as it expects all images in a collection to be in the same directory. That assumption isn't true for collections in FathomNet. That seems to be an arbitrary requirement in the ifdo standard that should probably be fixed/addressed.

This should best be discussed in the iFDO GitLab repo. I'll write Timm and ask him how you can get access.

That's true, but there are tradeoffs in complexity.

I'd choose "correctness by design" over ease of development :wink: But we can discuss this in the iFDO GitLab as well.