alan-turing-institute / scivision

scivision: a framework for scientific image analysis
https://sci.vision/
BSD 3-Clause "New" or "Revised" License

switch to using `intake` catalogs for data sources #37

Closed quantumjot closed 2 years ago

ots22 commented 2 years ago
acocac commented 2 years ago

Example for Plankton

Note that urlpath should be replaced by the full path of the directory in GDrive. The following catalog consists of two entries: (i) a single image and (ii) a stack, i.e. multiple images concatenated after being coerced to a common image shape, e.g. 256 x 256 pixels:

%%writefile catalog.yaml
sources:
  plankton_single:
      description: Load a single labeled image from the CEFAS zooplankton dataset
      origin: 
      driver: intake_xarray.image.ImageSource
      parameters:
        species:
          description: which species to collect
          type: str
          default: Bivalvia-Larvae
        id:
          description: which filename
          type: str
          default: Pia1.2017-10-03.1726+N00296780_hc
      args:
        urlpath: '/content/gdrive/.../{{species}}/{{id}}.tif'
        storage_options: {'anon': True}
  plankton_all:
      description: Labeled images from CEFAS zooplankton dataset
      origin: 
      driver: intake_xarray.image.ImageSource
      args:
        urlpath: '/content/gdrive/.../{species}/{id}.tif'
        storage_options: {'anon': True}
        concat_dim: [id, species]
        coerce_shape: [256, 256]
      metadata:
        shape: images_shape_all
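
A rough sketch of how the catalog above might be consumed from Python. It assumes intake's classic (pre-2.0) catalog API with intake-xarray installed, and that the file was written as catalog.yaml. The helper names render_urlpath, load_single and load_stack are hypothetical; render_urlpath is only a stdlib stand-in that mimics how the {{species}} and {{id}} placeholders are filled from the entry's parameters.

```python
import re


def render_urlpath(template, **params):
    """Fill '{{name}}' placeholders the way the catalog's user parameters would.

    Stdlib stand-in for illustration only; intake itself renders these with Jinja2.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(params[m.group(1)]),
        template,
    )


def load_single(catalog_path="catalog.yaml", **params):
    """Read one image via the plankton_single entry as an xarray.DataArray.

    Any of `species` / `id` left out of `params` falls back to the catalog default.
    """
    import intake  # imported lazily; requires intake and intake-xarray

    cat = intake.open_catalog(catalog_path)
    source = cat.plankton_single(**params)  # override the user parameters
    return source.read()


def load_stack(catalog_path="catalog.yaml"):
    """Open the plankton_all entry lazily, stacked over id and species."""
    import intake  # imported lazily; requires intake and intake-xarray

    cat = intake.open_catalog(catalog_path)
    return cat.plankton_all.to_dask()
```

For instance, render_urlpath('/data/{{species}}/{{id}}.tif', species='Bivalvia-Larvae', id='img') gives '/data/Bivalvia-Larvae/img.tif', where /data and img are placeholder values rather than the real GDrive location. Calling load_single() with no arguments would use the species and id defaults declared in the catalog.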

acocac commented 2 years ago

In addition to the plankton example, here are three examples from Environmental AI Book contributors that use intake to catalogue files in different formats:

I hope the above examples help show how intake could benefit scivision by cataloguing and handling different data formats.

edwardchalstrey1 commented 2 years ago

@acocac can this issue be closed?

acocac commented 2 years ago

yep, let me close it.