UniversalDataTool / universal-data-tool

Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.
https://universaldatatool.com
MIT License
1.95k stars 190 forks source link

RFC: Supporting Large Datasets, Multiple Backends and Improving Cloud Usability #124

Open seveibar opened 4 years ago

seveibar commented 4 years ago

Acknowledgements

Motivation

Overview

P1: Support URL Protocols requiring authentication

Support datasets with S3 URIs and attempt to authenticate and get a usable https/http URL.

{
  "imageUrl": "s3://mybucket/path/to/file.png"
}

P2: Support Custom Loading Mechanism with new UDT Reference File Style

{
  "interface": { /* ... */ },
  "sampleLoader": {
    "type": "s3",
    "s3Path": "s3://mybucket/subdir",
    // implies existence of...
    // s3://mybucket/subdir/data
    // s3://mybucket/subdir/annotations
  }
}

P3: Support Asynchronous Retrieval of UDT Data

import {
  useInterface,     // [loading, interface, changeInterface]
  useSample,        // [loading, sample, changeSample]
  useMetaData,      // {loading, numberOfSamples, numberOfSamplesCompleted, sampleIds}
  useImportSamples, // async (samples) => null
  useDeleteSamples  // async (sampleIdsOrIndices) => null
} from "../hooks/use-udt-reference"

const AsyncUniversalDataViewer = ({ sampleIndexOrId, udtRef }) => {
    const [loadingInterface, interface] = useInterface(udtRef)
    const [loadingSample, sample] = useSample(udtRef, sampleIndexOrId)

    if (loadingInterface||loadingSample) return "loading..."

    return (
      <UniversalDataViewer
        sample={sample}
        interface={interface}
      />
    )
}
seveibar commented 4 years ago

P3 Modifications

1. No Array Returns

Let's just not use array returns:

// BAD
const [loadingInterface, interface] = useInterface(udtRef)

// GOOD
const {loadingInterface, interface} = useInterface(udtRef)

2. useSampleSummary

I think useSampleSummary should be another method that returns a clone of samples with less information, such that grid views can be populated, e.g.

const {loadingSampleSummary, sampleSummary} = useSampleSummary()
// returns: [
//   { color: "#ff0000", hasAnnotation: false },
//   ...
// ]

3. No udtRef

Using udtRef everywhere (which btw, since the refactoring should be datasetRef) is unnecessary since there is only ever one dataset loaded. So just nix it and have it provided by context or whatever.

4. useFullDataset

const { loadingDataset, dataset, datasetLoadingProgress } = useFullDataset()
// loads ALL samples into dataset, this would be used before downloading