UniversalDataTool / collaboration-server

Collaboration Server for use with Universal Data Tool

summary mode implementation #6

Open seveibar opened 4 years ago

seveibar commented 4 years ago

Related to RFC: Supporting Large Datasets. The collaboration server should minimize payloads by returning a "sampleSummary", which contains enough information to display aggregate data on samples but not enough to view any individual sample. When a collaborative session is started in summarizeSamples mode, instead of returning udt_json that looks like this:

{
  "interface": { /* ... */ },
  "samples": [ /* ... */ ]
}

It returns a SummarizedUDTObject that looks like the following:

{
  "interface": { /* ... */ },
  "sampleSummary": [
    { "state": "complete", "hasAnnotation": false, "version": 32 },
    // ...
  ]
}
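
As a rough sketch of how the server might derive this object from full udt_json: the helper names below are hypothetical, and it assumes each sample carries `state`, `annotation`, and `version` fields, which the issue does not confirm.

```javascript
// Hypothetical helper: reduce one full sample to its summary entry.
// Assumes the sample carries `state`, `annotation`, and `version` fields.
function summarizeSample(sample) {
  return {
    state: sample.state,
    hasAnnotation: sample.annotation !== undefined && sample.annotation !== null,
    version: sample.version,
  }
}

// Hypothetical helper: build a SummarizedUDTObject from a full udt_json object.
function summarizeUDT(udt) {
  return {
    interface: udt.interface,
    sampleSummary: (udt.samples || []).map(summarizeSample),
  }
}

// Example input with made-up sample data.
const summarized = summarizeUDT({
  interface: { type: "image_segmentation" },
  samples: [
    { imageUrl: "https://example.com/a.png", state: "complete", version: 32 },
    { imageUrl: "https://example.com/b.png", annotation: [], version: 1 },
  ],
})
console.log(JSON.stringify(summarized.sampleSummary, null, 2))
```

Heavy fields such as `imageUrl` or mask data never reach the summary, so the payload grows only with the number of samples, not with annotation size.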

The summarizeSamples mode should be a column on the session table and a POST body parameter to POST /api/session.

Diffs should be run against sampleSummary instead of samples in summarizeSamples mode.
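
A minimal sketch of what that switch could look like. `getDiffTarget` and `changedIndices` are hypothetical names, `session.summarizeSamples` stands for the proposed session column, and the naive index-by-index comparison is only a stand-in for whatever diff mechanism the server actually uses.

```javascript
// Pick the array that diffs should run against for a given session.
// `session.summarizeSamples` is the hypothetical flag from the session table.
function getDiffTarget(session, udt) {
  return session.summarizeSamples ? udt.sampleSummary : udt.samples
}

// Naive index-by-index comparison; a stand-in for a real diff library.
function changedIndices(before, after) {
  const changed = []
  const length = Math.max(before.length, after.length)
  for (let i = 0; i < length; i++) {
    if (JSON.stringify(before[i]) !== JSON.stringify(after[i])) changed.push(i)
  }
  return changed
}

const session = { summarizeSamples: true }
const previousState = {
  interface: {},
  sampleSummary: [
    { state: "complete", hasAnnotation: false, version: 32 },
    { hasAnnotation: false, version: 1 },
  ],
}
const nextState = {
  interface: {},
  sampleSummary: [
    { state: "complete", hasAnnotation: false, version: 32 },
    { state: "complete", hasAnnotation: true, version: 2 },
  ],
}
const changed = changedIndices(
  getDiffTarget(session, previousState),
  getDiffTarget(session, nextState)
)
console.log(changed) // [ 1 ]
```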

Ownmarc commented 4 years ago

Would you want a minimized version of each image in this sample summarization? Could there be multiple levels of "summarization"? What about using some kind of pagination, like some databases do?

seveibar commented 4 years ago

With a "sampleRange" view:

{
  "interface": { /* ... */ },
  "summary": {
    "totalSamples": 0,
    "stateCounts": {
       "complete": 10
    },
    "sampleRange": [0, 50],
    "samples": [
      { "state": "complete", "hasAnnotation": false, "version": 32 },
      // ...
    ]
  }
}

It should be possible to apply whole-json diffs intelligently against this type of object, while maintaining a small payload.
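
One way the "summary" object could be assembled, as a sketch only. `buildRangeSummary` is a hypothetical name; it assumes sampleRange is half-open (`[start, end)`) and that summary.samples holds only the entries inside that window, neither of which is settled in this thread.

```javascript
// Hypothetical helper: build the "summary" object for a window of samples.
// Assumes sampleRange is half-open, [start, end), and that summary.samples
// holds only the entries inside that window.
function buildRangeSummary(allSummaries, start, end) {
  const stateCounts = {}
  for (const { state } of allSummaries) {
    if (state === undefined) continue // samples without a state are not counted
    stateCounts[state] = (stateCounts[state] || 0) + 1
  }
  // Clamp the window so it never extends past the dataset.
  const from = Math.min(start, allSummaries.length)
  const to = Math.min(Math.max(end, from), allSummaries.length)
  return {
    totalSamples: allSummaries.length,
    stateCounts,
    sampleRange: [from, to],
    samples: allSummaries.slice(from, to),
  }
}

// Made-up dataset: 120 samples, the first 10 marked complete.
const allSummaries = Array.from({ length: 120 }, (_, i) =>
  i < 10
    ? { state: "complete", hasAnnotation: true, version: 1 }
    : { hasAnnotation: false, version: 0 }
)
const rangeSummary = buildRangeSummary(allSummaries, 0, 50)
console.log(rangeSummary.totalSamples, rangeSummary.stateCounts, rangeSummary.sampleRange)
```

Because the counts are computed over the whole dataset while `samples` covers only the window, a client can show aggregate progress without receiving every entry.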

~If, however, we're looking at paginated views, we might as well just return the full samples:~

// BAD, probably will be too big on full image segmentation with image masks
{
  "interface": { /* ... */ },
  "summary": {
    "totalSamples": 0,
    "stateCounts": {
       "complete": 10
    }
  },
  "sampleRange": [0, 50],
  "samples": [
      { /* full sample with imageUrl etc. */ }
  ]
}

Payload size becomes a serious problem with full pixel segmentation: we're seeing collaborative sessions with 200 samples exceed 5 MB, which slows everything down.
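
A quick way to compare the two shapes is to measure the serialized size of each. The sketch below uses a hypothetical `jsonByteSize` helper and entirely synthetic data (a placeholder string stands in for an image mask), so it illustrates the measurement and is not a benchmark of real sessions.

```javascript
// Size in bytes of a value once serialized as UTF-8 JSON.
function jsonByteSize(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length
}

// Synthetic full sample: the long string is a stand-in for mask data.
const fullSample = {
  imageUrl: "https://example.com/a.png",
  annotation: { mask: "x".repeat(25000) },
  state: "complete",
  version: 32,
}
// The corresponding summary entry, as proposed earlier in this issue.
const summaryEntry = { state: "complete", hasAnnotation: true, version: 32 }

const fullBytes = jsonByteSize(fullSample)
const summaryBytes = jsonByteSize(summaryEntry)
console.log({ fullBytes, summaryBytes })
```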

seveibar commented 4 years ago

One argument for summarized samples (i.e. summary.sampleRange with summary.samples) is that storing image masks will be expensive. Eventually only 1-5 samples may fit in memory at any given time, but we still want the nice grid view showing an overview of 500 samples.

seveibar commented 4 years ago

I would not recommend variable summary levels for the first PR, though I think this will be easy to do after the foundation is in place.

seveibar commented 4 years ago

We're continuing to take a look at this:

This Summary Object is a good pick for the first version:

{
  "interface": { /* ... */ },
  "summary": {
    "samples": [
      { "state": "complete", "version": 32 },
      // ...
    ]
  }
}
seveibar commented 4 years ago

[Image attachment: Untitled (1)]