UniversalDataTool / universal-data-tool

Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.
https://universaldatatool.com
MIT License
1.96k stars, 190 forks

Time Series Annotation #242

Open seveibar opened 4 years ago

david-waterworth commented 4 years ago

To provide some context, this thread discusses a number of time series annotation tools - both open source and commercial.

I evaluated most of them. In the end, for the simple task I was assigned, I used matplotlib: the SpanSelector widget together with RadioButtons was good enough.

In general, there are two types of time series annotation that I'm aware of: assigning labels to specific events (e.g. the peak of a heartbeat), which gives a sequence of (timestamp, label) pairs, and assigning labels to ranges (e.g. which activity is being performed), which gives (start, end, label) triples.
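A minimal sketch of how these two annotation shapes might be represented in Python. The class and field names are hypothetical and are not taken from any of the tools discussed here; treating range ends as inclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointAnnotation:
    """A label attached to a single instant, e.g. a heartbeat peak."""
    time: float
    label: str


@dataclass(frozen=True)
class RangeAnnotation:
    """A label attached to an interval, e.g. the activity being performed."""
    start: float
    end: float
    label: str

    def __post_init__(self):
        # Reject intervals that end before they start.
        if self.end < self.start:
            raise ValueError("end must not be earlier than start")

    def contains(self, time: float) -> bool:
        # Assumption: both ends of the range are inclusive.
        return self.start <= time <= self.end


peak = PointAnnotation(time=1.5, label="peak")
walking = RangeAnnotation(start=0.0, end=10.0, label="walking")
```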

I was specifically looking for something that handled multivariate time series (e.g. acceleration xyz, magnetic field xyz, etc.), with the option to display all series on the same subplot or on multiple plots.

Some applications only have a single label, others have multiple labels.

A more advanced scenario involves presenting the user with a video of the activity, with play/pause and, say, a half-speed mode. You need to be able to create a sync point so the time series and video are in step. To apply a label you'd probably want to click "on" at the start frame and "off" at the end frame.

davodogster commented 4 years ago

Hi @seveibar

Time series annotation tools (for generating data for deep learning) aren't quite good enough yet, so this might be a big opportunity for you. A good example of an existing annotation tool is 'SignalAnnotation', a repository by GitHub user 'Dubrzr'. I have complex electrical signals, and as a user I want to annotate segments and/or points by clicking. A good user interface for fast labeling would allow things like clicking and dragging the series left to right, shortcut keys, zooming into areas, dynamically adjusting the y axis as you move along the series, and an export annotations button. An upload annotations button would also be super helpful, because I could generate noisy annotations semi-automatically with preexisting functions/code and then refine them in your new time series tool :) Then I'll train an LSTM model for segmentation, peak detection, and classification. Another tool out there is called 'TRAINSET', but it's too basic.

And yes @david-waterworth, a video stream of the data with the labels in sync would be useful as well.

Cheers and look forward to your thoughts! Sam

dean-sh commented 4 years ago

Hey guys, I'd like to add a few use cases and features we'd be happy to see in UDT regarding time series labeling. We are searching for a time series labeling tool that can help us in our robotics, agriculture, and IoT ML projects.

Core functionality

  1. Multivariate time series
    • [Optional] Mark one series as a reference.
    • Visualizing all series on one chart or stacked charts.
    • [Optional] Possibility to show/hide series
  2. Support for multi-labels (more than 1 category)
  3. Point based labeling, range based labels, and whole-series classification
  4. Marking labeled timestamps in a different color, not just time ranges (similar to TRAINSET)

Nice to have

  1. Editable colors for labels
  2. Custom axis labels (or adding a variable in addition to the timestamp)
  3. Azure blob storage support for data import and labels export.

I'd be happy to discuss this further with you. Cheers, Dean.

seveibar commented 4 years ago

Hi everyone, I'm closing in on the time series interface. Could anyone review the JSON format? Am I forgetting any use cases? https://github.com/UniversalDataTool/udt-format/blob/master/interfaces/time_series.md

david-waterworth commented 4 years ago

@seveibar

My use case has 3 files each containing 3 series (also 3 timestamp columns consisting of elapsed, epoch and string), along with a video. It would be nice if at a minimum you could support multiple aligned series in both the "timeData" and "csvUrl" elements (i.e. timestamp,value1,value2...valuen).

Are you saying "samples" will contain either "timeData", "csvUrl", "audioUrl" or "videoUrl"? I think you'd want at least one of "timeData" or "csvUrl", and optionally one of "audioUrl" or "videoUrl". I think it makes sense to allow either "timeData" or "csvUrl" to appear in "samples" and "audioUrl" or "videoUrl" outside it, but I'm not quite sure.

When annotating a time series based on a video, there's usually a mechanism to mark a sync point, i.e. for a video you provide a (frame, timestamp) pair, which then allows you to map any other frame back to a timestamp.
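As a rough illustration of the sync-point idea, here is a sketch that maps between video frames and series timestamps from a single (frame, timestamp) pair. It assumes a constant frame rate and timestamps in milliseconds; the names `SyncPoint`, `frame_to_timestamp` and `timestamp_to_frame` are hypothetical and not part of UDT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPoint:
    frame: int           # video frame where the sync event is visible
    timestamp_ms: float  # series timestamp (ms) of the same event


def frame_to_timestamp(frame: int, sync: SyncPoint, fps: float) -> float:
    """Map a video frame to a series timestamp, assuming a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return sync.timestamp_ms + (frame - sync.frame) * 1000.0 / fps


def timestamp_to_frame(timestamp_ms: float, sync: SyncPoint, fps: float) -> int:
    """Map a series timestamp to the nearest video frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return sync.frame + round((timestamp_ms - sync.timestamp_ms) * fps / 1000.0)


sync = SyncPoint(frame=100, timestamp_ms=5000.0)
start_ms = frame_to_timestamp(150, sync, fps=25.0)   # 50 frames after the sync point
start_frame = timestamp_to_frame(start_ms, sync, fps=25.0)
```

With variable frame rate video, a single pair would not be enough and several sync points would be needed.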

Perhaps you might want to allow chart kwargs - series colour, alpha etc?

Also, one thing I had to do on a recent project was to allow the annotator to zoom in. What I observed was that users were turning the device on, then spending several minutes getting ready for a short experiment, so the time series was 90% nothing. I'm not sure that needs to be in the schema, but supplying initial zoom start/end values could be optional.

seveibar commented 4 years ago

@david-waterworth I totally forgot about aligned time series. I'll fix that ASAP; the draft GUI supports it!

Also, w.r.t. the zoom region, totally agreed, although this may create some data redundancy if data is repeated across samples with different zoom regions.

davodogster commented 4 years ago

@seveibar Really looking forward to using your new time series labelling tool! My data is 2 aligned time series of millivolt recordings, indexed in milliseconds from 1 to 600,000 (no dates). I want the option of annotating both points and segments, with multiple classes. Export as CSV would be ideal, with the input data plus an extra column for the label; otherwise export as JSON if CSV isn't possible. The ability to upload partial annotations, then delete them and add new ones, would be a great feature as well. Cheers, Sam
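For what it's worth, a CSV with an extra label column can be derived from point and segment annotations with a few lines of Python. The sketch below is not UDT's exporter; the function names are hypothetical, and it assumes inclusive segment ends, one label per row, and point labels taking priority over segment labels.

```python
import csv
import io


def label_rows(time_data, annotation):
    """Attach a "label" to every data point from point and segment annotations."""
    durations = annotation.get("durations", [])
    point_labels = {t["time"]: t["label"] for t in annotation.get("timestamps", [])}
    rows = []
    for point in time_data:
        t = point["time"]
        # Assumption: an exact-time point label wins over a segment label.
        label = point_labels.get(t, "")
        if not label:
            for d in durations:
                if d["start"] <= t <= d["end"]:  # inclusive ends (assumption)
                    label = d["label"]
                    break
        rows.append({**point, "label": label})
    return rows


def rows_to_csv(rows, value_keys):
    """Serialize labeled rows as CSV text: time, the value columns, then label."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["time", *value_keys, "label"],
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


time_data = [{"time": 0, "value": 100}, {"time": 1000, "value": 50}]
annotation = {
    "durations": [{"start": 0, "end": 500, "label": "buy here"}],
    "timestamps": [{"time": 1000, "label": "peak"}],
}
csv_text = rows_to_csv(label_rows(time_data, annotation), ["value"])
```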

dean-sh commented 4 years ago

A few more things:

seveibar commented 4 years ago

I've updated the schema to support (I think) all of the use cases! Thanks for the feedback everybody!

A couple terminology notes:

Some other notes:

The latest version of the spec is here but I've pasted below to keep everything in one thread.

{
  "interface": {
    "type": "time_series",

    // time_format determines how the time axis will be displayed to the user
    // "dates": Display as dates with time
    // "none": Display time as a number. For example, if each data point was taken at a new iteration
    //         or over a short period of time
    // "duration": Display everything relative to the first data point but converted to a time. This
    //             is how a video or audio editing application might display time
    //             e.g. "1:20:00" to mean "1 hour and 20 minutes past the start"
    "timeFormat": "dates",

    "enabledTools": ["create-durations", "label-durations", "create-timestamps", "label-timestamps"],

    // Can the user manually type a new label? (free text)
    "allowCustomLabels": true,

    // Labels that can be used for durations
    "durationLabels": ["buy during this time", "sell during this time"],

    // Labels that can be used for timestamps
    "timestampLabels": ["earnings call starts", "CEO is ousted"],

    // OPTIONAL: If provided, you can layer or stack graphs
    "graphs": [ { "keyName": "value" } ] // default

    /*
    // Here's an example where we put two pieces of data on the same plot

    "graphs": [
      // if two graphs share the same row, they'll be placed on top of each other
      // if a row isn't provided, the data corresponding to the key will get its own row
      { "keyName": "val1", "row": 0 },
      { "keyName": "val2", "row": 0 }
    ]
    */
  },
  "samples": [
    {
      "timeData": [
        { "time": 0, "value": 100 },
        { "time": 1000, "value": 50 },
        //...

        // You can graph any "keyName" from the "graphs" array here
        { "time": 0, "val1": 0, "val2": 0 },
        { "time": 1000, "val2": 10 },
        { "time": 2000, "val1": 100 },
        { "time": 5000, "val1": 100, "val2": 100 }
      ],

      // This will appear in the sample after labeling, can also be provided for viewing data
      // Times will be in the same format as the "timeData", e.g. unix epoch milliseconds
      "annotation": {
        "durations": [
          { "start": 0, "end": 500, "label": "buy here" }
        ],
        "timestamps": [
          { "time": 1000, "label": "label for 1 second mark"  }
        ]
      }
    },
    // These are also valid
    { "audioUrl": "http://example.com/audio.mp3" },
    { "videoUrl": "http://example.com/video.mp4" },
    { "csvUrl": "http://example.com/csv_with_time_and_value_columns.csv" },
    { "audioUrl": "http://example.com/audio.mp3", "initialWindow": ["10s", "30s"] }
  ]
}

Screenshot to give a sense of the initial UI

image

seveibar commented 4 years ago

This is now released! Try it out and please create new issues for feedback and ideas!!

davodogster commented 4 years ago

BIG alright! Awesome will try it out! Hopefully I can upload partial annotations and modify them and then export as csv or json.

edit: Okay, looking at the schema above, it appears we can upload annotations. Do we need to write a Python script to get our data into a dictionary for every single observation?
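Such a script can be fairly short. The following is a sketch only, not an official importer: the JSON keys ("timeData", "graphs", "keyName", "timeFormat") are taken from the schema pasted earlier in this thread, while the function names and the assumption of a CSV with one "time" column plus numeric value columns are hypothetical.

```python
import csv
import io
import json


def csv_to_sample(csv_text, time_column="time"):
    """Turn CSV text with one time column and N value columns into one sample."""
    reader = csv.DictReader(io.StringIO(csv_text))
    time_data = []
    for row in reader:
        point = {"time": float(row[time_column])}
        for key, raw in row.items():
            # Skip the time column, overflow fields (key None) and empty cells.
            if key is None or key == time_column or raw in (None, ""):
                continue
            point[key] = float(raw)
        time_data.append(point)
    return {"timeData": time_data}


def build_dataset(sample, value_keys):
    """Wrap a sample in a time_series interface with one graph per value column."""
    return {
        "interface": {
            "type": "time_series",
            "timeFormat": "none",  # plain numeric time axis, per the schema above
            "graphs": [{"keyName": key} for key in value_keys],
        },
        "samples": [sample],
    }


csv_text = "time,ch1,ch2\n1,0.5,0.7\n2,0.6,\n"
dataset = build_dataset(csv_to_sample(csv_text), ["ch1", "ch2"])
print(json.dumps(dataset, indent=2))
```

Existing partial annotations could presumably be added to the sample under an "annotation" key in the format shown in the schema, though that is untested here.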

davodogster commented 3 years ago

Hi @seveibar, happy belated New Year mate! Hope you are well. How would I do this kind of time series segmentation (or point) annotation using UDT?

image

Also, it would be very useful if the y axis dynamically adjusted to the min-max values of the current window. The user could also zoom in or out to change the current window length, slide along the series, and annotate it as they wish. I'm also interested in point annotations, not just segments.
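To make the y-axis request concrete, here is a sketch of the computation involved: given the visible time window, take the min and max of the values inside it and add a small margin. This is not how UDT does it; `window_y_limits` and its `padding` parameter are hypothetical, and the data points follow the "timeData" shape from the schema above.

```python
def window_y_limits(time_data, key, start, end, padding=0.05):
    """Return (low, high) y-axis limits for the points visible in [start, end]."""
    values = [
        point[key]
        for point in time_data
        if key in point and start <= point["time"] <= end
    ]
    if not values:
        return None  # nothing visible in this window
    low, high = min(values), max(values)
    span = high - low
    # Pad by a fraction of the range; fall back to a fixed margin for flat data.
    margin = span * padding if span > 0 else 1.0
    return low - margin, high + margin


time_data = [
    {"time": 0, "value": 100},
    {"time": 1000, "value": 50},
    {"time": 2000, "value": 300},
]
limits = window_y_limits(time_data, "value", start=0, end=1000)
```

Recomputing the limits whenever the window is panned or zoomed gives the dynamic behaviour described.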

Best Regards, Sam

seveibar commented 3 years ago

Hey Sam, I think that's already possible, even with the features you described! It's a bit tricky to import the data right now, however: you have to put it in the UDT JSON format or in a compatible CSV. How is the data stored? Let's try loading it in!

davodogster commented 3 years ago

Hi @seveibar, I just uninstalled an older Desktop version and have now downloaded a newer one, but I can't remember how to install it. There are so many files and none of them looks like an obvious .exe / application file. I'm on Windows.

EDIT: Oh, it turns out the .exe doesn't exist for some of the newer versions, so there is no .exe to download. Will install a slightly older version.