Time Series Annotation - Githubissues

david-waterworth commented 4 years ago

To provide some context, this thread discusses a number of time series annotation tools - both open source and commercial.

I evaluated most of them. In the end, for the simple task I was assigned, I ended up using matplotlib: the SpanSelector widget along with RadioButtons was good enough.
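For readers who want to try the same approach, here is a minimal sketch of range labeling with matplotlib's SpanSelector and RadioButtons. It is not the code used for the task above. The `RangeAnnotator` class and the `launch` function are hypothetical names introduced for illustration, and the sketch assumes a single series and a GUI backend.

```python
from dataclasses import dataclass, field


@dataclass
class RangeAnnotator:
    """Collects (start, end, label) ranges; the GUI only calls add()."""
    labels: list
    current: str = ""
    ranges: list = field(default_factory=list)

    def __post_init__(self):
        # Default to the first label until the user picks another one.
        self.current = self.current or self.labels[0]

    def set_label(self, label):
        self.current = label

    def add(self, xmin, xmax):
        if xmax > xmin:  # ignore zero-width selections (plain clicks)
            self.ranges.append((xmin, xmax, self.current))


def launch(times, values, labels):
    """Open an interactive window (needs matplotlib and a GUI backend)."""
    import matplotlib.pyplot as plt
    from matplotlib.widgets import RadioButtons, SpanSelector

    store = RangeAnnotator(labels)
    fig, ax = plt.subplots()
    fig.subplots_adjust(left=0.3)  # leave room for the radio buttons
    ax.plot(times, values)

    radio = RadioButtons(fig.add_axes([0.05, 0.4, 0.15, 0.2]), labels)
    radio.on_clicked(store.set_label)  # callback receives the label text

    def on_select(xmin, xmax):
        store.add(xmin, xmax)
        ax.axvspan(xmin, xmax, alpha=0.2)  # shade the labeled range
        fig.canvas.draw_idle()

    # Keep references to the widgets so they are not garbage collected.
    span = SpanSelector(ax, on_select, "horizontal", useblit=True)
    plt.show()  # blocks until the window is closed
    return store.ranges
```

Calling `launch(times, values, ["walking", "running"])` would return the collected `(start, end, label)` tuples once the window is closed. Keeping the storage logic separate from the widgets makes it testable without a display.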

In general, there are two types of time series annotation that I'm aware of: assigning labels to a specific event (e.g. the peak of a heartbeat), which gives a sequence of (timestamp, label) pairs, and assigning labels to a range (e.g. which activity is being performed), which gives (start, end, label) triples.

I was specifically looking for something that handled multivariate time series (i.e. acceleration xyz, magnetic field xyz etc.), with the option to display all series on the same subplot or on multiple plots.

Some applications only have a single label, others have multiple labels.

A more advanced scenario involves the user being presented with a video of the activity with play/pause and, say, a half-speed mode. You need to be able to create a sync point so the time series and video are in step. To apply a label you'd probably want to click "on" at the start frame and "off" at the end frame.

davodogster commented 4 years ago

Hi @seveibar @

Time series annotation tools (for generating data for deep learning) aren't quite good enough yet, so this might be a big opportunity for you. A good example of an existing annotation tool is GitHub user 'Dubrzr' with their repository called 'SignalAnnotation'. I have complex electrical signals and as a user I want to annotate segments and/or points by clicking. A good user interface for fast labeling would allow things like clicking and dragging the series left to right, shortcut keys, zooming into areas, dynamically adjusting the y axis as you move along the series, and an export annotations button. An upload annotations button would also be super helpful, because I could generate noisy annotations semi-automatically with preexisting functions/code and then refine them in your new time series tool :) Then I'll train an LSTM model for segmentation, peak detection, and classification. Another tool out there is called 'TRAINSET' but it's too basic.

And yes @david-waterworth, a video stream of the data, with the labels in sync, would be useful as well.

Cheers and look forward to your thoughts! Sam

dean-sh commented 4 years ago

Hey guys, I'd like to add a few use cases and features we'd be happy to see in UDT regarding time series labeling. We are searching for a time series labeling tool that can help us in our robotics, agriculture and IoT ML projects.

Core functionality

- Multivariate time series
  - [Optional] Mark one series as a reference.
  - Visualizing all series on one chart or stacked charts.
  - [Optional] Possibility to show/hide series
- Support for multi-labels (more than 1 category)
- Point-based labeling, range-based labels, and whole-series classification
- Marking labeled timestamps in a different color - not just time ranges (similar to trainset)

Nice to have

- Editable colors for labels
- Custom axis labels (or adding a variable in addition to the timestamp)
- Azure blob storage support for data import and labels export.

I'd be happy to discuss this further with you. Cheers, Dean.

seveibar commented 4 years ago

Hi everyone, I'm closing in on the time series interface. Could anyone review the JSON format? Am I forgetting any use cases? https://github.com/UniversalDataTool/udt-format/blob/master/interfaces/time_series.md

david-waterworth commented 4 years ago

@seveibar

My use case has 3 files each containing 3 series (also 3 timestamp columns consisting of elapsed, epoch and string), along with a video. It would be nice if at a minimum you could support multiple aligned series in both the "timeData" and "csvUrl" elements (i.e. timestamp,value1,value2...valuen).

Are you saying "samples" will contain exactly one of "timeData", "csvUrl", "audioUrl" or "videoUrl"? I think you'd want at least one of "timeData" or "csvUrl", and optionally one of "audioUrl" or "videoUrl". I think it makes sense to allow either "timeData" or "csvUrl" to appear in "samples" and "audioUrl" or "videoUrl" outside it, but I'm not quite sure.

When annotating a time series based on a video, there's usually a mechanism to mark a sync point - i.e. for a video provide a (frame,timestamp) pair which then allows you to map from another frame back to a timestamp.
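The sync-point idea can be sketched as a pair of small conversion functions. This is an illustration under stated assumptions, not part of the UDT format: the function names are hypothetical, the video is assumed to have a constant frame rate (`fps`), timestamps are in seconds, and clock drift between the video and the sensor is ignored.

```python
def frame_to_timestamp(frame, sync_frame, sync_timestamp, fps):
    """Map a video frame index to a time series timestamp in seconds.

    (sync_frame, sync_timestamp) is the known sync point: the frame at
    which the time series reads sync_timestamp.
    """
    return sync_timestamp + (frame - sync_frame) / fps


def timestamp_to_frame(timestamp, sync_frame, sync_timestamp, fps):
    """Inverse mapping: nearest frame index for a given timestamp."""
    return sync_frame + round((timestamp - sync_timestamp) * fps)


# Example: frame 100 of a 25 fps video lines up with t = 12.0 s.
start_time = frame_to_timestamp(150, sync_frame=100, sync_timestamp=12.0, fps=25)
```

With a mapping like this, clicking "on" and "off" on two video frames yields a (start, end) range on the time series axis.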

Perhaps you might want to allow chart kwargs - series colour, alpha etc?

Also one thing I had to do on a recent project was to allow the annotator to zoom in - what I observed was users were turning the device on, then spending several minutes getting ready for a short experiment. So the time series was 90% nothing. I'm not sure that needs to be in the schema but it could be optional to supply initial zoom start/end values.

seveibar commented 4 years ago

@david-waterworth I totally forgot about aligned time series, I'll fix that ASAP, the draft GUI supports it!

Also, w.r.t. the zoom region, totally agreed. Although this may create some data redundancy if data is repeated across samples with different zoom regions.

davodogster commented 4 years ago

@seveibar Really looking forward to using your new time series labelling tool! My data is 2 aligned time series with millivolt recordings in milliseconds - 1 to 600,000 (no dates). I want to have the option of annotating points and also segments, with multiple classes. Export as csv is ideal, with the output being the input data plus an extra column for the label; otherwise export as JSON if csv isn't possible. The ability to upload partial annotations and then delete and add new ones would be a great feature as well. Cheers, Sam
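The requested csv export (input data plus an extra label column) could look roughly like the sketch below. It is not UDT's exporter. The function names are hypothetical, duration bounds are assumed to be inclusive, and where a point label and a segment label both apply to a row the point label is assumed to win.

```python
import csv
import io


def label_rows(rows, durations, timestamps=()):
    """Return rows with a trailing label column ("" when unlabeled).

    rows: sequences whose first element is the time.
    durations: dicts with "start", "end", "label" (inclusive bounds).
    timestamps: dicts with "time", "label"; these win over durations.
    """
    points = {t["time"]: t["label"] for t in timestamps}
    labeled = []
    for row in rows:
        time = row[0]
        label = points.get(time, "")
        if not label:
            # Fall back to the first duration that covers this time.
            for d in durations:
                if d["start"] <= time <= d["end"]:
                    label = d["label"]
                    break
        labeled.append([*row, label])
    return labeled


def to_csv(header, labeled_rows):
    """Serialize the labeled rows as csv text with a "label" column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([*header, "label"])
    writer.writerows(labeled_rows)
    return buf.getvalue()
```

A single label column cannot represent overlapping segments, so data with overlaps would need one column per class or the JSON export instead.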

dean-sh commented 4 years ago

A few more things:

Support multiple datetime formats (unix timestamp, and datetime with an option for string formatting as in here)
I also support having the option to define the initial zoom level of the whole series.
Support custom text annotation (on mouse hover on the series) - for example, show values of another variable for that timestamp.

seveibar commented 4 years ago

I've updated the schema to support (I think) all of the use cases! Thanks for the feedback everybody!

A couple terminology notes:

Durations = Segments = Time Ranges = Label + Start Time + End Time
Timestamps = Flags = Label + Time
Time can be any number, date string or duration string (thanks @dean-sh for the link), though I expect unix datetime to be most common. We will need to find one or more full RFCs to make sure that we're not reinventing the wheel with our parsing of different formats.
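To make the "number, date string or duration string" idea concrete, here is one possible normalizer. It is only a sketch and not the parser UDT uses. It assumes numbers are already in seconds, that duration strings look like "10s" or "1:20:00" (both forms appear in the spec below), that any other string is an ISO 8601 date, and that dates without a timezone are UTC.

```python
import re
from datetime import datetime, timezone

# Assumed unit suffixes for "10s"-style duration strings.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_time(value):
    """Normalize a time value to seconds as a float."""
    if isinstance(value, (int, float)):
        return float(value)  # assumed to be seconds already
    text = value.strip()

    # "10s", "1.5h", "250ms"
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text)
    if match:
        return float(match.group(1)) * _UNITS[match.group(2)]

    # "MM:SS" or "H:MM:SS"
    if re.fullmatch(r"\d+(:\d{1,2}){1,2}", text):
        seconds = 0.0
        for part in text.split(":"):
            seconds = seconds * 60 + int(part)
        return seconds

    # Anything else: try an ISO 8601 date, e.g. "2020-01-01T00:00:00"
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: UTC
    return moment.timestamp()
```

A real implementation would lean on an established standard such as ISO 8601 durations rather than ad hoc patterns like these, which is the point about not reinventing the wheel.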

Some other notes:

initialWindow in a sample defines the initial zoom
Custom text labels are possible w/ allowCustomLabels for both timestamps and durations
Overlapping classifications on a duration or timestamp are possible, but a duration doesn't currently support multi-classification as @davodogster suggested. I consider this a UI issue rather than a format issue (the UI should identify perfectly overlapping durations as being effectively a multi-classification item). This simplifies the standard somewhat but can achieve the same effect visually.
@david-waterworth mentioned a sync point. I admit this is a bit outside my domain expertise and I'll have to do some research on the best way to ensure synchronization with frames while maintaining an intuitive JSON interface. We may also run into some practical issues on the web without access to ffmpeg for extracting frames. In the interest of time I think I'll introduce this feature after the initial time_series drop. It's possible we'll just need to make it a separate interface e.g. "video_duration_label".
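The note above about treating perfectly overlapping durations as a multi-classification item can be sketched in a few lines. This is an illustration of the idea, not the UI's actual logic. The `group_overlapping` name and the `"labels"` output key are hypothetical, and "perfectly overlapping" is taken to mean identical start and end values.

```python
from collections import defaultdict


def group_overlapping(durations):
    """Merge durations that share the exact same start and end.

    Input items look like {"start": 0, "end": 500, "label": "a"}.
    Output items carry a "labels" list instead of a single label.
    """
    grouped = defaultdict(list)
    for d in durations:
        grouped[(d["start"], d["end"])].append(d["label"])
    return [
        {"start": start, "end": end, "labels": labels}
        for (start, end), labels in sorted(grouped.items())
    ]
```

The stored format stays one label per duration; only the display layer sees the merged view.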

The latest version of the spec is here, but I've pasted it below to keep everything in one thread.

{
  "interface": {
    "type": "time_series",

    // timeFormat determines how the time axis will be displayed to the user
    // "dates": Display as dates with time
    // "none": Display time as a number. For example, if each data point was taken at a new iteration
    //         or over a short period of time
    // "duration": Display everything relative to the first data point but converted to a time. This
    //             is how a video or audio editing application might display time
    //             e.g. "1:20:00" to mean "1 hour and 20 minutes past the start"
    "timeFormat": "dates",

    "enabledTools": ["create-durations", "label-durations", "create-timestamps", "label-timestamps"],

    // Can the user manually type a new label? (free text)
    "allowCustomLabels": true,

    // Labels that can be used for durations
    "durationLabels": ["buy during this time", "sell during this time"],

    // Labels that can be used for timestamps
    "timestampLabels": ["earnings call starts", "CEO is ousted"],

    // OPTIONAL: If provided, you can layer or stack graphs
    "graphs": [ { "keyName": "value" } ] // default

    /*
    // Here's an example where we put two pieces of data on the same plot

    "graphs": [
      // if two graphs share the same row, they'll be placed on top of each other
      // if a row isn't provided, the data corresponding to the key will get its own row
      { "keyName": "val1", "row": 0 },
      { "keyName": "val2", "row": 0 }
    ]
    */
  },
  "samples": [
    {
      "timeData": [
        { "time": 0, "value": 100 },
        { "time": 1000, "value": 50 },
        //...

        // You can graph any "keyName" from the "graphs" array here
        { "time": 0, "val1": 0, "val2": 0 },
        { "time": 1000, "val2": 10 },
        { "time": 2000, "val1": 100 },
        { "time": 5000, "val1": 100, "val2": 100 }
      ],

      // This will appear in the sample after labeling, can also be provided for viewing data
      // Times will be in the same format as the "timeData", e.g. unix epoch milliseconds
      "annotation": {
        "durations": [
          { "start": 0, "end": 500, "label": "buy here" }
        ],
        "timestamps": [
          { "time": 1000, "label": "label for 1 second mark"  }
        ]
      }
    },
    // These are also valid
    { "audioUrl": "http://example.com/audio.mp3" },
    { "videoUrl": "http://example.com/video.mp4" },
    { "csvUrl": "http://example.com/csv_with_time_and_value_columns.csv" },
    { "audioUrl": "http://example.com/audio.mp3", "initialWindow": ["10s", "30s"] }
  ]
}

Screenshot to give a sense of the initial UI

seveibar commented 4 years ago

This is now released! Try it out and please create new issues for feedback and ideas!!

davodogster commented 4 years ago

BIG alright! Awesome will try it out! Hopefully I can upload partial annotations and modify them and then export as csv or json.

edit: Okay, looking at the schema above it appears we can upload annotations. Do we need to write a Python script to get our data into a dictionary for every single observation?
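A conversion script along those lines might look like the sketch below. It is not an official UDT importer. The function names are hypothetical, the key names ("interface", "samples", "timeData", "annotation", "durations", "graphs", "keyName") are taken from the spec pasted above, and the csv is assumed to have a header row with one time column and numeric series columns.

```python
import csv
import io
import json


def csv_to_sample(csv_text, time_column="time", durations=()):
    """Build one sample dict with "timeData" and optional pre-annotations."""
    reader = csv.DictReader(io.StringIO(csv_text))
    time_data = []
    for row in reader:
        point = {"time": float(row[time_column])}
        for key, raw in row.items():
            # Skip the time column and empty or missing cells.
            if key != time_column and raw not in ("", None):
                point[key] = float(raw)
        time_data.append(point)
    sample = {"timeData": time_data}
    if durations:
        # Partial annotations to refine later in the tool.
        sample["annotation"] = {"durations": list(durations)}
    return sample


def build_dataset(samples, series_keys):
    """Wrap samples in a time_series interface, one graph per series."""
    return {
        "interface": {
            "type": "time_series",
            "graphs": [{"keyName": key} for key in series_keys],
        },
        "samples": samples,
    }


def to_json(dataset):
    return json.dumps(dataset, indent=2)
```

According to the spec above, a sample can instead point at a file with "csvUrl", which avoids building one dictionary per observation.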

davodogster commented 3 years ago

Hi @seveibar, Happy belated New Year mate! Hope you are well. How would I do this kind of time series segmentation (or point) annotation using UDT?

Also, it would be very useful if the y axis dynamically adjusted to the min-max values of the current window. The user could also zoom in or out to make the current window length larger, slide along the series, and annotate it as they wish. I'm also interested in point annotations, not just segments.
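The dynamic y axis described above comes down to recomputing the limits from the samples inside the visible window. Here is a small sketch of that calculation. The `y_limits` name and the `padding` parameter are hypothetical and say nothing about how UDT implements it.

```python
def y_limits(times, values, window_start, window_end, padding=0.05):
    """Return (low, high) y-axis limits for the visible window.

    Only samples with window_start <= time <= window_end are considered.
    padding is the fraction of the visible range added above and below.
    Returns None when the window contains no samples.
    """
    visible = [
        v for t, v in zip(times, values)
        if window_start <= t <= window_end
    ]
    if not visible:
        return None
    low, high = min(visible), max(visible)
    if high == low:
        margin = 1.0  # flat signal: fall back to a fixed margin
    else:
        margin = (high - low) * padding
    return low - margin, high + margin
```

In a matplotlib-based tool, a function like this could be called from an `xlim_changed` callback on the axes and its result passed to `ax.set_ylim`.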

Best Regards, Sam

seveibar commented 3 years ago

Hey Sam, I think that's already possible, even with the features you described! It's a bit tricky right now to import the data, however: you have to put it in the UDT format JSON or in a compatible csv. How is the data stored? Let's try loading it in!

davodogster commented 3 years ago

Hi @seveibar, I just uninstalled an older Desktop version and have now downloaded a newer one, but I can't remember how to install it. There are so many files and none of them seem like obvious .exe / application files. I'm on Windows.

EDIT: Oh it turns out the .exe doesn't exist for some of the newer versions so there is no .exe to download. Will install a less new version

UniversalDataTool / universal-data-tool

Time Series Annotation #242
