HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Handling missing data in time series #695

Open gamblard opened 3 years ago

gamblard commented 3 years ago

Hello!

I'd like to request a feature for time series in Label Studio: I'm labelling multivariate time series, and sometimes the data contains null values (for example, when a sensor did not transmit any data at a given timestamp). The issue is that, by default, Label Studio displays a missing value as 0, which abruptly breaks the trend of the time series. Below is an example with 20% missing values: image

It would be perfect if we had an option to choose how null values are shown: as zeroes or skipped entirely. To illustrate this, I've found that Google Data Studio offers this feature, described in the "missing data" section here: https://support.google.com/datastudio/answer/7059697?hl=en. Currently, Label Studio behaves like the "Line to Zero" option, and the one I'm looking for is "Linear Interpolation" ("Line Breaks" would be fine too).
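To make the three options concrete, here is a minimal Python sketch of what each one would do to a series containing nulls. This is only an illustration, not Label Studio code: the function names are hypothetical, missing values are represented as `None`, and the interpolation assumes evenly spaced samples (it interpolates by index, not by timestamp).

```python
from typing import List, Optional

Series = List[Optional[float]]


def line_to_zero(values: Series) -> List[float]:
    """Replace every missing value with 0 (the behaviour described above)."""
    return [0.0 if v is None else v for v in values]


def linear_interpolation(values: Series) -> Series:
    """Fill gaps between two known points on a straight line.

    Leading and trailing gaps have only one neighbour, so they stay None.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


def line_breaks(values: Series) -> List[List[float]]:
    """Split the series into separate segments wherever data is missing."""
    segments: List[List[float]] = []
    current: List[float] = []
    for v in values:
        if v is None:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(v)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    sample = [1.0, None, 3.0, None, None, 6.0]
    print(line_to_zero(sample))
    print(linear_interpolation(sample))
    print(line_breaks(sample))
```

With "Line to Zero" the y-axis has to include 0 even if the real data never goes near it, which is why the trend looks broken; the other two options keep the plotted range within the observed values.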

Thanks!

sjw9 commented 1 year ago

Having the same issue in Label Studio 1.5, and this is a pretty big problem for us.

First, going to zero causes scaling issues on the y-axis.

Second, zero is semantically very different from "missing." Interpolating would solve the scale issue, but it is also semantically very different from "missing." Showing the data as missing would be ideal; if we want to fill blanks, we could do that ourselves when we export to CSV.

AndyYSWoo commented 1 year ago

Hey, is there any progress on this feature, or is it on the near-term roadmap? @makseq

Or, @gamblard @sjw9, have you figured out a workaround?

makseq commented 1 year ago

This feature is not yet implemented; however, I've created a ticket for it [LSDV-4725].

dvictori commented 1 year ago

Not sure if it's a related issue, but I'm also having trouble dealing with missing data in a CSV time series. If we put "NA" or "NULL" in the CSV time series, the overview graph does not plot anything after the missing period.

image

Also, the main graph (middle part) will only plot up to the missing period. After that period, it will only plot if I zoom to a time period without any missing data.

image

How should I treat missing values in time series? Delete the data points (e.g., jump from 2017-03-03 to 2022-04-01)? Use "NaN"? "NULL"?
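For the first option (deleting the data points), a minimal preprocessing sketch in Python could look like the one below. It is only an assumption-laden illustration: the function name and the set of missing markers are hypothetical, and nothing in this thread confirms how Label Studio renders a CSV with rows removed, so the result would need to be checked after import.

```python
import csv
import io

# Hypothetical set of markers treated as "missing"; adjust to your data.
MISSING_MARKERS = {"", "NA", "NULL", "NaN"}


def drop_missing_rows(csv_text: str) -> str:
    """Return the CSV text with every row that contains a missing marker removed.

    The header row is always kept. Timestamps are not regenerated, so the
    cleaned series simply jumps over the missing period.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        return ""  # empty input, nothing to clean
    writer.writerow(header)

    for row in reader:
        if any(cell.strip() in MISSING_MARKERS for cell in row):
            continue  # skip rows with at least one missing value
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    raw = (
        "time,value\n"
        "2017-03-01,1.5\n"
        "2017-03-02,NA\n"
        "2017-03-03,NULL\n"
        "2022-04-01,2.0\n"
    )
    print(drop_missing_rows(raw))
```

Note that for multivariate data this drops the whole row even when only one channel is missing, so the valid readings of the other channels at that timestamp are lost as well.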

Thanks

Zahorack commented 10 months ago

+1 for this feature

avishapiro commented 1 month ago

The issue still seems to be present in v1.13.0. Any updates, @makseq?