DHI / tsod

Anomaly Detection for time series data
https://dhi.github.io/tsod
MIT License

Add benchmarking dataset with labelled anomalies for scoring performance of detector algorithms #12

Open halvgaard opened 3 years ago

halvgaard commented 3 years ago

Do you know of any (open source) datasets at DHI that have labelled anomalies and that we can use for testing? @ecomodeller @laurafroelich @akfDHI

halvgaard commented 3 years ago

@ecomodeller I found some datasets with labelled anomalies here: https://github.com/numenta/NAB There are very few labels. But I guess that is the case with anomalies.
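
Since anomalies are rare, plain accuracy would look good even for a detector that flags nothing, so a benchmark would probably score detectors with precision, recall and F1 against the labels. A minimal sketch, assuming labels and detections are boolean sequences of equal length (the function name `score_detector` is hypothetical and not part of this package):

```python
def score_detector(labels, detected):
    """Return (precision, recall, f1) for point-wise anomaly flags.

    labels   -- True where a point is labelled as an anomaly
    detected -- True where the detector flagged the point
    """
    if len(labels) != len(detected):
        raise ValueError("labels and detected must have the same length")

    pairs = list(zip(labels, detected))
    tp = sum(1 for lab, det in pairs if lab and det)        # true positives
    fp = sum(1 for lab, det in pairs if not lab and det)    # false alarms
    fn = sum(1 for lab, det in pairs if lab and not det)    # missed anomalies

    # Guard against division by zero when nothing is flagged or labelled.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up example data, only for illustration.
labels = [False, False, True, False, True, False]
detected = [False, True, True, False, False, False]
precision, recall, f1 = score_detector(labels, detected)
```

Point-wise scoring is the simplest choice; a benchmark could equally score whole anomaly windows, which is a design decision left open here.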

laurafroelich commented 3 years ago

@rhaDHI Have you checked the license for that repo? It seems to be quite strict and copyleft, so as far as I can tell, if we want to use material from the numenta/NAB repo we would need to change our license to the same one (AGPL-3.0). What do you think? If I am right, making our repo AGPL would then imply that anyone using our repo would also have to make their code AGPL... maybe not what we want?

ecomodeller commented 3 years ago

I don't know of any open datasets at DHI that we can use. We have to ask around and see if someone has an annotated dataset they are willing to share. There is lots of data, but not much of it is labelled and probably even less is public, unfortunately.

halvgaard commented 3 years ago

I will try to ask around on DHI's Yammer for labelled datasets with anomalies. @ecomodeller Do you have labels for the DMI dataset we have in the repo? Otherwise I will try to label the obvious ones with the algorithms, e.g. anomaly 1

halvgaard commented 3 years ago

@laurafroelich @ecomodeller @akfDHI What do you think of this message for posting on Yammer:

We are trying to establish best practices and automated ways of identifying anomalies/outliers in time series data. Please let us know if you:

Currently we are working on algorithms based on everything from simple range checks to machine learning models. Check out and potentially contribute to our open source anomaly detection Python package on DHI's GitHub here: https://github.com/DHI/anomalydetection

laurafroelich commented 3 years ago

Sounds good to me :)

ecomodeller commented 3 years ago

Can we make an interactive application to assist the labelling process?

  1. Upload data
  2. Automatic labeling of obvious outliers with simple detector
  3. Manually add / remove labels by clicking on chart.
  4. Save the labelled timeseries in reusable format e.g. csv
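
Steps 2 to 4 above could look roughly like this. It is only a sketch using the standard library: the range limits, the CSV column names and the helper names `auto_label` and `to_csv` are assumptions for illustration, not an existing API in this package, and the manual edit stands in for clicking on a chart.

```python
import csv
import io


def auto_label(values, min_value, max_value):
    """Step 2: flag obvious outliers with a simple range check."""
    return [v < min_value or v > max_value for v in values]


def to_csv(timestamps, values, labels):
    """Step 4: write the labelled series as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["time", "value", "is_anomaly"])
    for t, v, flag in zip(timestamps, values, labels):
        writer.writerow([t, v, int(flag)])
    return buffer.getvalue()


# Step 1 stand-in: made-up data instead of an uploaded file.
timestamps = ["2021-01-01T00:00", "2021-01-01T01:00",
              "2021-01-01T02:00", "2021-01-01T03:00"]
values = [2.1, 2.3, 99.0, 2.2]

labels = auto_label(values, min_value=0.0, max_value=10.0)

# Step 3 stand-in: a manual correction that would come from the chart.
labels[1] = True

csv_text = to_csv(timestamps, values, labels)
```

Keeping the labels as a plain 0/1 column next to the values means the file can be read back by any tool and used directly for scoring detectors.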

akfDHI commented 3 years ago

Sounds good to me too. Which Yammer channel?


halvgaard commented 3 years ago

@ecomodeller There is one open source tool here: https://trainset.geocene.com/

halvgaard commented 3 years ago

@ecomodeller Is this relevant: http://www.marineinsitu.eu/dashboard ?

halvgaard commented 3 years ago

We got a labelled dataset from an actual DHI case based on groundwater measurements. Unfortunately, the dataset cannot be published on GitHub.

ecomodeller commented 1 year ago

Can we make an interactive application to assist the labelling process?

  1. Upload data
  2. Automatic labeling of obvious outliers with simple detector
  3. Manually add / remove labels by clicking on chart.
  4. Save the labelled timeseries in reusable format e.g. csv

Please note that we now have an interactive application for labelling outliers and training a detector.