IBMStreams / administration

Umbrella project for the IBMStreams organization. This project will be used for the management of the individual projects within the IBMStreams organization.

Proposal for streamsx.anomalyDetection #72

Closed by engebret 8 years ago

engebret commented 8 years ago

I'm proposing streamsx.anomalyDetection as a new repository for functions related to anomaly detection algorithms. The initial code would be based on the Local Outlier Factor (LOF) algorithm, which does density-based scoring. Additional related code could include KOAD as well as other algorithms being developed by IBM research teams. The code would include the core algorithms, some basic feature extractors, and examples of how that code is used in applications such as network anomaly detection.

I had some discussions with the research teams on this proposal, and the view is that a new repository is a good approach, as none of the existing ones are quite right. While there are similarities with the Streams built-in time-series toolkit (not available on GitHub), the anomaly detection algorithms do not always have a time component, so that direction is not an ideal match. The only other GitHub toolkit that has similar algorithms is SparkML lib, but again that is not a good match.
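For readers unfamiliar with LOF, the density-based scoring mentioned above can be sketched roughly as follows. This is a minimal brute-force illustration of the published LOF definition, not the code proposed for the repository: it does not use FLANN, the function name `lof_scores` is hypothetical, and it assumes exactly k neighbors per point (ties are broken by index) and treats points with zero reachability distance as inliers.

```python
import math

def lof_scores(points, k):
    """Return one LOF score per point, using brute-force neighbor search.

    Scores near 1 mean the point is about as dense as its neighbors;
    scores well above 1 suggest an outlier.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # k nearest neighbors of each point as (distance, index), nearest first.
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(dists[:k])

    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_dist(o), d(p, o)).
    lrd = []
    for nb in neighbors:
        mean_reach = sum(max(k_dist[j], d) for d, j in nb) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: mean density of the neighbors divided by the point's own density.
    scores = []
    for i, nb in enumerate(neighbors):
        if math.isinf(lrd[i]):
            # Assumption: exact duplicates are treated as inliers.
            scores.append(1.0)
        else:
            scores.append(sum(lrd[j] for _, j in nb) / k / lrd[i])
    return scores

# Four points on a unit square plus one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(sample, k=2)
```

With this sample, the four corner points score 1.0 and the isolated point scores well above 1. A production version would replace the brute-force search with an approximate nearest-neighbor index, which is the role FLANN plays in the discussion below.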

mikespicer commented 8 years ago

+1 To aid discovery of toolkits which have other forms of anomaly detection we could include references to them in the documentation for this toolkit.

cancilla commented 8 years ago

+1

leongor commented 8 years ago

+1

ddebrunner commented 8 years ago

Will the source be provided for the actual algorithms (e.g. LOF) or just a binary library which is called by SPL operators/functions?

chanskw commented 8 years ago

+1. Who are the initial committers to this project?

engebret commented 8 years ago

For source, the initial case of the LOF algorithm will probably point to an OSS (BSD) library (FLANN) that it leverages, plus include an object code build of the library that has been significantly optimized for the applications we have analyzed when running this in Streams. The rest of the code will include the source. For other algorithms added to this repository, we will decide on the source question at the time they are produced. My expectation is that when they are based on published work, we should provide source so that it is as useful as possible for customers to adapt.

For committers, I would be the first for LOF. After that, we will decide as examples come up.

ddebrunner commented 8 years ago

"plus include an object code build of the library that has been significantly optimized"

Can you expand on this? Is this a rewrite of FLANN? If so, will the source be available for that?

ddebrunner commented 8 years ago

Just as an FYI I found this by accident today:

http://numenta.com/blog/nab-a-benchmark-for-streaming-anomaly-detection.html

engebret commented 8 years ago

It is not a rewrite of FLANN, but a set of relatively small changes that significantly improve performance for the data sets we have been using. We do not intend to provide that source in this repository, just the object file build. It's a separate issue whether we submit these changes to the FLANN project. We may do that in the future, but at this time the priority is to make an object code version available so that clients depending on this code have the high-performance version.

chanskw commented 8 years ago

Created repository. @engebret is added as committer. Please let me know if there should be other committers.

Thanks!