Closed. engebret closed this issue 8 years ago.
+1 To aid discovery of toolkits which have other forms of anomaly detection we could include references to them in the documentation for this toolkit.
+1
+1
Best regards, Leonid Gorelik.
Will the source be provided for the actual algorithms (e.g. LOF) or just a binary library which is called by SPL operators/functions?
+1, who are the initial committers to this project?
For source, the initial case of the LOF algorithm will probably point to an OSS (BSD) library (FLANN) that it leverages, plus include an object code build of the library that has been significantly optimized for the applications we have analyzed when running this in Streams. The rest of the code will include the source. For other algorithms added to this repository, we will decide on the source question at the time they are produced. My expectation is that when they are based on published work, we should provide source so that it is most useful for customers to adapt.
For committers, I would be the first for LOF. After that, we will decide as examples come up.
"plus include an object code build of the library that has been significantly optimized"
Can you expand on this? Is this a rewrite of FLANN? If so, will the source be available for that?
Just as an FYI I found this by accident today:
http://numenta.com/blog/nab-a-benchmark-for-streaming-anomaly-detection.html
It is not a rewrite of FLANN, but a set of relatively small changes that significantly improve performance for the data sets we have been using. We do not intend to provide that source in this repository, just the object file build. It's a separate issue whether we submit these changes to the FLANN project. We may do that in the future, but at this time the priority is to make an object code version available so clients depending on this code have the high-performance version available.
Created repository. @engebret is added as committer. Please let me know if there should be other committers.
Thanks!
I'm proposing streamsx.anomalyDetection as a new repository for functions related to anomaly detection algorithms. The initial code would be based on the Local Outlier Factor (LOF) algorithm, which scores each point based on local cluster density. Additional related code could include KOAD as well as further algorithms being developed by IBM research teams. The code would include the core algorithms, some basic feature extractors, and examples of how that code is used in applications such as network anomaly detection.
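To make the density-based scoring concrete, here is a minimal sketch of LOF in plain Python. It is not the toolkit's implementation: the function names `k_nearest` and `lof_scores` are hypothetical, a brute-force neighbor search stands in for a library such as FLANN, and ties at the k-th distance and duplicate points are not handled.

```python
import math

def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbors of points[i].

    Brute-force search; a real deployment would likely use an
    approximate nearest-neighbor library instead (assumption).
    """
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return dists[:k]

def lof_scores(points, k=2):
    """Compute a simplified Local Outlier Factor for every point.

    Scores near 1 mean the point is about as dense as its neighbors;
    scores well above 1 mean it sits in a sparser region (an outlier).
    Assumes no duplicate points, so densities stay finite.
    """
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = [nb[-1][0] for nb in neighbors]

    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average neighbor density divided by the point's own density
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

if __name__ == "__main__":
    # Four points in a tight square plus one isolated point
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for point, score in zip(data, lof_scores(data, k=2)):
        print(point, round(score, 2))
```

In this toy data set the four clustered points score about 1, while the isolated point scores well above 1, which is the signal an application would threshold on.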
I had some discussions with the research teams on this proposal, and the view is that this repository is a good approach, as none of the existing ones are quite right. While there are similarities with the Streams built-in time-series toolkit (not available on GitHub), the anomaly detection algorithms do not always have a time component, so that direction is not an ideal match. The only other GitHub toolkit that has similar algorithms is SparkML lib, but again that is not a good match.