Another way - Githubissues

HaukeBartsch / outliers

Outlier detection per-pedes


Another way #1

Open gedw99 opened 1 month ago

gedw99 commented 1 month ago

gonum (https://github.com/gonum/gonum/) has https://github.com/gonum/gonum/blob/master/stat/statmat.go, which is perhaps a bit easier.

I also work with DICOM and AI.

HaukeBartsch commented 1 month ago

Nice that there is a Go package for this. I didn't know about this one. It should work if you are using covariance matrices (positive definite, i.e. all eigenvalues positive). I would still suggest using the regularization to make it more stable.

gedw99 commented 1 month ago

Thanks for the advice @HaukeBartsch

I am still learning about the approaches for this.

I will try out the Go approach. Can you point me to any data sources I can try, so I can then also benchmark it? Go includes benchmarking tooling.

HaukeBartsch commented 1 month ago

In order to benchmark the solution, I would start with a problem that is relevant to you. Let's say you have a spreadsheet with lots of rows you want to check. How many of those would you need to check without this algorithm? Compute the ranking and see if the results make sense. You could of course also benchmark things like memory, compute time, stability, etc. As an example of applying this and similar approaches, you can look at the ENIGMA project's (enigma.ini.usc.edu/) protocols for data QC. Don't expect too much: there are plenty of examples where you see 'variable' results, especially if you have a low number of records.

gedw99 commented 1 month ago

Thanks @HaukeBartsch

Yes, a fake data benchmark makes sense to get going: one where you have some ground truth / known answer.
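A fake data benchmark with ground truth could be sketched roughly as follows: build well-behaved rows, plant a few outliers at known positions, rank all rows, and count how many planted rows land at the top. Everything here is an assumption made for illustration: the scoring is a plain sum of squared per-column z-scores (not the method of the outliers repository or of gonum), and `fakeData`, `rankRows`, and `hitsAtK` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// meanStd returns the mean and sample standard deviation of xs.
func meanStd(xs []float64) (mean, std float64) {
	n := float64(len(xs))
	for _, x := range xs {
		mean += x / n
	}
	var variance float64
	for _, x := range xs {
		variance += (x - mean) * (x - mean) / (n - 1)
	}
	return mean, math.Sqrt(variance)
}

// rankRows returns row indices ordered from most to least unusual,
// scoring each row by the sum of its squared per-column z-scores.
func rankRows(rows [][]float64) []int {
	scores := make([]float64, len(rows))
	for c := 0; c < len(rows[0]); c++ {
		col := make([]float64, len(rows))
		for r := range rows {
			col[r] = rows[r][c]
		}
		m, s := meanStd(col)
		for r := range rows {
			z := (col[r] - m) / s
			scores[r] += z * z
		}
	}
	idx := make([]int, len(rows))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return scores[idx[a]] > scores[idx[b]] })
	return idx
}

// fakeData builds 50 rows with values in [-1, 1] and overwrites
// three of them with planted outliers (the ground truth).
func fakeData() (rows [][]float64, planted map[int]bool) {
	for i := 0; i < 50; i++ {
		rows = append(rows, []float64{math.Sin(float64(i)), math.Cos(2 * float64(i))})
	}
	planted = map[int]bool{7: true, 23: true, 41: true}
	rows[7] = []float64{10, 10}
	rows[23] = []float64{-10, 10}
	rows[41] = []float64{10, -10}
	return rows, planted
}

// hitsAtK counts how many of the top-k ranked rows are planted outliers.
func hitsAtK(ranking []int, planted map[int]bool, k int) int {
	hits := 0
	for _, r := range ranking[:k] {
		if planted[r] {
			hits++
		}
	}
	return hits
}

func main() {
	rows, planted := fakeData()
	ranking := rankRows(rows)
	fmt.Printf("planted outliers found in top 3: %d of %d\n",
		hitsAtK(ranking, planted, 3), len(planted))
}
```

The planted rows here are deliberately extreme, so any reasonable scoring should find them. A more telling benchmark would shrink the outlier magnitude or the number of rows until the ranking starts to miss, which is where the 'variable' results mentioned above tend to show up.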

The ENIGMA protocols look useful:

https://enigma.ini.usc.edu/protocols/imaging-protocols/

https://github.com/orgs/ENIGMA-git/repositories

BTW, did you see the Mahalanobis func in this? https://github.com/gonum/gonum/blob/master/stat/statmat.go#L133C6-L133C17