david-cortes / outliertree

(Python, R, C++) Explainable outlier/anomaly detection through decision tree conditioning
http://outliertree.readthedocs.io
GNU General Public License v3.0

Not detecting any outliers on purely categorical datasets #4

Open ToxicFyre opened 3 years ago

ToxicFyre commented 3 years ago

outliertree_test.zip

Hello, I am comparing some outlier detectors on purely categorical datasets, but whenever I run OutlierTree on them it doesn't return any outliers (with some exceptions). Is there a different parameterization that you would recommend?

I attached a sample of what I am using to test it, so you can see what I mean.

Cheers and thanks for the help, ToxicFyre

david-cortes commented 3 years ago

Indeed. If you look at the parameters, some of them set the minimum size that branches must have in order to be split. Apart from that, categorical outliers are just harder to identify. Ideas for criteria are welcome.
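
To illustrate why a minimum branch size can leave a small categorical dataset with no flagged rows, here is a minimal standard-library sketch. It is not OutlierTree's algorithm or API: the function `rare_category_outliers`, its parameters `min_size` and `max_share`, and the synthetic data are all hypothetical, chosen only to show the effect. The actual parameter names and defaults should be taken from the library's documentation.

```python
from collections import Counter, defaultdict


def rare_category_outliers(rows, target, given, min_size=50, max_share=0.01):
    """Flag rows whose `target` value is rare among rows sharing their `given` value.

    Hypothetical illustration only (not OutlierTree's criteria).
    Groups with fewer than `min_size` rows are skipped entirely.
    """
    # Condition on one categorical column: group row indices by its value.
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row[given]].append(idx)

    flagged = []
    for members in groups.values():
        if len(members) < min_size:
            continue  # too few rows to judge what counts as "rare" here
        counts = Counter(rows[i][target] for i in members)
        for i in members:
            if counts[rows[i][target]] / len(members) <= max_share:
                flagged.append(i)
    return sorted(flagged)


# Synthetic data: 200 cats, one of which says "woof" (index 199) ...
rows = [{"animal": "cat", "sound": "meow"} for _ in range(199)]
rows.append({"animal": "cat", "sound": "woof"})
# ... and 10 dogs, one of which says "meow" (index 209).
rows += [{"animal": "dog", "sound": "woof"} for _ in range(9)]
rows.append({"animal": "dog", "sound": "meow"})

# With the default thresholds the small "dog" group is never examined.
print(rare_category_outliers(rows, "sound", "animal"))  # [199]

# Loosening both thresholds lets the small group be examined too.
print(rare_category_outliers(rows, "sound", "animal", min_size=5, max_share=0.2))  # [199, 209]
```

In this sketch, lowering the thresholds surfaces the second row, but it also makes it easier for ordinary rare values to be flagged, which is one reason categorical outliers are harder to pin down than numeric ones.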

ToxicFyre commented 3 years ago

Thank you for your reply. Yes, I noticed that having a few ordinal and numerical columns really helps in this case, since they provide additional information. I'll be sure to share any ideas for criteria that I have in the future.

Thank you for your time, ToxicFyre