seperman closed 1 year ago
I went ahead and used the Titanic dataset, then added the ages 232, 222, and 199 as non-anomalies: I repeated those numbers and appended them to the ages column.
Now it gives age 20000000000 the same score as ages 10 and 232. Why?
Instead of repeating those new numbers, I turned them into a normal distribution. It still recognizes 100 as an anomaly but not 20000000000:
This is an unsupervised method so it doesn't have any concept of labels. The only things you can do in that regard are adjusting weights as you mention and adjusting fitting parameters to be more suitable for isolating anomalies in your data.
and 3. This software is based on decision trees. You can read more about the algorithm in the references.
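To illustrate why a tree-based method can give 20000000000 the same score as 232, here is a minimal sketch of a generic 1-D isolation forest in plain Python. It is not this library's implementation: the function names (`c_factor`, `build_tree`, `path_length`, `fit_forest`, `score`), the synthetic ages, and the parameter values are all assumptions made for the example. Each split threshold is drawn from within the range of the data at that node, so any value at or beyond the largest training value always takes the rightmost branch and ends in the same leaf.

```python
import math
import random


def c_factor(n):
    """Expected path length of an unsuccessful search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Split 1-D values at random thresholds drawn inside their own range."""
    lo, hi = min(values), max(values)
    if depth >= max_depth or len(values) <= 1 or lo == hi:
        return {"size": len(values)}  # leaf
    threshold = rng.uniform(lo, hi)
    left = [v for v in values if v < threshold]
    right = [v for v in values if v >= threshold]
    if not left or not right:
        return {"size": len(values)}  # degenerate split, stop here
    return {
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the usual adjustment for leaf size."""
    if "threshold" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if x < tree["threshold"] else "right"
    return path_length(tree[branch], x, depth + 1)


def fit_forest(values, n_trees=100, sample_size=256, seed=0):
    rng = random.Random(seed)
    sample_size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(values, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def score(trees, sample_size, x):
    """Anomaly score in (0, 1); higher means easier to isolate."""
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


# Synthetic stand-in for an ages column (NOT the Titanic data).
data_rng = random.Random(42)
ages = [min(80.0, max(1.0, data_rng.gauss(30, 12))) for _ in range(800)]
ages += [232, 222, 199] * 50  # repeated copies of the "not anomalies" values

trees, n = fit_forest(ages)
s_max = score(trees, n, 232)
s_huge = score(trees, n, 20000000000)
print("score(232)         =", s_max)
print("score(20000000000) =", s_huge)
print("identical:", s_max == s_huge)
```

In this sketch the two scores are identical by construction, because no threshold can exceed the largest value seen during fitting. Some implementations offer options that penalize values outside the range seen at fit time, so it is worth checking the fitting parameters of the library you are using.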
Hello, Let's say we have an array of size n. A few items are marked as anomalies that should not have been. How do you recommend refitting the model, so those items are not marked as anomalies in the future? I considered extending the array with X copies of those items and re-training. Is that the right approach? If yes, what is the optimal value for X?
Example (columnar data):
If it matters, here are the parameters I'm using: