Vincent-Vercruyssen / transfertools

Python toolbox for transfer learning.
Apache License 2.0

Source and Targets in User feedback learning and Transfer learning #2

Open nsankar opened 3 years ago

nsankar commented 3 years ago

@Vincent-Vercruyssen Thanks for the great research and this package. I have a question about the source and target datasets in transfer learning when using LocIT.

My scenario #1: I have a dataset X with labels y predicted by an unsupervised algorithm (y is 1 for normal or -1 for anomaly). I then pass most of the instances classified as anomalies to a domain user, who corrects the labels based on a review. If I treat the user-corrected instances as (Xs, ys) and want to transfer these label changes to (X, y) through LocIT transfer learning, is it correct to do the following?

```python
from transfertools.models import LocIT

transfor = LocIT()
transferLearned_model = transfor.fit(Xs, ys, X, y)
```

Scenario #2, related to the above: sometimes the domain user does not provide feedback on the labels of some of the anomalies presented to him at time t, say {Xi, yi, ..., Xn, yn}. Assuming that the transferLearned_model above holds some domain knowledge, I still want to use that knowledge to obtain proper anomaly labels when user feedback is absent, which is where I would use the transferLearned_model created with LocIT. Is this approach correct?

```python
# Xn are the data points for which the user did not provide feedback on the anomaly labels.
get_label_when_nouserfeedback = transferLearned_model(Xn)
```

Would appreciate your input. Thanks in advance.

Vincent-Vercruyssen commented 3 years ago

Hey @nsankar, glad you like the package!

Scenario 1: One quick comment: in my packages I use the convention -1 = normal, 1 = anomaly (just so you don't get unexpected results). I understand your setting, but I don't think it is a good fit for the LocIT method, because you are not really dealing with two separate datasets with differing data distributions. Rather, you are correcting the labels of the unsupervised classifier based on user feedback. That sounds like iterative rounds of semi-supervised anomaly detection to me. In that case, you are probably better off using something like the SSDO algorithm: https://github.com/Vincent-Vercruyssen/anomatools
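For concreteness, a minimal sketch of that idea. It assumes anomatools exposes SSDO with a scikit-learn-style `fit(X, y)` / `predict(X)` interface and accepts 0 as an "unlabeled" marker next to the -1/1 convention above; the toy data and the reviewed indices are placeholders, so check the anomatools README for the exact API before using it:

```python
import numpy as np
from anomatools.models import SSDO   # assumed import path, see the anomatools README

rng = np.random.RandomState(42)
X = rng.randn(1000, 5)               # stand-in for your dataset

# label convention from above: -1 = normal, 1 = anomaly, plus 0 = not labeled yet (assumed)
y = np.zeros(X.shape[0], dtype=int)
y[:50] = 1                           # stand-in for instances the user confirmed as anomalies
y[50:100] = -1                       # stand-in for instances the user relabeled as normal

detector = SSDO()
detector.fit(X, y)                   # semi-supervised fit: exploits the labels where available
predicted = detector.predict(X)      # -1 / 1 predictions for every instance, labeled or not
```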

Scenario 2: I would not use the model that way. LocIT is essentially trying to figure out which instances (labeled or not) from the source domain fit within the target domain distribution. But because you are working with only one dataset, source = target, so it will only find that all instances should be transferred. I suggest using a semi-supervised anomaly detector that is able to gradually incorporate the label information over time (i.e., you retrain the detector when new labels are available).
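A rough sketch of that retraining loop, under the same assumptions as the SSDO snippet above (`collect_new_feedback()` and `n_feedback_rounds` are hypothetical placeholders for however you gather the user's corrections):

```python
import numpy as np
from anomatools.models import SSDO        # same assumed API as in the sketch above

y = np.zeros(X.shape[0], dtype=int)       # 0 = no user feedback yet (assumed convention)
detector = SSDO()

for _ in range(n_feedback_rounds):        # hypothetical number of feedback rounds
    idx, labels = collect_new_feedback()  # hypothetical helper: latest user corrections
    y[idx] = labels                       # incorporate the -1 / 1 corrections
    detector.fit(X, y)                    # retrain on all data with the labels known so far
    current = detector.predict(X)         # updated predictions, also for unreviewed points
```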

Hope this helps!

nsankar commented 3 years ago

@Vincent-Vercruyssen Thanks for the quick response. I got the point on transfertools: as you said, transfer learning only makes sense when we have two different [domain] datasets. I will check out SSDO.