calico / scnym

Semi-supervised adversarial neural networks for classification of single cell transcriptomics data
https://scnym.research.calicolabs.com
Apache License 2.0

Multiple categories #7

Closed: chsher closed this issue 2 years ago

chsher commented 4 years ago

Is there functionality to incorporate multiple categories, in addition to domain? For instance, I'd also like the adversary to classify other annotations such as patient and sequencer.

jacobkimmel commented 4 years ago

Thanks for your interest in scNym!

> Is there functionality to incorporate multiple categories, in addition to domain?

To clarify my understanding: you have multiple independent categorical variables you'd like to adapt across (e.g. patients and sequencers that occur in various combinations)?

If the variables are largely dependent on one another (e.g. patient X is almost always matched to sequencer Y), you can dummy-code each (patient, sequencer) combination as its own "domain". We've seen success with this approach in the most recent version of our pre-print, where we train across multiple domains representing different single cell preparation methods.
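As a rough illustration of the dummy-coding idea, here is a minimal pure-Python sketch. The annotation values and the helper `dummy_code_domains` are hypothetical and not part of scNym. It assigns one integer code to each combination that actually occurs in the data, which can then be stored as a single domain label per cell.

```python
# Hypothetical per-cell annotations; the values are illustrative only.
patients = ["p1", "p1", "p2", "p2", "p3"]
sequencers = ["X", "X", "X", "Y", "Y"]


def dummy_code_domains(*annotations):
    """Map each observed combination of annotation values to one integer code.

    Hypothetical helper, not an scNym function.
    """
    combos = list(zip(*annotations))
    # Sort the unique combinations so the coding is deterministic.
    code_of = {combo: i for i, combo in enumerate(sorted(set(combos)))}
    return [code_of[combo] for combo in combos], code_of


domains, code_of = dummy_code_domains(patients, sequencers)
print(domains)  # one integer domain label per cell
print(code_of)  # lookup table from (patient, sequencer) to code
```

Only observed combinations receive a code here, so the number of domains grows with the combinations present in the data rather than with the full cross product.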

The downside of this approach is that the adversary can't exploit similarities between dummy classes to learn a more performant classifier. For example, if dummy code 0 is (patient 1, sequencer X) and dummy code 1 is (patient 2, sequencer X), the adversary treats them as unrelated classes even though they share a sequencer.

If the variables are largely independent, the best way to handle this would be to train a multi-task domain adversary, performing each classification problem separately. This is not supported in our current implementation, but it's a straightforward extension of the model.
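To make the multi-task idea concrete, here is a minimal sketch of how the adversary's loss could be combined across variables. It is not scNym code: the function names, the per-task weights, and the example logits are all assumptions. It assumes the adversary has one classification head per categorical variable and sums one cross-entropy term per head.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example, computed stably."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def multitask_adversary_loss(logits_by_task, labels_by_task, weights=None):
    """Weighted sum of one cross-entropy term per categorical variable.

    Hypothetical helper; the weights default to 1.0 for every task.
    """
    weights = weights or {task: 1.0 for task in logits_by_task}
    return sum(
        weights[task] * cross_entropy(logits_by_task[task], labels_by_task[task])
        for task in logits_by_task
    )


# Hypothetical outputs of two adversary heads for a single cell.
logits = {"patient": [0.0, 0.0, 0.0], "sequencer": [0.0, 0.0]}
labels = {"patient": 2, "sequencer": 0}
loss = multitask_adversary_loss(logits, labels)
```

In a domain-adversarial setup, the adversary heads would be trained to minimize this sum while the embedding model is trained to increase it, so that no single variable remains easy to predict from the embedding.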

We can consider adding this as a feature in the future if it would be useful.

chsher commented 4 years ago

Thanks for your helpful reply! A multi-task domain adversary would be very useful.

jacobkimmel commented 4 years ago

Of course!

> A multi-task domain adversary would be very useful.

Thanks for the feedback. I'll add this to our roadmap, since it seems like a straightforward extension. In the meantime, I'd be interested to hear how multi-domain training with dummy-coding works in your case if you give it a shot.

jacobkimmel commented 3 years ago

An API endpoint for multi-domain training as discussed here was added in #9