alok-ai-lab / pyDeepInsight

A python implementation of the DeepInsight methodology.
GNU General Public License v3.0

DeepFeature Implementation Question #30

Closed troks27 closed 1 year ago

troks27 commented 1 year ago

Hello,

First, thank you for developing and maintaining this package. It seems to be super powerful and has very interesting applications in genomics. I have a question regarding the implementation of DeepFeature for a new classification task.

Essentially, I want to use DeepFeature to identify unique pathways based on a score they are given in any number of conditions, such as control, stim 1, stim 2, stim 3, etc. The pathway data for each of the experimental conditions is in tabular form and has 4 variables, which t-SNE would use for the dimensionality reduction and the conversion from tabular to image data. Since each experimental condition has 4 variables that need to go through t-SNE, the array would have to be [condition x features x row name].

Basically, I have a training data issue. There are scRNAseq datasets available with tens of samples per condition. If I were to generate my tabular pathway data from a publicly available scRNAseq dataset with this high sample number, and then use it as my training and test data, would this work? Would this be sufficient to tune the necessary hyperparameters and prevent overfitting?

Then, ideally after training it on the publicly available data, I would like to be able to give the algorithm any new dataset, with any number of conditions, and ask it to do feature selection on that dataset. Would a setup like this be possible to achieve?

In other words, I would like DeepFeature to be able to identify unique features from the tabular input data. The dataset that I would like to analyze only has one sample per condition, but I can get tabular data from scRNAseq studies online that have many samples per condition. Can I train the CNN on this data with many samples, then apply the CNN to my data with only one sample per condition and have it select features and identify patterns in the samples of N=1?

alok-ai-lab commented 1 year ago

Hi

Thanks for your interest. Perhaps your question is more related to the general usage of CNNs. However, let me provide some thoughts below:

1) I believe ten samples per class is not sufficient to train CNNs properly. More samples would help to achieve a better estimate of the model. However, if more samples cannot be obtained otherwise, you may try augmenting them artificially, for example with SMOTE or by averaging samples belonging to the same class/category.

2) It is possible to train DeepFeature on an external source of data (where a sufficiently large number of samples exists). Once you have trained the model on this external data, you can use your own data (as a test set) to predict class labels and to find activations, which would eventually lead to finding pathways. It is also possible to compute the class activation for each sample, which would give some idea of the pathways for individual samples.
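
To make the two points above more concrete, here is a minimal sketch in plain Python. It is not pyDeepInsight code: the helper names (`augment_by_averaging`, `fit_min_max`, `apply_min_max`), the toy values, and the min-max scaling step are all assumptions made for illustration. It uses the simple class-wise averaging mentioned in point 1 (not SMOTE, which interpolates between nearest neighbours), and it shows the pattern from point 2 of fitting preprocessing on the external data only and then reusing it on a single new sample. It assumes your own data has the same features, in the same order, as the external data.

```python
import random
from collections import defaultdict


def augment_by_averaging(samples, labels, n_new_per_class, k=2, seed=0):
    """Create synthetic samples by averaging k random samples of the same class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    new_samples, new_labels = [], []
    for label, rows in by_class.items():
        if len(rows) < k:
            continue  # not enough samples in this class to average
        for _ in range(n_new_per_class):
            chosen = rng.sample(rows, k)
            # Feature-wise mean of the chosen rows
            new_samples.append([sum(col) / k for col in zip(*chosen)])
            new_labels.append(label)
    return new_samples, new_labels


def fit_min_max(samples):
    """Learn per-feature minimum and maximum from the training data only."""
    columns = list(zip(*samples))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(sample, mins, maxs):
    """Scale one sample with the stored training ranges, clipped to [0, 1]."""
    scaled = []
    for value, lo, hi in zip(sample, mins, maxs):
        if hi == lo:
            scaled.append(0.0)  # constant feature in the training data
        else:
            scaled.append(min(1.0, max(0.0, (value - lo) / (hi - lo))))
    return scaled


# Toy "external" data: 4 features per sample, two conditions (hypothetical values)
external_X = [
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0, 5.0],
    [10.0, 11.0, 12.0, 13.0],
    [12.0, 13.0, 14.0, 15.0],
]
external_y = ["control", "control", "stim1", "stim1"]

# Point 1: enlarge the training set with class-wise averages
aug_X, aug_y = augment_by_averaging(external_X, external_y, n_new_per_class=3)
train_X = external_X + aug_X
train_y = external_y + aug_y

# Point 2: fit preprocessing on the external data, then reuse it on an N=1 sample
mins, maxs = fit_min_max(train_X)
own_sample = [6.0, 7.0, 8.0, 9.0]
own_scaled = apply_min_max(own_sample, mins, maxs)
```

Keep in mind that averaged samples are not independent of the samples they were built from, so any validation or test split should contain only real samples that were set aside before augmentation. The same fit-once, reuse-later idea applies to the tabular-to-image step: the feature-to-pixel layout would be learned from the external data and then applied unchanged to your own samples.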