alok-ai-lab / pyDeepInsight

A python implementation of the DeepInsight methodology.
GNU General Public License v3.0

Almost all features are concentrated in one pixel within the ImageTransformer #27

Closed: martin-jeremy closed this issue 1 year ago

martin-jeremy commented 1 year ago

Hello,

I am trying to use DeepInsight to categorize bulk RNA-seq data, as described in the publication for this software. My issue relates to #2 and #14. When I plot the .feature_density_matrix(), my ImageTransformer concentrates a lot of features in the same pixel, resulting in an all-purple map with one very bright pixel.

How can I deal with that to get a better distribution of my features with the ImageTransformer? Is there a way to tune the ImageTransformer?

Below is an example of one generated previously.

Thanks a lot!


[image: feature density matrix, mostly purple with a single bright pixel]

kaboroevich commented 1 year ago

This is caused by the feature_extractor chosen to transform the data. Changing the parameters of the extractor, or selecting a different method, may resolve this through trial and error. It's better to provide a class instance to the feature_extractor parameter than to rely on the legacy, string-selected default options. Please see this pull request or either example. If you are currently using t-SNE, you could try UMAP with a large min_dist value. UMAP is used as the feature_extractor in the feature selection example.
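Something like the following, as a rough sketch (parameter values are illustrative, not tuned for your data):

```python
# Rough sketch: pass a UMAP instance (from the 'umap-learn' package)
# as the feature_extractor instead of the legacy string option.
from pyDeepInsight import ImageTransformer
from umap import UMAP

reducer = UMAP(
    n_components=2,   # ImageTransformer expects a 2D embedding
    min_dist=0.8,     # a larger min_dist spreads embedded features apart
    random_state=42,
)

it = ImageTransformer(
    feature_extractor=reducer,  # class instance rather than a string
    pixels=(224, 224),
)
```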

Another option is to set discretization="assignment". This will assign every feature to its own pixel when the number of pixels is greater than the number of features. However, the running time of the method used can range from under a minute to hours depending on the distribution of the data. It's worth trying, but if you find it's taking longer than a few minutes, it may not finish for a very long time. If the number of features is greater than the number of pixels, it will first perform k-means clustering, which takes additional time.
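As a sketch (the pixel size here is just an example):

```python
from pyDeepInsight import ImageTransformer
from umap import UMAP

it = ImageTransformer(
    feature_extractor=UMAP(n_components=2, min_dist=0.8, random_state=42),
    discretization='assignment',  # unique pixel per feature when pixels > features
    pixels=(255, 255),
)
# X is a samples-by-features matrix, e.g. normalized expression values
# it.fit(X)
# images = it.transform(X)
```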

martin-jeremy commented 1 year ago

Ok. I will give these different solutions a try.

I have already tried to run UMAP, but the code in the feature selection example doesn't run. Python doesn't find the module umap.umap_, even after I run python -m pip install umap.

I am giving discretization='assignment' a try, but it's taking a long time, even though there are more pixels than features (the ImageTransformer size is 255 x 255, so 65,025 pixels, and I have 23,614 genes).

I will also try playing with perplexity. I will post here if the results are better.

Can you please help me with the UMAP approach? I think it could be a good idea, but I don't understand the error with import umap.umap_ as umap. Any advice?

Thank you for your help!

kaboroevich commented 1 year ago

> I have already tried to run UMAP, but the code in the feature selection example doesn't run. Python doesn't find the module umap.umap_, even after I run python -m pip install umap.

The package linked in my comment and used in the example is 'umap-learn'. I think you may have to uninstall the 'umap' package before installing: pip uninstall umap; pip install umap-learn.
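A quick way to check the installation afterwards (the umap.umap_ import from the example should then resolve):

```python
# After: pip uninstall umap && pip install umap-learn
import umap.umap_ as umap   # the import used in the feature selection example

reducer = umap.UMAP(min_dist=0.8)  # should construct without an ImportError
print(type(reducer))
```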

> I am giving discretization='assignment' a try, but it's taking a long time

Unfortunately, the assignment-problem solution method used can have very different run times depending on the input, and it's not practical in some cases. However, as this is based on the feature_extractor output and not the original data, it is worth trying again with different feature_extractor parameters if your final goal is to map each feature to a unique pixel.

> I will also try playing with perplexity.

I would also suggest trying different distance metrics (metric). I've also had some success toggling between init='random' and init='pca'.
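For example, with scikit-learn's t-SNE (the values here are just starting points to vary, not recommendations):

```python
from sklearn.manifold import TSNE

tsne = TSNE(
    n_components=2,
    perplexity=30,    # try a range of values, e.g. 5 to 50
    metric='cosine',  # alternative to the default 'euclidean'
    init='random',    # toggle between 'random' and 'pca'
    random_state=42,
)
# then: ImageTransformer(feature_extractor=tsne, pixels=(224, 224))
```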