JEFworks-Lab / STdeconvolve

Reference-free cell-type deconvolution of multi-cellular spatially resolved transcriptomics data
http://jef.works/STdeconvolve/

Fixing Restrict Corpus #65

Closed alyosama closed 2 weeks ago

alyosama commented 2 weeks ago

Hi Jean,

After further investigation, I found the cause of this code failing with Nanostring DSP data.

The issue arises because this type of data has a high proportion of important genes present in all cells (or spots), and a relatively low number of spots (around 59).

In my Python code, I use scanpy with the following filter:

sc.pp.filter_genes(adata_spatial, max_cells=int(removeAbove * len(adata_spatial)), inplace=True)

This filter keeps genes detected in at most max_cells cells, so genes that sit exactly at the removeAbove threshold are not discarded.

In your function, however, the "greater than or equal" condition discards genes in this edge case. To address this, I modified the function so that removeAbove=1 and removeBelow=0 will not remove any genes.
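To make the edge case concrete, here is a minimal pure-Python sketch of the two comparisons. The helper `kept_genes`, the gene names, and the counts are hypothetical and only illustrate the boundary behavior; this is not the actual code from STdeconvolve or scanpy.

```python
def kept_genes(spots_per_gene, n_spots, remove_above, inclusive):
    """Return genes that survive an upper-bound detection filter.

    inclusive=True  drops genes detected in >= remove_above * n_spots spots.
    inclusive=False drops genes detected in >  remove_above * n_spots spots.
    """
    limit = remove_above * n_spots
    kept = []
    for gene, n_detected in spots_per_gene.items():
        too_common = n_detected >= limit if inclusive else n_detected > limit
        if not too_common:
            kept.append(gene)
    return kept


# Made-up counts for a 59-spot dataset (gene names are placeholders).
spots_per_gene = {"geneA": 59, "geneB": 30, "geneC": 2}

strict = kept_genes(spots_per_gene, 59, remove_above=1.0, inclusive=True)
lenient = kept_genes(spots_per_gene, 59, remove_above=1.0, inclusive=False)
print(strict)   # geneA (detected in every spot) is dropped
print(lenient)  # all three genes are kept
```

With removeAbove=1, the only genes affected by the choice of comparison are those detected in every spot, which is exactly the group that matters in a small DSP dataset.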

Let me know if you agree with this approach. If you’re okay with it, feel free to merge!

Best, Aly

alyosama commented 2 weeks ago

Hi Jean,

It seems that the recent change caused all test cases to fail, as it alters the number of output genes in the MOB dataset. I understand that implementing this change would require extensive updates, so I'll go ahead and close the pull request.

JEFworks commented 2 weeks ago

Dear Aly,

Great investigation! I'm glad you found the source of the issue for the Nanostring DSP data.

However, in this case, we actually do want the >= so that removeAbove=1 (i.e., 100%) removes genes present in all spots. It wouldn't make sense for the threshold to only remove genes present in >100% of spots, since there are none. If we wanted to avoid filtering out genes present in all spots (due to a low number of spots, for example), we would use removeAbove=Inf. I'm sure there is an equivalent in Python.
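As a rough sketch of what that equivalent might look like in Python: `math.inf` plays the role of R's `Inf`, and an inclusive upper bound compared against it never triggers. The helper `passes_upper_bound` is hypothetical and not part of any library. One caveat, assuming the scanpy call quoted earlier in this thread: it wraps the threshold in `int()`, which raises `OverflowError` for infinity, so with that call the upper-bound filter would probably have to be skipped rather than passed an infinite value.

```python
import math


def passes_upper_bound(n_detected, n_spots, remove_above):
    """True if a gene is kept under an inclusive (>=) upper bound."""
    return not (n_detected >= remove_above * n_spots)


n_spots = 59

# A gene detected in every spot is removed when remove_above is 1 ...
kept_at_one = passes_upper_bound(59, n_spots, 1.0)
# ... but kept when the bound is infinite, since no count can reach it.
kept_at_inf = passes_upper_bound(59, n_spots, math.inf)

# Converting an infinite threshold to int fails, so it cannot be fed
# through an int(...) conversion unchanged.
try:
    int(math.inf * n_spots)
    int_conversion_ok = True
except OverflowError:
    int_conversion_ok = False

print(kept_at_one, kept_at_inf, int_conversion_ok)
```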

Best, Jean