alyosama closed 2 weeks ago
Hi Jean,
It seems that the recent change caused all test cases to fail, as it alters the number of output genes in the MOB dataset. I understand that implementing this change would require extensive updates, so I'll go ahead and close the pull request.
Dear Aly,
Great investigation! I'm glad you found the source of the issue for the Nanostring DSP data.
However, in this case, we actually do want to use >= so that genes present in a fraction of 1 (i.e., 100%) of spots are removed. With a strict >, the threshold would only remove genes present in more than 100% of spots, which would be none. If we wanted to avoid filtering out genes present in all spots (because of a low number of spots, for example), we would use removeAbove=Inf. I'm sure there is an equivalent in Python.
Best, Jean
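For reference, here is a minimal Python sketch of the behaviour described in this reply. It is not the package's code: the function name `genes_to_keep`, the spots-by-genes array layout, and the toy counts are assumptions made for illustration. It only shows that an inclusive `>=` threshold of 1 drops genes detected in every spot, and that passing `math.inf` (a likely Python counterpart of R's `Inf`) disables that filter.

```python
import math

import numpy as np


def genes_to_keep(counts, remove_above=1.0):
    """Return a boolean mask over genes (hypothetical helper).

    counts: 2-D array with rows = spots and columns = genes (assumed layout).
    A gene is dropped when the fraction of spots in which it is detected
    is greater than or equal to remove_above.
    """
    counts = np.asarray(counts)
    # Fraction of spots with a non-zero count, computed per gene.
    frac_spots = (counts > 0).mean(axis=0)
    # Keeping "frac < threshold" is the same as dropping "frac >= threshold".
    return frac_spots < remove_above


# Toy data: 3 spots x 3 genes. Gene 0 is detected in all spots,
# gene 1 in one spot, gene 2 in two spots.
toy_counts = np.array([
    [5, 0, 1],
    [2, 0, 0],
    [7, 4, 1],
])

keep_default = genes_to_keep(toy_counts, remove_above=1.0)
keep_no_upper = genes_to_keep(toy_counts, remove_above=math.inf)

print(keep_default.tolist())
print(keep_no_upper.tolist())
```

With the default threshold only gene 0 is removed; with `math.inf` no fraction can reach the threshold, so every gene is kept.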
Hi Jean,
After further investigation, I found the cause of this code failing with Nanostring DSP data.
The issue arises because this type of data has a high proportion of important genes present in all cells (or spots), and a relatively low number of spots (around 59).
In my Python code, I use scanpy with a filter that does not discard genes if they exactly meet the removeAbove threshold. In your function, however, the "greater than or equal" condition discards genes in this edge case. To address this, I modified the function so that removeAbove=1 and removeBelow=0 will not remove any genes.
Let me know if you agree with this approach. If you're okay with it, feel free to merge!
Best, Aly
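The edge case discussed in this comment can be reproduced with a small Python sketch. Everything here is illustrative: `keep_mask`, its `inclusive` flag, and the synthetic 59-spot matrix are assumptions, and the inclusive lower bound (`<=`) is assumed by symmetry rather than taken from the original function. The sketch only contrasts inclusive and strict comparisons at the boundaries 1 and 0.

```python
import numpy as np


def keep_mask(counts, remove_above=1.0, remove_below=0.0, inclusive=True):
    """Return a boolean mask of genes to keep (hypothetical helper).

    counts: 2-D array with rows = spots and columns = genes (assumed layout).
    inclusive=True  drops genes with fraction >= remove_above or <= remove_below.
    inclusive=False drops genes with fraction >  remove_above or <  remove_below.
    """
    counts = np.asarray(counts)
    frac_spots = (counts > 0).mean(axis=0)
    if inclusive:
        drop = (frac_spots >= remove_above) | (frac_spots <= remove_below)
    else:
        drop = (frac_spots > remove_above) | (frac_spots < remove_below)
    return ~drop


# Synthetic data with 59 spots and 3 genes (values are made up).
n_spots = 59
counts = np.zeros((n_spots, 3), dtype=int)
counts[:, 0] = 1    # gene 0: detected in all 59 spots
counts[:30, 1] = 1  # gene 1: detected in 30 of 59 spots
# gene 2: detected in no spot

inclusive_keep = keep_mask(counts, 1.0, 0.0, inclusive=True)
strict_keep = keep_mask(counts, 1.0, 0.0, inclusive=False)

print(inclusive_keep.tolist())
print(strict_keep.tolist())
```

With inclusive comparisons the gene detected in every spot and the gene detected in none are both dropped; with strict comparisons and thresholds of 1 and 0, no gene is removed, which is the behaviour proposed in this pull request.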