Open rmovva opened 4 years ago
Possibly relevant cuML issue: https://github.com/rapidsai/cuml/issues/2478

> Also, I note that while the CPU output is sorted by score (i.e., the top 50 diff. genes have high scores, and are sorted in decreasing order), the GPU output seems to be unsorted, and some of the scores are very low.
~Are you referring to the resulting scores (e.g., adata.uns['rank_genes_groups']['scores'])? For me, the output for both CPU and GPU seems to be unsorted.~
Correction: When using penalty='none', the major axis is indeed sorted in both the GPU and CPU notebooks.
While we wait for the release of the fix for rapidsai/cuml#2478, we have a couple of options:
- Pass penalty='none' into the rank_genes_groups functions for both CPU and GPU.
- Set the C hyper-parameter to the number of elements in X, as recommended in rapidsai/cuml#2478.

Hey folks, just curious. Since the above cuML issue has been fixed, did any of you get a chance to rerun the code afterwards? Are you still facing this issue?
@teju85 thanks for the reminder, we'll check and get back to you.
@teju85 We checked this and the problem still exists, despite the cuML bug being resolved. @cjnolet is looking into it.
This issue should be resolved now: https://github.com/rapidsai/cuml/issues/3645
Will test and close.
I tried running the RAPIDS implementation of rank_genes_groups alongside the Scanpy CPU implementation on the same data matrix, but I'm getting very different results.
Here's my code for the GPU call:
And the CPU call:
When I look at the top differential gene for each cluster, the outputs reported by the GPU and CPU are disjoint. Also, I note that while the CPU output is sorted by score (i.e., the top 50 diff. genes have high scores, and are sorted in decreasing order), the GPU output seems to be unsorted, and some of the scores are very low. My suspicion is that the GPU output isn't actually being properly sorted by logistic regression coefficient, so the output is just some random set of differential genes & their scores instead of the top N.
When I scatterplot the results, the CPU results also seem to make much more sense than the GPU.
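
To make the suspicion above concrete, here is a minimal sketch of what "top N genes sorted by logistic regression coefficient" would look like, plus two quick checks that could be run on each backend's output. It is not the Scanpy or RAPIDS implementation. The helper names (`top_n_by_coefficient`, `is_sorted_descending`, `top_gene_overlap`) are hypothetical, and it assumes the coefficients are available as a plain groups-by-genes array, which is not the layout stored in adata.uns['rank_genes_groups'].

```python
import numpy as np


def top_n_by_coefficient(coef, gene_names, n_top=50):
    """Pick the n_top genes per group, ordered by decreasing coefficient.

    coef: array of shape (n_groups, n_genes), assumed layout for this sketch.
    gene_names: array of shape (n_genes,).
    """
    coef = np.asarray(coef)
    gene_names = np.asarray(gene_names)
    # Sorting the negated coefficients ascending gives a descending order.
    order = np.argsort(-coef, axis=1)[:, :n_top]
    scores = np.take_along_axis(coef, order, axis=1)
    return gene_names[order], scores


def is_sorted_descending(scores):
    """True if every row of scores is non-increasing from left to right."""
    scores = np.asarray(scores)
    return bool(np.all(np.diff(scores, axis=1) <= 0))


def top_gene_overlap(names_a, names_b):
    """Per-group fraction of genes in names_a that also appear in names_b."""
    return [len(set(a) & set(b)) / len(a) for a, b in zip(names_a, names_b)]


# Toy data standing in for real coefficients (hypothetical values).
rng = np.random.default_rng(0)
coef = rng.normal(size=(3, 20))
genes = np.array([f"gene{i}" for i in range(20)])

names, scores = top_n_by_coefficient(coef, genes, n_top=5)
print(is_sorted_descending(scores))
print(top_gene_overlap(names, names))
```

If the GPU scores fail a check like `is_sorted_descending` while the CPU scores pass, or the per-cluster overlap between the two gene lists is near zero, that would point at the ranking step rather than at the regression fit itself.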