atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

Feature Request: p-values for the gene:gene pairs in genepairfinder:find_all() #86

Closed: philoel closed this issue 2 years ago

philoel commented 2 years ago

I'm trying to use the list of gene pairs produced with

gene_pairs = gpf.find_all(align_thr=0.05)

as input for some GO term enrichment work I'm doing separately in R (where I am a bit more comfortable). However, for this I need the p-values, or some other score, computed for each gene pair.

I see in analysis.py that these p-values are computed and that pairs which don't meet a threshold are then simply filtered out, but I don't know how to edit the code to keep the p-values, or some sort of score, in the output.

I see that the gene_pairs object is organized in columns, where each column name is a cell-cell pair and the entries under it are the gene-gene pairs. It would probably be practical to have a second column for each original column carrying the p-values, even though this would double the width of the output.
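Here is a minimal pandas sketch of the layout proposed above, where each gene-pair column is followed by a column holding its p-values. The toy gene_pairs and pvals tables, the "_pval" column suffix, and the interleave_pvals helper are hypothetical illustrations. They are not SAMap's actual output format or API.

```python
import pandas as pd

# Toy, made-up data: each column is a cell-cell pair, each entry a gene pair.
gene_pairs = pd.DataFrame({
    "hu_A;ms_B": ["hu_TOP2B;ms_Top2b", "hu_MKI67;ms_Mki67"],
    "hu_C;ms_D": ["hu_GAPDH;ms_Gapdh", None],
})
# Placeholder p-values aligned row-by-row with gene_pairs (not real results).
pvals = pd.DataFrame({
    "hu_A;ms_B": [0.001, 0.02],
    "hu_C;ms_D": [0.004, None],
})


def interleave_pvals(pairs: pd.DataFrame, pv: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: put a '<column>_pval' column right after each gene-pair column."""
    cols = {}
    for name in pairs.columns:
        cols[name] = pairs[name]              # original gene-pair column
        cols[f"{name}_pval"] = pv[name]       # matching p-value column
    return pd.DataFrame(cols)                 # dict order preserves the interleaving


table = interleave_pvals(gene_pairs, pvals)
print(table)
```

The combined table could then be written out with DataFrame.to_csv and loaded in R with read.csv for the enrichment step.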

atarashansky commented 2 years ago

pip install samap==1.0.3

P-values are now added to the table.

philoel commented 2 years ago

Awesome, thanks for adding this. I see there are two columns of p-values, pval1 and pval2. What do these correspond to? My guess is raw and adjusted. Can you describe them a bit more?

atarashansky commented 2 years ago

pvals1 = p-values for the genes in species 1, pvals2 = p-values for the genes in species 2.

So for the gene pair hu_TOP2B;ms_Top2b, pvals1 is for the hu gene (TOP2B) and pvals2 is for the ms gene (Top2b).
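Here is a minimal sketch of that mapping. It assumes gene pairs are strings of the form species1gene;species2gene, with the species prefix before the first underscore, as in hu_TOP2B;ms_Top2b. The helper names split_gene_pair and pvals_by_gene are hypothetical and not part of SAMap. The p-values in the usage line are placeholders.

```python
from __future__ import annotations


def split_gene_pair(pair: str) -> tuple[tuple[str, str], tuple[str, str]]:
    """Hypothetical helper: split 'hu_TOP2B;ms_Top2b' into (('hu', 'TOP2B'), ('ms', 'Top2b'))."""
    gene1, gene2 = pair.split(";")      # species-1 gene comes first, species-2 gene second
    sp1, name1 = gene1.split("_", 1)    # assumes the species prefix itself contains no '_'
    sp2, name2 = gene2.split("_", 1)
    return (sp1, name1), (sp2, name2)


def pvals_by_gene(pair: str, pval1: float, pval2: float) -> dict[str, float]:
    """Hypothetical helper: attach pval1 to the species-1 gene and pval2 to the species-2 gene."""
    gene1, gene2 = pair.split(";")
    return {gene1: pval1, gene2: pval2}


# Placeholder p-values, for illustration only.
print(split_gene_pair("hu_TOP2B;ms_Top2b"))
print(pvals_by_gene("hu_TOP2B;ms_Top2b", 0.01, 0.02))
```

Keeping the species prefix on each gene makes it easy to split the results into one per-species gene list for a separate enrichment run in each organism.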