broadinstitute / wot

A software package for analyzing snapshots of developmental processes
https://broadinstitute.github.io/wot/
BSD 3-Clause "New" or "Revised" License

'cells_by_gene_set' - all cell types have the same number of cells #76

mbk0asis closed 4 years ago

mbk0asis commented 4 years ago

Hello!

I'm trying to generate cell sets using the 'cells_by_gene_set' command.

wot cells_by_gene_set --score gene_set_scores.txt --out test

Running the command above successfully generated 'test.gmt', but the number of cells in each category is the same (188 cells). The samples do appear to have been separated by gene set score (there is no overlap between the cell groups).

MEF -   D2-AACTTTCTCTTGGGTA D2-ACACTGAAGATATGCA D2-CAGCAGCTCGTCTGCT
Pluripotency    -   D7_iNSC-ACGATACTCATCTGTT    D7_iNSC-AGAATAGGTCGTGGCT    D7_iNSC-AGGGTGAGTCATTAGC
Epithelial  -   D4-AACTCCCCAATGTAAG D4-CACCACTGTGCTGTAT D4-CCTTCCCAGGCGCTCT
Neural  -   D5-ATAAGAGCAGGAATGC D5-TTTGGTTTCGGAAACG D10_iNSC-CAGCGACGTAATCGTC

I tried again with a subset of gene set score data and obtained the same result.

Do you know what might have caused this result?

I installed 'WOT 1.08' on Ubuntu 18.04 using pip3 install wot.

Thank you!

mbk0asis commented 4 years ago

I think I figured it out. I think it's because of --quantile 0.99. I have about 18,000 cells, so each resulting cell set should contain about 1% of the total cells, which is ~180 cells.
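The arithmetic above can be checked with a small sketch. This is not wot's actual implementation. It only assumes that a cell is selected when its score is at or above the 0.99 quantile of that gene set's scores. The scores are synthetic, and the helper name `select_top_quantile` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 18_000
gene_sets = ["MEF", "Pluripotency", "Epithelial", "Neural"]

# Synthetic scores (an assumption for illustration), with a deliberately
# different mean and spread for each gene set.
scores = {
    name: rng.normal(loc=i, scale=1.0 + i, size=n_cells)
    for i, name in enumerate(gene_sets)
}

def select_top_quantile(values, quantile=0.99):
    """Return indices of cells scoring at or above the given quantile.

    Hypothetical helper: an assumed selection rule, not wot's code.
    """
    threshold = np.quantile(values, quantile)
    return np.flatnonzero(values >= threshold)

# Each gene set gets its own threshold, but every threshold keeps the
# same top 1% fraction of cells, so the set sizes match.
sizes = {name: len(select_top_quantile(v)) for name, v in scores.items()}
print(sizes)
```

Under this assumption, every gene set ends up with roughly 1% of the cells no matter how its scores are distributed, which would explain the identical counts. Passing a different `--quantile` value should change the set size accordingly.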