caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

initial commit #39

Closed bobermayer closed 3 years ago

bobermayer commented 3 years ago

to remove background counts, first run mgatk on many (~20k) cells, for example:

sort -rgk 17 -t ',' cellranger_output/outs/singlecell.csv | awk -F "\"*,\"*" '$7  > 0 {print}' | head -n 20000 | cut -f 1 -d ',' > top20k_barcodes.tsv
mgatk bcall -i cellranger_output/outs/possorted_bam.bam -n sample_id -o mgatk -c 8 -bt CB -b top20k_barcodes.tsv  --nsamples 1000
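For readers who prefer Python, the barcode selection done by the sort/awk/head/cut pipeline above can be sketched roughly as follows. This is only an illustration, not part of the PR: the function name `top_barcodes` and its parameters are hypothetical, the column positions (7 for the mitochondrial count, 17 for the ranking column, both 1-based) are taken directly from the shell command, and rows whose values do not parse as numbers (such as the header) are skipped, which is a choice of this sketch. Tie ordering may differ from `sort`.

```python
import csv
import io


def top_barcodes(csv_text, n=20000, rank_col=17, mito_col=7):
    """Return up to n barcodes with a positive value in mito_col,
    ranked by rank_col in descending order (columns are 1-based)."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text)):
        try:
            rank = float(fields[rank_col - 1])
            mito = float(fields[mito_col - 1])
        except (IndexError, ValueError):
            # Header line or malformed row: skip it (assumption of this sketch).
            continue
        if mito > 0:
            rows.append((rank, fields[0]))
    # Highest ranking value first; the barcode is assumed to be column 1.
    rows.sort(key=lambda r: r[0], reverse=True)
    return [barcode for _, barcode in rows[:n]]
```

The returned list could then be written one barcode per line to a file such as top20k_barcodes.tsv and passed to mgatk with -b.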

the first command keeps only cells with nonzero mitochondrial counts and writes the top 20,000 barcodes to top20k_barcodes.tsv. to make this work, I replaced os.popen('ls ' + ... by glob.glob in mgatk.
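As an illustration of the os.popen-to-glob.glob change mentioned here, a minimal sketch is shown below. The original call is elided in the comment, so the helper name `list_matching_files`, the directory argument, and the `*.bam` pattern are all hypothetical. One plausible motivation (an assumption, the comment does not state it) is that a shell `ls` with a wildcard can fail with "Argument list too long" when tens of thousands of files match, whereas glob.glob does the matching inside Python.

```python
import glob
import os


def list_matching_files(directory, pattern="*.bam"):
    """List files in `directory` matching `pattern` without spawning a shell.

    Both arguments are placeholders for illustration; the actual
    path and pattern used inside mgatk are not shown in the PR text.
    """
    # glob.glob expands the wildcard in-process, so there is no
    # shell command line whose length could be exceeded.
    return sorted(glob.glob(os.path.join(directory, pattern)))
```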

then run

mgatk remove-background -i cellranger_output/outs/possorted_bam.bam -n sample_id -o mgatk -c 1 -bt CB -b top20k_barcodes.tsv --nsamples 1000 -nfg 1031 -nbg 20000 -z

where -nfg is the number of "real" ("foreground") cells.

this will first convert the mgatk output into CellBender-compatible input, using a couple of R functions from Signac to perform feature selection (simply copied here to avoid having to install the entire package). then it will run CellBender itself (GPU support is not necessary if we only have a few hundred "genes"). finally, it will convert the CellBender output back into a .rds object (but at this point not into the raw sample_id.A.txt.gz files etc.)
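The middle step, running CellBender, might look roughly like the sketch below. It only assembles a command line for CellBender's `remove-background` tool and is not the code from this PR. The helper name `cellbender_command` and the file paths are hypothetical, and mapping -nfg to `--expected-cells` and -nbg to `--total-droplets-included` is an assumption about how the PR wires these values through.

```python
def cellbender_command(input_path, output_path, n_foreground, n_background,
                       use_gpu=False):
    """Build an argument list for `cellbender remove-background`.

    Assumption: n_foreground plays the role of mgatk's -nfg and
    n_background the role of -nbg; the PR text does not confirm this.
    """
    cmd = [
        "cellbender", "remove-background",
        "--input", input_path,
        "--output", output_path,
        "--expected-cells", str(n_foreground),
        "--total-droplets-included", str(n_background),
    ]
    if use_gpu:
        # Optional; the PR notes a GPU is not needed for a few hundred features.
        cmd.append("--cuda")
    return cmd


# Example with placeholder paths; the list could be passed to
# subprocess.run(cmd, check=True) to actually launch CellBender.
cmd = cellbender_command("mgatk_counts.h5ad", "mgatk_cellbender.h5", 1031, 20000)
```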

additional dependencies:

caleblareau commented 3 years ago

Thanks @bobermayer -- this PR looks great.