10XGenomics / subset-bam

MIT License

Is there a way to run subset-bam faster #34

Open mattli7 opened 1 year ago

mattli7 commented 1 year ago

Hello,

I am trying to split my BAM file into one BAM per cell barcode. I have around 20k cells, and each cell's barcode is stored in its own file in a single directory. I have been running subset-bam with the loop below, and it takes around 35 minutes per barcode. Is there a way to make subset-bam run faster, for example through parallelization?

```bash
# Directory containing one barcode file per cell
BARCODE_DIR="barcodes"
mkdir -p barcode_bams

for file in "$BARCODE_DIR"/*; do
    filename=$(basename "$file")
    filename_no_extension="${filename%%.*}"

    subset-bam_linux --bam marked.duplicates.bam --bam-tag CB \
        --cell-barcodes "$file" \
        --out-bam "barcode_bams/${filename_no_extension}.bam"
done
```
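A minimal sketch of the parallelization asked about, assuming bash, an `xargs` that supports `-0` and `-P` (GNU and BSD both do), and `subset-bam_linux` on `PATH`. The function names `subset_one` and `subset_all_parallel` and the job count are hypothetical. Each job still reads through the input BAM on its own, so this can only cut wall-clock time if the machine has spare cores and disk bandwidth; it does not reduce the total work of running one job per barcode.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run several subset-bam jobs at once with xargs -P.
# Assumes each file in the barcode directory holds the barcode(s) for one
# output BAM, as in the loop above.

# Subset one barcode file; xargs calls this in a child bash process.
subset_one() {
    local bam="$1" out_dir="$2" barcode_file="$3"
    local name
    name=$(basename "$barcode_file")
    name="${name%%.*}"   # strip all extensions, as in the original loop
    subset-bam_linux --bam "$bam" --bam-tag CB \
        --cell-barcodes "$barcode_file" \
        --out-bam "$out_dir/$name.bam"
}
export -f subset_one   # make the function visible to child bash processes

# Usage: subset_all_parallel BAM BARCODE_DIR OUT_DIR JOBS
subset_all_parallel() {
    local bam="$1" barcode_dir="$2" out_dir="$3" jobs="$4"
    mkdir -p "$out_dir"
    # -print0/-0 keep unusual filenames intact, -n 1 passes one barcode
    # file per job, and -P runs up to $jobs jobs at the same time.
    # xargs appends each file after "$bam" "$out_dir", so subset_one
    # receives (bam, out_dir, barcode_file).
    find "$barcode_dir" -maxdepth 1 -type f -print0 |
        xargs -0 -n 1 -P "$jobs" bash -c 'subset_one "$@"' _ "$bam" "$out_dir"
}
```

For example, `subset_all_parallel marked.duplicates.bam barcodes barcode_bams 8` would keep up to 8 subset-bam processes running until every barcode file has been handled. Start with a small job count and raise it while watching disk throughput, since many processes reading the same large BAM can become I/O-bound.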

ghuls commented 7 months ago

See: https://github.com/10XGenomics/subset-bam/issues/17#issuecomment-1917084971