10XGenomics / subset-bam

MIT License

Why is subset-bam not efficient for splitting a BAM file by barcode? #15

Open: kulansam opened this issue 2 years ago

kulansam commented 2 years ago

Hi,

Thanks for developing subset-bam. I would like to split the BAM file from Cell Ranger into one BAM per cell barcode, using the barcodes listed in the filtered feature matrix folder (barcode.tsv). I ran the following command inside a for loop, but it takes more than 6 days for around 4000K cells, even with multithreading (--cores 15):

subset-bam_linux --bam filtered_barcodes_sorted.bam --cell-barcodes $line.tsv --cores 15 --out-bam ./filter_cell_individual_bam/$line.bam

Is there any way to speed up this process?
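A likely reason for the slowness is that each subset-bam run has to scan the whole input BAM to find the reads carrying the requested barcodes. Looping over N barcodes therefore means roughly N full passes over the same file. subset-bam accepts a list of barcodes in --cell-barcodes, but it writes all matching reads into a single output BAM, so it cannot produce one file per cell in a single call. One alternative is a single pass that routes every read to a per-cell file according to its CB tag. The sketch below does this on SAM text. Its assumptions are that reads carry Cell Ranger's corrected-barcode tag `CB:Z:`, that barcode.tsv holds one barcode per line in the same form as the tag (e.g. with the `-1` suffix), and that `split_sam_by_cb` is a hypothetical helper name, not part of subset-bam.

```shell
# split_sam_by_cb BARCODES OUTDIR  (hypothetical helper, not part of subset-bam)
# Reads SAM text on stdin (e.g. from `samtools view -h`) and writes every record
# whose CB:Z: tag is listed in BARCODES to OUTDIR/<barcode>.sam, in one pass.
split_sam_by_cb() {
  local barcodes="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -F '\t' -v outdir="$outdir" -v bcfile="$barcodes" '
    # First input: the barcode whitelist, one barcode per line.
    FILENAME == bcfile { keep[$1] = 1; next }
    # SAM header lines come first; buffer them so every per-cell file gets one.
    /^@/ { header = header $0 "\n"; next }
    {
      # Optional tags start at column 12; look for the cell-barcode tag.
      cb = ""
      for (i = 12; i <= NF; i++)
        if (substr($i, 1, 5) == "CB:Z:") { cb = substr($i, 6); break }
      # Skip reads without a CB tag or with a barcode not in the list.
      if (cb == "" || !(cb in keep)) next
      out = outdir "/" cb ".sam"
      # First record for this barcode: create the file and write the header.
      if (!(cb in seen)) { printf "%s", header > out; seen[cb] = 1 }
      print > out
    }
  ' "$barcodes" -
}
```

A possible invocation is `samtools view -h filtered_barcodes_sorted.bam | split_sam_by_cb barcode.tsv cells/`, followed by `samtools view -b -o cells/<barcode>.bam cells/<barcode>.sam` for each output if BAM is needed. This reads the input once instead of once per cell. The trade-offs are that the SAM text needs more disk space and that one output file stays open per cell. GNU awk can multiplex files beyond the OS open-file limit, but other awk implementations may fail with thousands of cells, so raising `ulimit -n` may be necessary.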

limin321 commented 2 years ago


What I did was split barcode.tsv into many .txt files, one barcode per file. Then you can run the jobs as a batch.
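The approach in this reply can be sketched in two steps: write one file per barcode, then launch several subset-bam jobs at once. The helper names `split_barcodes` and `run_subset_batch` and the default job and core counts are hypothetical. The batch step assumes a `find`/`xargs` that support `-print0`, `-0` and `-P` (GNU and BSD versions do), and it only uses the subset-bam flags from the original command.

```shell
# split_barcodes BARCODES OUTDIR  (hypothetical helper)
# Writes each barcode to its own file OUTDIR/<barcode>.tsv.
split_barcodes() {
  local barcodes="$1" outdir="$2" bc
  mkdir -p "$outdir"
  # The `|| [ -n "$bc" ]` keeps a final line that has no trailing newline.
  while IFS= read -r bc || [ -n "$bc" ]; do
    [ -n "$bc" ] || continue    # skip blank lines
    printf '%s\n' "$bc" > "$outdir/$bc.tsv"
  done < "$barcodes"
}

# run_subset_batch BAM BCDIR OUTDIR [JOBS] [CORES]  (hypothetical helper)
# Runs one subset-bam job per barcode file, JOBS at a time.
# JOBS * CORES should not exceed the CPUs available.
run_subset_batch() {
  local bam="$1" bcdir="$2" outdir="$3" jobs="${4:-4}" cores="${5:-4}"
  mkdir -p "$outdir"
  find "$bcdir" -name '*.tsv' -print0 |
    xargs -0 -P "$jobs" -I{} sh -c '
      bc=$(basename "$1" .tsv)
      subset-bam_linux --bam "$2" --cell-barcodes "$1" --cores "$3" --out-bam "$4/$bc.bam"
    ' _ {} "$bam" "$cores" "$outdir"
}
```

For example, `split_barcodes barcode.tsv bc_split/` followed by `run_subset_batch filtered_barcodes_sorted.bam bc_split/ filter_cell_individual_bam/ 8 2` runs eight jobs at a time. On a cluster, the same per-barcode files can be submitted as array jobs instead. This spreads the work across more CPUs or nodes, but each job still reads the whole BAM, so the total I/O stays proportional to the number of cells.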