10XGenomics / subset-bam

MIT License

Unable to filter by RE tag #14

Open LiNk-NY opened 2 years ago

LiNk-NY commented 2 years ago

Hi @ifiddes-10x and others,

Thank you for providing this useful tool.

I am trying to keep only reads annotated as intergenic, i.e., those tagged RE:A:I. This does not seem to work, even though I have checked my BAM file and the tag is present (see below). Perhaps I am using incorrect syntax?

Note: I was able to run the tool successfully with a text file containing a single barcode and --bam-tag CR:Z.

Your help is much appreciated!

Thank you. -Marcel

Step 1. Create a file with "I" (the desired value) for my tag of interest

mramos@super ~/gh/subset-bam/target/release (master) $ echo "I" > ~/test/tags.txt
mramos@super ~/gh/subset-bam/target/release (master) $ cat ~/test/tags.txt
I

Step 2. Use subset-bam with the --bam-tag input set to RE:A (similar to CR:Z for barcodes) and the -c argument pointing to the file in Step 1.

mramos@super ~/gh/subset-bam/target/release (master) $ ./subset-bam --bam ~/data/10x/pbmc_granulocyte_sorted_3k_gex_possorted_bam.bam \
>  --bam-tag RE:A -c ~/test/tags.txt --cores 64 \
>  -o ~/data/10x/pbmc_granulocyte_sorted_3k_gex_possorted_bam_tag_RE.bam
01:06:48 [ERROR] Zero alignments were kept. Does your BAM contain the cell barcodes and/or tag you chose?
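As an independent cross-check of the filter attempted in Step 2, the same selection can be sketched on SAM text with awk. This is a minimal sketch, not part of subset-bam, and it does not explain the error above; the function name `filter_sam_by_field` is hypothetical. It assumes tab-separated SAM on stdin, with optional fields starting at column 12.

```shell
# Hypothetical helper (not part of subset-bam): keep SAM header lines plus
# every record that carries the exact optional field given as $1,
# e.g. "RE:A:I". Reads SAM text on stdin and writes SAM text to stdout.
filter_sam_by_field() {
    awk -F '\t' -v want="$1" '
        /^@/ { print; next }              # pass header lines through unchanged
        {
            # SAM has 11 mandatory columns; optional TAG:TYPE:VALUE fields follow.
            for (i = 12; i <= NF; i++)
                if ($i == want) { print; next }
        }
    '
}
```

It could be wired between two samtools calls, for example `samtools view -h in.bam | filter_sam_by_field RE:A:I | samtools view -b -o out.bam -`, where `in.bam` and `out.bam` are placeholder file names. If this produces a non-empty BAM while subset-bam keeps zero alignments, the problem is more likely in how the tool reads the tag than in the data.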

First record of the BAM file, as shown by samtools view:

mramos@super ~/data/10x $ samtools view pbmc_granulocyte_sorted_3k_gex_possorted_bam.bam | head -1
A00984:207:HGWCKDSXY:3:2164:11369:30076 0       chr1    10014   1       90M     *       0       0       AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCT      F:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF,,F:FFFFFFFFFFFFFFFFF,FFFF,,FF:F:,FF,FF,FFF:F,   NH:i:4  HI:i:1  AS:i:86 nM:i:1  RG:Z:pbmc_granulocyte_sorted_3k:0:1:HGWCKDSXY:3 RE:A:I  xf:i:0  CR:Z:AGCATAAGTTAATACT   CY:Z:FFFF:::FF,F:FFF:   UR:Z:TACGATAAATTA       UY:Z:FFFFFF:FF:FF       UB:Z:TACGATAAATTA
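The single record above shows that RE:A:I occurs at least once. To see how the RE values are distributed across the whole file, a tally along the following lines may help. This is a minimal sketch; the function name `count_tag_values` is hypothetical, and it assumes tab-separated SAM text on stdin with the tag given as a `TAG:TYPE:` prefix.

```shell
# Hypothetical helper: count how often each value of a SAM optional field
# occurs. $1 is the "TAG:TYPE:" prefix, e.g. "RE:A:". Reads SAM text on
# stdin and prints "value<TAB>count" lines, sorted by value.
count_tag_values() {
    awk -F '\t' -v p="$1" '
        /^@/ { next }                     # skip header lines
        {
            # Optional fields start at column 12 in SAM.
            for (i = 12; i <= NF; i++)
                if (index($i, p) == 1) counts[substr($i, length(p) + 1)]++
        }
        END { for (v in counts) print v "\t" counts[v] }
    ' | sort
}
```

For the file in question this could be run as `samtools view pbmc_granulocyte_sorted_3k_gex_possorted_bam.bam | count_tag_values RE:A:`. A non-zero count for `I` would confirm that the tag and value being requested are present beyond the first record.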