DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Allow reporting one of the sam tags in `hisat-3n-table` #337

Closed vivekbhr closed 2 years ago

vivekbhr commented 2 years ago

Hi @DaehwanKimLab20191011

This tool looks really great as a replacement for Bismark and others! I am looking at single-cell BS-seq data and I am appending the BC/CB tag to the alignments. Would you consider reporting the value of a user-specified SAM tag with hisat-3n-table (as another column) so I can get the corresponding cell barcode tag in the output table?

Thanks! Vivek

imzhangyun commented 2 years ago

Hello Vivek,

Thank you for using HISAT-3N. Unfortunately, we cannot support a user-specified SAM tag in HISAT-3N-Table, because the output could take a huge amount of disk space. If 100 cells map to one genomic location, we would need to append 100 barcodes to the new column, and the new output table file could be 100x bigger than the original table file.

Best, Leo

vivekbhr commented 2 years ago

Hi Leo, thanks for the quick reply. I see that could be an issue. Is there an efficient solution to this? Or would you suggest splitting the BAM files by cell barcode before running the hisat-3n-table command (that would be messy but would work for now)?

imzhangyun commented 2 years ago

Hello Vivek,

I believe processing the cells/barcodes one by one is a good idea. Here is my suggestion:

  1. Align the reads file (fastq/fasta) with HISAT-3N and get a full SAM/BAM file.
  2. Add CB/BC tags to the SAM/BAM file.
  3. Write a loop that processes one barcode at a time: use samtools to filter the full SAM/BAM file by BC/CB tag, pipe the result to hisat-3n-table, and pipe the table to your downstream analysis software/pipeline.

This process won't run very fast, but it avoids generating too many files on your disk. If hisat-3n-table generated 100k table files (one for each of 100k cells) on disk, they could be hard to manage.

Best, Leo
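
The per-barcode loop suggested above could be sketched roughly as follows. This is a minimal illustration, not something posted in the thread. It assumes that the tagged BAM is already coordinate-sorted, that your samtools version supports the `-d TAG:VALUE` filter of `samtools view`, and that `hisat-3n-table` reads SAM from standard input when given `--alignments -` and writes the table to standard output when `--output-name` is omitted; please check these against the versions you have installed. The function names, the file names in the usage note, and the `handle_line` callback are hypothetical.

```python
import subprocess


def build_commands(bam, ref, barcode, tag="CB", base_change="C,T"):
    """Return (samtools_cmd, table_cmd) argument lists for one barcode.

    Assumption: `bam` is coordinate-sorted and carries the barcode in `tag`.
    """
    # Keep the header (-h) and only the reads whose tag equals this barcode.
    samtools_cmd = ["samtools", "view", "-h", "-d", f"{tag}:{barcode}", bam]
    # Read SAM from stdin ("-"); with no --output-name the table is assumed
    # to go to stdout so it can be streamed instead of written to disk.
    table_cmd = [
        "hisat-3n-table",
        "--alignments", "-",
        "--ref", ref,
        "--base-change", base_change,
    ]
    return samtools_cmd, table_cmd


def table_for_barcode(bam, ref, barcode, handle_line, **kwargs):
    """Stream the 3N table for one barcode into handle_line(barcode, line)."""
    samtools_cmd, table_cmd = build_commands(bam, ref, barcode, **kwargs)
    view = subprocess.Popen(samtools_cmd, stdout=subprocess.PIPE)
    table = subprocess.Popen(
        table_cmd, stdin=view.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy so samtools sees a broken pipe if the table step exits.
    view.stdout.close()
    for line in table.stdout:
        handle_line(barcode, line.rstrip("\n"))
    table.stdout.close()
    if table.wait() != 0 or view.wait() != 0:
        raise RuntimeError(f"pipeline failed for barcode {barcode}")


def process_all(bam, ref, barcodes, handle_line, **kwargs):
    """Process the barcodes one at a time, as suggested in the thread."""
    for barcode in barcodes:
        table_for_barcode(bam, ref, barcode, handle_line, **kwargs)
```

A call such as `process_all("tagged.sorted.bam", "genome.fa", barcodes, handle_line)` would then hand each table row to the downstream step together with its barcode, so no per-cell table files need to be kept on disk. Each barcode rescans the whole BAM, which is the speed cost mentioned above.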

vivekbhr commented 2 years ago

This sounds good. Thanks Leo! :+1: