LuyiTian / FLAMES

Full-length transcriptome splicing and mutation analysis
GNU General Public License v3.0

Demultiplexing issues #10

Open yangao07 opened 3 years ago

yangao07 commented 3 years ago

Hi Luyi,

The pipeline is great! Thanks for the effort and for sharing it.

I have tried FLAMES on your published data and our own in-house data, and have two questions:

1) For match_cell_barcode, the "output cell barcode statistics file" always misses the first barcode in the "whitelist" file. Is this a bug?

2) For single-cell long-read data, when the poly-A tail is in the read, match_cell_barcode should search for the barcode and UMI in the suffix of the read instead of the prefix, right? I did find some cases where match_cell_barcode still searched and trimmed the prefix.

Looking forward to your feedback.

Thanks, Yan

LuyiTian commented 3 years ago
  1. It depends on the format of the cell barcode annotation. If it is a 10x barcode annotation from the filtered_feature_bc_matrix folder, the program does not assume the file contains a header; if it is more like the scPipe cell barcode annotation, it assumes the first line is a header and skips it.

  2. The Nanopore protocol is non-directional, so I search in both directions and trim the adapter sequence plus cell barcode/UMI in both directions. The program does not search for the polyA sequence.
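
To make the "search both directions" idea concrete, here is a minimal Python sketch. It is not the FLAMES implementation: the function names, the exact-match search, the flanking sequence used in the usage note, and the barcode and UMI lengths are all assumptions chosen for illustration (the real tool may tolerate mismatches). The read is scanned as given and then as its reverse complement, and whatever follows the flanking sequence is split into barcode, UMI and remaining read.

```python
# Hypothetical sketch, NOT the FLAMES code: exact-match search for a flanking
# sequence in both orientations of a read.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_barcode(read: str, flank: str, bc_len: int = 16, umi_len: int = 12):
    """Return (strand, barcode, umi, trimmed_read), or None if nothing is found.

    bc_len and umi_len are illustrative defaults, not values taken from FLAMES.
    """
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        pos = seq.find(flank)
        if pos == -1:
            continue  # flanking sequence absent in this orientation
        start = pos + len(flank)
        end = start + bc_len + umi_len
        if end > len(seq):
            continue  # read too short to hold barcode + UMI after the flank
        barcode = seq[start:start + bc_len]
        umi = seq[start + bc_len:end]
        return strand, barcode, umi, seq[end:]
    return None
```

With an illustrative flank such as `"CTACACGACGCTCTTCCGATCT"`, a read built as flank + barcode + UMI + cDNA is reported on strand `"+"`, and its reverse complement is reported on strand `"-"` with the same barcode and UMI. No poly-A detection is involved, which matches the description above.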

yangao07 commented 3 years ago

Thanks for your reply!

  1. from the filtered_feature_bc_matrix folder then it does not assume the file contains header

I did notice this difference in the source code. I was using the same barcode format as in the filtered_feature_bc_matrix:

AAAGCAACATGACGGA-1
XXXXXXXXXXXXXXXXX-1
...

During demultiplexing, the first barcode 'AAAGCAACATGACGGA' was printed to the screen by 'match_cell_barcode' ("first 5 cell barcode:"). However, it is always missing from the output fastq file and from the header of "transcript_count.csv.gz" after running FLAMES.

This happens on both the FLTseq data and our in-house data.
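
A quick way to check how a barcode file would be treated under a "first line may be a header" rule is sketched below in Python. This is not how FLAMES decides. The detection rule (a line is a header if none of its fields looks like a DNA barcode), the function names, and the CSV column names in the usage note are assumptions made only for illustration.

```python
import re

# Assumption: a barcode consists of A/C/G/T/N, optionally followed by a
# "-<digits>" suffix such as the "-1" seen in the barcodes quoted above.
_BARCODE_RE = re.compile(r"^[ACGTN]+(-\d+)?$")


def looks_like_barcode(field: str) -> bool:
    """True if the field matches the assumed barcode pattern."""
    return bool(_BARCODE_RE.match(field.strip()))


def split_header(lines, delimiter=","):
    """Return (header, rows); header is None when the first line holds a barcode.

    Hypothetical helper: it only illustrates why the first barcode can be lost
    if a headerless file is read by code that always skips the first line.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    if rows and not any(looks_like_barcode(f) for f in rows[0].split(delimiter)):
        return rows[0], rows[1:]
    return None, rows
```

For a headerless list beginning with `AAAGCAACATGACGGA-1`, `split_header` returns `None` as the header and keeps every barcode. For a CSV whose first line is something like `cell_id,barcode` (hypothetical column names), the first line is returned separately as the header.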

aheravi commented 3 years ago

Hi @LuyiTian, could you please comment on the error below?

Running command:

/FLAMES/src/bin/match_cell_barcode /projects/$ff/ $ff.stat $ff.demultiplexed.fq.gz unique_barcodes_g10.csv 2 12 

Error:

reverse comp flanking end: 87   42
reverse comp flanking end: 13   37
###total read: 30001
###found flanking region: 15139
###found flanking region(rev): 9335
144 @@@@@@ 96
xah.PAF.fastq
XAH.demultiplexed.fq.gz
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 165) > this->size() (which is 44)
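
The final message says that a substring starting at position 165 was requested from a string of only 44 characters. C++ `std::string::substr` throws `std::out_of_range` whenever the start position exceeds the string length. Where that call sits inside match_cell_barcode is not shown in this thread, so the Python sketch below only illustrates the general kind of guard that avoids such a failure: skip a read when the computed start position plus the expected barcode and UMI lengths would run past its end. The function name and the default lengths are hypothetical.

```python
def extract_fields(read: str, start: int, bc_len: int = 16, umi_len: int = 12):
    """Return (barcode, umi) taken from read[start:], or None if the read is too short.

    Hypothetical helper with illustrative default lengths. Python slicing would
    silently return a short or empty string here, so the bounds are checked
    explicitly, which is the check a C++ substr call needs before it is made.
    """
    end = start + bc_len + umi_len
    if start < 0 or end > len(read):
        return None  # e.g. start=165 on a 44-character read
    barcode = read[start:start + bc_len]
    umi = read[start + bc_len:end]
    return barcode, umi
```

With this guard, a 44-character read queried at position 165 yields `None` and can be counted as unassigned instead of terminating the run.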