yangao07 opened this issue 4 years ago.
It depends on the format of the cell barcode annotation. If it is a 10x barcode annotation from the filtered_feature_bc_matrix folder, it does not assume the file contains a header. If it is more like the scPipe cell barcode annotation, it assumes the first line is a header and skips it.
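Because the header rule depends on the file format, a header-less 10x list would lose its first barcode if it were parsed as though it had a header. Below is a hypothetical alternative, not FLAMES's actual parser, that decides per file whether the first line looks like a barcode. It is a minimal Python sketch. It assumes a single-column file, and it assumes the 10x "-N" GEM-well suffix should be stripped before matching.

```python
import re
from typing import List

# A line consisting only of a nucleotide barcode, optionally with a 10x "-N" suffix.
BARCODE_RE = re.compile(r"^[ACGTN]+(-\d+)?$")


def parse_whitelist(lines: List[str]) -> List[str]:
    """Return barcodes from a single-column whitelist (hypothetical helper)."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    # Treat the first row as a header only if it does not look like a barcode,
    # so a header-less 10x list keeps its first entry.
    if rows and not BARCODE_RE.match(rows[0]):
        rows = rows[1:]
    # Assumption: the "-1" suffix is not part of the sequenced barcode.
    return [r.split("-")[0] for r in rows]
```

With this approach, a 10x list such as "AAAGCAACATGACGGA-1" keeps every entry, and a file whose first line is a word like "barcode" has only that line dropped.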
The Nanopore protocol is non-directional, so I search both orientations of each read and trim the adapter sequence plus the cell barcode/UMI in whichever orientation it is found. The program does not search for the polyA sequence.
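Searching "both directions" means looking for the adapter in the read and also in its reverse complement. The minimal Python sketch below shows the idea; it is not FLAMES's C++ code. It assumes 10x-style lengths of a 16 bp barcode and a 12 bp UMI immediately after the adapter. It uses exact matching with str.find for brevity, whereas a real tool would typically tolerate sequencing errors.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_barcode_umi(read: str, adapter: str, bc_len: int = 16,
                     umi_len: int = 12) -> Optional[Tuple[str, str, str]]:
    """Return (barcode, umi, trimmed_read) from either orientation, or None.

    bc_len/umi_len default to assumed 10x-style values. The trimmed read is
    returned in the orientation where the adapter was found.
    """
    for seq in (read, revcomp(read)):
        pos = seq.find(adapter)  # exact match only (simplification)
        if pos == -1:
            continue
        start = pos + len(adapter)
        end = start + bc_len + umi_len
        if end > len(seq):
            continue  # too short to hold barcode + UMI after the adapter
        return seq[start:start + bc_len], seq[start + bc_len:end], seq[end:]
    return None
```

Trying the forward strand first and the reverse complement second means a read in either orientation yields the same barcode and UMI.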
Thanks for your reply!
Regarding "from the filtered_feature_bc_matrix folder then it does not assume the file contains header":
I did notice this difference in the source code. I was using the same barcode format as in filtered_feature_bc_matrix:
AAAGCAACATGACGGA-1
XXXXXXXXXXXXXXXXX-1
...
During demultiplexing, match_cell_barcode printed the first barcode, 'AAAGCAACATGACGGA', to the screen (under "first 5 cell barcode:"). However, that barcode is always missing from the output FASTQ file and from the header of transcript_count.csv.gz after running FLAMES.
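One quick way to confirm which whitelist barcodes disappear is to compare the whitelist against the header row of transcript_count.csv.gz. The Python sketch below does that. It assumes that barcodes appear as column names in the first row of the CSV, and it accepts either the "-1" form or the stripped form because the output format is not confirmed here. The function name is hypothetical.

```python
import csv
import gzip
from typing import List


def missing_barcodes(whitelist_path: str, counts_path: str) -> List[str]:
    """Whitelist barcodes absent from the count matrix header (hypothetical check)."""
    with open(whitelist_path) as fh:
        wanted = [ln.strip() for ln in fh if ln.strip()]
    with gzip.open(counts_path, "rt", newline="") as fh:
        columns = set(next(csv.reader(fh)))  # first row = column names (assumption)
    # Accept a match with or without the 10x "-1" suffix.
    return [bc for bc in wanted
            if bc not in columns and bc.split("-")[0] not in columns]
```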
This happens on both the FLTseq data and our in-house data.
Hi @LuyiTian, could you please comment on the error below?
Running command:
/FLAMES/src/bin/match_cell_barcode /projects/$ff/ $ff.stat $ff.demultiplexed.fq.gz unique_barcodes_g10.csv 2 12
Error:
reverse comp flanking end: 87 42
reverse comp flanking end: 13 37
###total read: 30001
###found flanking region: 15139
###found flanking region(rev): 9335
144 @@@@@@ 96
xah.PAF.fastq
XAH.demultiplexed.fq.gz
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 165) > this->size() (which is 44)
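In C++, std::string::substr throws std::out_of_range when the start position exceeds size(). Here the position is 165 and the string has only 44 characters. Some sequence, possibly a short read or a short fragment of one, was therefore shorter than the position computed for it. The hypothetical Python helper below (not the FLAMES fix) illustrates the kind of bounds check that avoids the crash. It skips the extraction instead of throwing, and it also rejects windows that would run past the end.

```python
from typing import Optional


def safe_substr(seq: str, pos: int, length: int) -> Optional[str]:
    """Return seq[pos:pos+length], or None if the window does not fit.

    Python slicing never raises on out-of-range indices, so the check is
    explicit; a truncated barcode/UMI is treated as no match.
    """
    if pos < 0 or pos + length > len(seq):
        return None
    return seq[pos:pos + length]
```

A read that fails the check can be counted as "no barcode found" rather than aborting the whole run.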
Hi Luyi,
The pipeline is great! Thanks for the effort and for sharing it.
I have tried FLAMES on your published data and our own in-house data, and have two questions:
1) For match_cell_barcode, the "output cell barcode statistics file" always misses the first barcode in the "whitelist" file. Is this a bug?
2) For single-cell long-read data, when the poly-A tail is present in the read, match_cell_barcode should search for the barcode and UMI in the suffix of the read rather than the prefix, right? I did find some cases where match_cell_barcode still searched and trimmed the prefix.
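The second question amounts to using the poly-A/poly-T position to decide which end of the read holds the barcode. According to the maintainer's reply, FLAMES does not search for polyA. The Python sketch below is therefore only an alternative design, under stated assumptions. It assumes a 10x-style 3' layout (adapter, barcode, UMI, poly-T, then cDNA on one strand). Under that layout, a poly-T run near the start puts the barcode in the prefix, and a poly-A run near the end puts it, reverse complemented, in the suffix. The 15-base run length and the 200-base window are hypothetical thresholds.

```python
import re
from typing import Optional

POLY_T = re.compile(r"T{15,}")  # hypothetical minimum run length
POLY_A = re.compile(r"A{15,}")


def barcode_end(read: str, window: int = 200) -> Optional[str]:
    """Guess which end of a read carries the barcode/UMI ("prefix"/"suffix").

    Returns None when no poly-A/T run is found near either end.
    """
    if POLY_T.search(read[:window]):
        return "prefix"  # adapter-barcode-UMI-polyT at the 5' end
    if POLY_A.search(read[-window:]):
        return "suffix"  # polyA-UMI-barcode-adapter (reverse complement) at the 3' end
    return None
```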
Looking forward to your feedback.
Thanks, Yan