LuyiTian / FLAMES

Full-length transcriptome splicing and mutation analysis
GNU General Public License v3.0

match_cell_barcode runs without error but produces no result. Command: match_cell_barcode /data_RAGE_seq/data1 cell_barcode_stat.txt split_barcode.fastq flame_3M-february-2018.txt 2. The output split_barcode.fastq is empty and no other files are generated. #12

Open markme123 opened 3 years ago

LuyiTian commented 3 years ago

It is hard to tell with limited information. Is there any output in the terminal? Usually you would see some stats printed after you run the program. Here is an example.

the first few lines of output:

set UMI length to 10.
First 5 cell barcode:
        AAACCTGCAATCCAAC
        AAACGGGCATACGCCG
        AAACGGGCATTAGGCT
        AAACGGGGTATAGTAG
        AAAGATGCAACACCCG
/stornext/Genomics/data/CLL_venetoclax/FLTseq/HD11/fastq/HD11_pass.fq.gz
forward flanking end: 66        2819
forward flanking end: 67        2486

the last lines:

        24      1117
        32      487
###total read: 56654147
###barcode hm match: 33287709
###barcode match: 3337587
###barcode not match: 20009062
###too short: 19789
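
A quick way to read the summary at the end, as a minimal sketch: the snippet below parses the `###` lines from the example output above and prints each category as a share of `total read`. The helper name `parse_summary` is made up for illustration and is not part of FLAMES, and the meaning of each category is not defined in this thread. In this example the four categories add up exactly to the total, so if every count is zero or the summary is missing, the program probably did not read any input.

```python
import re

def parse_summary(log_text):
    """Collect the '###name: count' lines from the terminal output (hypothetical helper)."""
    stats = {}
    for line in log_text.splitlines():
        m = re.match(r"^###(.+?):\s*(\d+)\s*$", line.strip())
        if m:
            stats[m.group(1).strip()] = int(m.group(2))
    return stats

# Summary lines copied from the example output above.
log = """\
###total read: 56654147
###barcode hm match: 33287709
###barcode match: 3337587
###barcode not match: 20009062
###too short: 19789
"""

stats = parse_summary(log)
total = stats["total read"]
for name, count in stats.items():
    if name != "total read":
        # Share of all reads that fell into this category.
        print(f"{name}: {count} ({count / total:.1%})")
```
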
yuchen345 commented 2 years ago

Hi Luyi, could you please provide an example of how to use match_cell_barcode and explain the input files in more detail? Does the fastq folder contain Illumina sequencing data or third-generation sequencing data?

Here is my error message: [screenshot]

Thanks! Yuchen

icanccwhite commented 2 years ago

I have the same question

LuyiTian commented 2 years ago

Hi @icanccwhite and @yuchen345, you should use long-read fastq data as input. The cell barcode file comes from the short-read data output. From your screenshot, it seems the first 5 cell barcodes were printed, so the program is running correctly. Can you check your data path again? I think you need to use an absolute path.
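
To check the path before running, something like the following might help. It is only a sketch: `list_fastq` is a hypothetical helper, not part of FLAMES, and the list of file extensions is an assumption about how the long-read files are named. It resolves the folder to an absolute path and lists the fastq files it finds there, so an empty list points to a wrong path rather than a problem with the barcodes.

```python
from pathlib import Path

# Assumed extensions; adjust to match how your long-read files are named.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")

def list_fastq(folder):
    """Return the fastq files in `folder` as absolute paths (hypothetical helper)."""
    folder = Path(folder).expanduser().resolve()
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )

# Example usage (the path is the one from the original command):
# for path in list_fastq("/data_RAGE_seq/data1"):
#     print(path)
```
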

yuchen345 commented 2 years ago

Thanks for your reply! @LuyiTian

Here is another error, this time from sc_long_pipeline.py:

### read gene annotation 2022-04-20 20:57:58
remove similar transcripts in gene annotation: Counter({'duplicated_transcripts': 370})
### find isoforms 2022-04-20 20:59:27
GL000219.1
KI270713.1
KI270733.1
GL000194.1
GL000195.1
KI270731.1
20
Traceback (most recent call last):
  File "./sc_long_pipeline.py", line 213, in <module>
    sc_long_pipeline(args)
  File "./sc_long_pipeline.py", line 179, in sc_long_pipeline
    raw_gff3=raw_splice_isoform if config_dict["global_parameters"]["generate_raw_isoform"] else None)
  File "/home/chenz/biosoft/FLAMES/python/sc_longread.py", line 975, in group_bam2isoform
    it_region = bamfile.fetch(ch, bl.s, bl.e)
  File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig 20

Waiting for your reply!

LuyiTian commented 2 years ago

It seems chromosome 20 is not in pysam's list of contigs. I would suggest double-checking your genome annotation and making sure you downloaded the fasta and gff/gtf files from the same source. Did you modify the genome annotation in any way? Usually chromosome 20 won't be at the end of the chromosome list, but from your output it seems to be at the end.
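
One way to check this, sketched in plain Python: collect the contig names from the FASTA headers and from column 1 of the gff/gtf file, then list the annotation contigs that the genome does not have. The function names are made up for illustration, and the toy data assumes a `chr20` versus `20` naming mismatch, which is only one possible cause of the error. For the aligned reads, the contig names pysam knows about are available as `pysam.AlignmentFile(path).references`.

```python
def fasta_contigs(lines):
    """Contig names from FASTA header lines ('>name description')."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    ]

def annotation_contigs(lines):
    """Unique contig names (column 1) of GFF3/GTF lines, in order of first appearance."""
    seen = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)

def missing_from_genome(genome_names, annot_names):
    """Annotation contigs that do not appear among the genome contigs."""
    genome = set(genome_names)
    return [name for name in annot_names if name not in genome]

# Toy data (assumed for illustration): the genome uses 'chr20', the annotation uses '20'.
fasta_lines = [">chr20 assembled chromosome", "ACGT", ">GL000219.1", "ACGT"]
gtf_lines = [
    "#!genome-build example",
    "GL000219.1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"g1\";",
    "20\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"g2\";",
]

missing = missing_from_genome(fasta_contigs(fasta_lines), annotation_contigs(gtf_lines))
print(missing)
```

With real files, pass the open file handles instead of the toy lists. Any name printed here would fail in the same way as the `invalid contig 20` error above.
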

yuchen345 commented 2 years ago

Thank you very much! @LuyiTian

I have a few more questions:

  1. As you said, FLAMES searches in both directions and trims the adapter sequence plus cell barcode/UMI in both directions. How does FLAMES handle UMI assignment when a read is tagged with a UMI that may contain a sequencing error?

  2. I noticed that there is a find_polyT function in match_cell_barcode. Is the polyT sequence removed from the output fastq.gz file? How do you deal with the polyA sequence on the reverse strand?

  3. Can FLAMES be used with 10X 5' libraries, where a TSO sequence rather than polyT follows the cell barcode/UMI?

Looking forward to your reply.

Thanks