Hello,
I am trying to use FLAMES in an isoform characterization benchmarking study with a single sample, but since I am new to the long-read world, it is not yet clear to me which key parameters I need to consider in the configuration file. After running FLAMES, I found my isoform_annotated.filtered.gff3 file almost empty. This is my output data (size in bytes, date, file name):
2444550950 Jul 13 19:28 align2genome.bam
3252184 Jul 13 19:28 align2genome.bam.bai
16 Jul 13 19:43 isoform_annotated.filtered.gff3
15122175 Jul 13 19:32 isoform_annotated.gff3
61 Jul 13 19:43 isoform_FSM_annotation.csv
3534807677 Jul 13 18:11 merged.fastq.gz
59 Jul 13 18:11 pseudo_barcode_annotation.csv
1505577353 Jul 13 19:41 realign2transcript.bam
3221800 Jul 13 19:41 realign2transcript.bam.bai
98666092 Jul 13 19:33 transcript_assembly.fa
2062401 Jul 13 19:33 transcript_assembly.fa.fai
118564 Jul 13 19:42 transcript_count.bad_coverage.csv.gz
186937 Jul 13 19:42 transcript_count.csv.gz
3617886 Jul 13 19:32 tss_tes.bedgraph
My input parameters and data were:
--gff3 gencode.v40.annotation.gtf (human annotations)
--genomefa GRCh38.primary_assembly.genome.fa (human reference genome)
--outdir FLAMES_output/
--fq_dir fastq/ (path to the directory containing my single FASTQ file)
I am not using any configuration file, so FLAMES is applying its default parameters, and I guess this is my main problem since it is designed for ONT. So my question is: what are the best parameters for running an analysis with PacBio files? What are your recommendations?
Below I paste a config file I used for ONT data, so you can tell me whether these are all the parameters I need to change for PacBio or whether there are extra parameters to consider.
{
    "pipeline_parameters":{
        "do_genome_alignment":true,
        "do_isoform_identification":true,
        "do_read_realignment":true,
        "do_transcript_quantification":true
    },
    "global_parameters":{
        "generate_raw_isoform":false,
        "has_UMI":false
    },
    "isoform_parameters":{
        "MAX_DIST":10,
        "MAX_TS_DIST":120,
        "MAX_SPLICE_MATCH_DIST":10,
        "min_fl_exon_len":40,
        "Max_site_per_splice":3,
        "Min_sup_cnt":10,
        "Min_cnt_pct":0.001,
        "Min_sup_pct":0.2,
        "strand_specific":0,
        "remove_incomp_reads":5
    },
    "alignment_parameters":{
        "use_junctions":true,
        "no_flank":false
    },
    "realign_parameters":{
        "use_annotation":true
    },
    "transcript_counting":{
        "min_tr_coverage":0.3,
        "min_read_coverage":0.3
    }
}
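In case it helps to rule out a malformed file, here is a minimal Python sketch of how I would check that this config parses as valid JSON and contains the sections listed above before passing it to FLAMES. The helper name `load_config` and the list of expected sections are my own assumptions, taken from the config I pasted and not from the FLAMES documentation. The sketch only checks structure and says nothing about which values suit PacBio.

```python
import json

# Section names copied from the config pasted above (an assumption:
# FLAMES itself may accept or require a different set).
EXPECTED_SECTIONS = [
    "pipeline_parameters",
    "global_parameters",
    "isoform_parameters",
    "alignment_parameters",
    "realign_parameters",
    "transcript_counting",
]

# The same config as above, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
    "pipeline_parameters": {
        "do_genome_alignment": true,
        "do_isoform_identification": true,
        "do_read_realignment": true,
        "do_transcript_quantification": true
    },
    "global_parameters": {"generate_raw_isoform": false, "has_UMI": false},
    "isoform_parameters": {
        "MAX_DIST": 10,
        "MAX_TS_DIST": 120,
        "MAX_SPLICE_MATCH_DIST": 10,
        "min_fl_exon_len": 40,
        "Max_site_per_splice": 3,
        "Min_sup_cnt": 10,
        "Min_cnt_pct": 0.001,
        "Min_sup_pct": 0.2,
        "strand_specific": 0,
        "remove_incomp_reads": 5
    },
    "alignment_parameters": {"use_junctions": true, "no_flank": false},
    "realign_parameters": {"use_annotation": true},
    "transcript_counting": {"min_tr_coverage": 0.3, "min_read_coverage": 0.3}
}
"""


def load_config(text):
    """Parse the config text and fail early if a section is missing.

    Hypothetical helper, not part of FLAMES.
    """
    config = json.loads(text)  # raises json.JSONDecodeError on bad JSON
    missing = [name for name in EXPECTED_SECTIONS if name not in config]
    if missing:
        raise ValueError(f"missing config sections: {missing}")
    return config


config = load_config(CONFIG_TEXT)

# List the isoform thresholds so they are easy to compare between runs.
for name, value in config["isoform_parameters"].items():
    print(f"{name} = {value}")
```

To check a real file, the embedded string could be replaced with the contents of the config file read from disk.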
Thank you very much in advance for your help, and my apologies for such a basic question! Best, AP