I have one question about the configuration parameters and a problem regarding some missing genes/transcripts in the final FLAMES output and would really appreciate some help.
i) First, I was wondering if there is any further explanation for the different isoform parameters that can be adapted in the config file? I have an idea about some of the parameters (MAX_DIS, MAX_TS_DIST, Min_sup_cnt, strand_specific) but I would really appreciate a bit more detail about how the others impact the isoform identification step.
ii) Moreover, I noticed that some of the chromosomes/regions I was providing in the gene annotation reference were not part of the final FLAMES output. I'm using a slightly adapted gtf and fasta file that doesn't only contain human genes but also some pathogens. However, even though reads map against those genes, not a single transcript isoform for those genes is written into the isoform_annotated.gff3 and transcript_assembly.fa. Also, no mitochondrial transcripts are detected.
I checked the number of reads mapping to those regions in the align2genome.bam with samtools idxstats align2genome.bam and at least for the mitochondrial genes, a lot of reads are mapping.
However, only those seqnames are included in the isoform_annotated.gff3: ['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '20', '21', '22', '3', '4', '5', '6', '7', '8', '9', 'GL000191.1', 'GL000192.1', 'GL000194.1', 'GL000195.1', 'GL000218.1', 'GL000219.1', 'GL000223.1', 'X', 'Y']
Are they filtered out due to the parameters specified in the configuration or is something else happening here? It would be great to have information about those genes and transcripts as well.
Hi,
first, thanks a lot for developing FLAMES!
I have one question about the configuration parameters and a problem regarding some missing genes/transcripts in the final FLAMES output and would really appreciate some help.
i) First, I was wondering if there is any further explanation for the different isoform parameters that can be adapted in the config file? I have an idea about some of the parameters (MAX_DIS, MAX_TS_DIST, Min_sup_cnt, strand_specific) but I would really appreciate a bit more detail about how the others impact the isoform identification step.
ii) Moreover, I noticed that some of the chromosomes/regions I was providing in the gene annotation reference were not part of the final FLAMES output. I'm using a slightly adapted gtf and fasta file that doesn't only contain human genes but also some pathogens. However, even though reads map against those genes, not a single transcript isoform for those genes is written into the isoform_annotated.gff3 and transcript_assembly.fa. Also, no mitochondrial transcripts are detected. I checked the number of reads mapping to those regions in the align2genome.bam with samtools idxstats align2genome.bam and at least for the mitochondrial genes, a lot of reads are mapping.
However, only those seqnames are included in the isoform_annotated.gff3:
['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '20', '21', '22', '3', '4', '5', '6', '7', '8', '9', 'GL000191.1', 'GL000192.1', 'GL000194.1', 'GL000195.1', 'GL000218.1', 'GL000219.1', 'GL000223.1', 'X', 'Y']
Are they filtered out due to the parameters specified in the configuration or is something else happening here? It would be great to have information about those genes and transcripts as well.
Thanks a lot!
Best, Kristin