Hi Sandeep,
Thanks for your interest in using pSONIC!
pSONIC needs only one entry per gene in the input gff, so it is best to extract the mRNA (or the entire gene) entries from your gff file. Just make sure that the gene name in the final gff file matches the gene name in your .fasta file. Some gff files use different naming conventions for different types of annotations (e.g. mRNA entries may be "geneID.1", while CDS entries end in "geneID.cds.1"). The gff example you provided doesn't appear to have this problem, but I mention it in case some of the other gff files you are using do.
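The extraction described above can be sketched in Python. This is a hypothetical helper, not part of pSONIC. It assumes a tab-separated GFF3 like the one in the question, the four-column MCScanX layout (chromosome, gene, start, end) with a made-up species prefix such as `os`, and FASTA headers whose first word is the transcript ID. The `canonical_transcript` filter is also an assumption, included so that genes with several isoforms still contribute only one mRNA entry.

```python
from typing import Iterable, Iterator


def parse_attributes(field: str) -> dict:
    """Split a GFF3 column-9 string like 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def mrna_to_mcscanx(lines: Iterable[str], species_prefix: str) -> Iterator[tuple]:
    """Yield (chromosome, gene, start, end) for each kept mRNA row.

    species_prefix is a hypothetical code prepended to the chromosome
    name (e.g. 'os' + '6' -> 'os6').
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "mRNA":
            continue  # keep only mRNA rows: one entry per transcript
        attrs = parse_attributes(cols[8])
        # Assumption: if the file flags canonical transcripts, keep only
        # those so each gene appears once.
        if attrs.get("canonical_transcript", "1") != "1":
            continue
        # Prefer transcript_id; otherwise strip a prefix like 'transcript:'
        # from ID so the name can match the FASTA header.
        gene = attrs.get("transcript_id") or attrs.get("ID", "").split(":", 1)[-1]
        if gene:
            yield (f"{species_prefix}{cols[0]}", gene, int(cols[3]), int(cols[4]))


def missing_from_fasta(genes: Iterable[str], fasta_lines: Iterable[str]) -> list:
    """Return gene names that have no matching FASTA header (first word)."""
    headers = {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    }
    return sorted(g for g in genes if g not in headers)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python extract_mrna.py input.gff3 os > out.gff
    with open(sys.argv[1]) as handle:
        for row in mrna_to_mcscanx(handle, sys.argv[2]):
            print("\t".join(map(str, row)))
```

Running `missing_from_fasta` on the extracted names before starting MCScanX is a cheap way to catch the mRNA/CDS naming mismatch mentioned above.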
I will close this issue, but feel free to reopen it if you need additional assistance!
All the best, Justin
Hi,
I have a GFF file, generated by an annotation pipeline, that looks like this:
```
6 NAM mRNA 21136050 21137070 . - . ID=transcript:Osazucena_06g0019330.01;Parent=gene:Osazucena_06g0019330;biotype=protein_coding;transcript_id=Osazucena_06g0019330.01;canonical_transcript=1
6 NAM five_prime_UTR 21136993 21137070 . - . Parent=transcript:Osazucena_06g0019330.01
6 NAM exon 21136050 21136679 . - . Parent=transcript:Osazucena_06g0019330.01;Name=Osazucena_06g0019330.01.exon.1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=Osazucena_06g0019330.01.exon.1;rank=2
6 NAM exon 21136773 21137070 . - . Parent=transcript:Osazucena_06g0019330.01;Name=Osazucena_06g0019330.01.exon.2;ensembl_end_phase=1;ensembl_phase=1;exon_id=Osazucena_06g0019330.01.exon.2;rank=1
6 NAM CDS 21136252 21136679 . - 2 ID=CDS:Osazucena_06g0019330.01;Parent=transcript:Osazucena_06g0019330.01;protein_id=Osazucena_06g0019330.01
6 NAM CDS 21136773 21136992 . - 0 ID=CDS:Osazucena_06g0019330.01;Parent=transcript:Osazucena_06g0019330.01;protein_id=Osazucena_06g0019330.01
6 NAM three_prime_UTR 21136050 21136251 . - . Parent=transcript:Osazucena_06g0019330.01
```
Should I extract only the mRNA entries or only the CDS entries for the MCScanX step?
Please let me know,
Best, Sandeep