Closed by sjfleck 2 years ago
Yes, just pulling out the mRNA lines works great. Just make sure the gene names match the names in the fasta file exactly. I point this out because the Parent field on the mRNA line doesn't end in ".1", while the Parent field on all the other lines does.
Thanks! They do match. It appears in my protein fasta like this:
>Calam.S003580.1
MASEELQGSNLQNQAQPPAPVPTTLPQYPEMILIAIEALNEKNDSNKSSISKHIEATYGN
LPPAHSTLLTHHLNRMKSIDQLYFIKNNYLKLDPNAPSRRGRGRPPKPKTSLPPGTVLLP
PCSRGRPPKSHNPIAPRPPLPTKPKATTTAATVSGKKHGRPSKAATPSVTSTPPPAAGGV
PRGRGRPPKVKPAVTASVGA*
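For anyone who wants to check the name matching across a whole annotation rather than by eye, here is a minimal Python sketch. It is not part of pSONIC; the function names `fasta_ids` and `missing_from_fasta` are made up for illustration, and it assumes the fasta ID is the first whitespace-delimited token after `>`.

```python
def fasta_ids(fasta_text):
    """Collect sequence IDs from FASTA headers.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare '>' header with no name
                ids.add(tokens[0])
    return ids


def missing_from_fasta(gene_ids, fasta_text):
    """Return the gene IDs (in input order) that have no FASTA header."""
    known = fasta_ids(fasta_text)
    return [gene_id for gene_id in gene_ids if gene_id not in known]


if __name__ == "__main__":
    fasta = ">Calam.S003580.1\nMASEELQGSNLQNQAQPPAPVPTTLPQYPEMILIAIEALNEKNDSNKSSISKHIEATYGN\n"
    # The ID with the ".1" suffix is found; the bare gene ID is reported as missing.
    print(missing_from_fasta(["Calam.S003580.1", "Calam.S003580"], fasta))
```

An empty list back means every ID has a matching fasta header.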
Hello, I'm interested in using pSONIC on my data. I'm currently editing my gff3 files to match the "Sp## GeneID Start_POS End_POS" format.
For example, in one of the gff3s, I have 7 lines with the same parent ID:
scaffold_115 MSU_v1 mRNA 11041 12267 . - . ID=Calam.S003580.1;Name=Calam.S003580.1;Parent=Calam.S003580
scaffold_115 MSU_v1 five_prime_UTR 12212 12267 . - . Parent=Calam.S003580.1
scaffold_115 MSU_v1 exon 12122 12267 . - . Parent=Calam.S003580.1
scaffold_115 MSU_v1 CDS 12122 12211 . - 0 Parent=Calam.S003580.1
scaffold_115 MSU_v1 exon 11041 11761 . - . Parent=Calam.S003580.1
scaffold_115 MSU_v1 CDS 11249 11761 . - 0 Parent=Calam.S003580.1
scaffold_115 MSU_v1 three_prime_UTR 11041 11248 . - . Parent=Calam.S003580.1
These features all fall under the same parent ID. I noticed that the sample gff files you provided don't have multiple lines with the same ID. Should I just extract the mRNA lines and edit them from there? I would compress those 7 lines down to this single line:
Ca115 Calam.S003580.1 11041 12267
Is this what you expected? Thanks!
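In case it helps anyone doing the same conversion, here is a rough Python sketch of the approach described in this thread: keep only the mRNA rows and write one "Sp## GeneID Start_POS End_POS" row per gene. It is not part of pSONIC. The function name `mrna_to_psonic`, the "Ca" prefix, taking the chromosome number from the trailing digits of the scaffold name, and the tab-separated output are all assumptions for illustration.

```python
import re


def mrna_to_psonic(gff3_text, prefix="Ca"):
    """Turn the mRNA rows of GFF3 text into 'Sp## GeneID Start End' rows.

    Assumptions: the gene name comes from the ID attribute of the mRNA row,
    and the chromosome label is `prefix` plus the trailing digits of the
    sequence name (e.g. scaffold_115 -> Ca115).
    """
    rows = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        # GFF3 is tab-delimited; fall back to whitespace for pasted text.
        cols = line.split("\t") if "\t" in line else line.split(None, 8)
        if len(cols) < 9 or cols[2] != "mRNA":
            continue
        seqid, start, end, attrs = cols[0], cols[3], cols[4], cols[8]
        attr = dict(f.split("=", 1) for f in attrs.strip().split(";") if "=" in f)
        if "ID" not in attr:
            continue  # no ID attribute to use as the gene name
        match = re.search(r"(\d+)$", seqid)
        chrom = prefix + (match.group(1) if match else seqid)
        rows.append("\t".join([chrom, attr["ID"], start, end]))
    return rows


if __name__ == "__main__":
    example = (
        "scaffold_115 MSU_v1 mRNA 11041 12267 . - . "
        "ID=Calam.S003580.1;Name=Calam.S003580.1;Parent=Calam.S003580\n"
        "scaffold_115 MSU_v1 exon 12122 12267 . - . Parent=Calam.S003580.1\n"
    )
    for row in mrna_to_psonic(example):
        print(row)
```

Using the ID attribute rather than Parent keeps the ".1" suffix, so the names line up with the protein fasta headers.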