NAL-i5K / GFF3toolkit

Python programs for processing GFF3 files

Inquiry about "gff3_to_fasta -st user_defined -u mRNA CDS" #124

Status: Open. yanzhongsino opened this issue 2 years ago.

yanzhongsino commented 2 years ago

Thank you for your development of the great gff3 tools.

When I use "gff3_to_fasta -st user_defined -u mRNA CDS" to get a fasta file, it seems to output the child (here, CDS) sequence. How can I get the parent (here, mRNA) sequence in this situation?

Any reply would be welcome.

mpoelchau commented 2 years ago

Hi @yanzhongsino. How is the mRNA modeled? If there are exon features in addition to CDS, I'd recommend using gff3_to_fasta -st user_defined -u mRNA exon. But that really depends on how the data is modeled in your gff3 file. You can send along a snippet if that helps.
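Whether "-u mRNA exon" or "-u mRNA CDS" makes sense depends on which child feature types each mRNA actually has. Below is a minimal Python sketch for checking that, assuming a tab-delimited, nine-column GFF3 file; it is not part of GFF3toolkit, and the function name `child_types_by_parent` is hypothetical. The example lines are mRNA DR000001 from the reporter's snippet in this thread.

```python
from collections import Counter, defaultdict

def child_types_by_parent(gff3_lines):
    """Map each Parent ID to a Counter of its children's feature types (column 3)."""
    tally = defaultdict(Counter)
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line (e.g. embedded FASTA)
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # GFF3 allows several comma-separated parents per feature.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                tally[parent][cols[2]] += 1
    return tally

example = [
    "LG01\tmaker\tmRNA\t883182\t884411\t.\t-\t.\tID=DR000001;",
    "LG01\tmaker\tCDS\t884253\t884411\t.\t-\t0\tParent=DR000001;",
    "LG01\tmaker\tCDS\t883311\t883392\t.\t-\t0\tParent=DR000001;",
    "LG01\tmaker\tCDS\t883182\t883219\t.\t-\t2\tParent=DR000001;",
]
for parent, types in child_types_by_parent(example).items():
    # "exon" among the child types suggests "-u mRNA exon"; only "CDS" means CDS-only models.
    print(parent, dict(types))  # DR000001 {'CDS': 3}
```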

yanzhongsino commented 2 years ago

Thanks! There are only mRNA and CDS features in the gff3 file, which was downloaded from a published paper. Here is a snippet:

LG01	maker	mRNA	883182	884411	.	-	.	ID=DR000001;Source=MAKER1:DHR000398.1,MAKER2:DH000416.1;
LG01	maker	CDS	884253	884411	.	-	0	Parent=DR000001;
LG01	maker	CDS	883311	883392	.	-	0	Parent=DR000001;
LG01	maker	CDS	883182	883219	.	-	2	Parent=DR000001;
LG01	maker	mRNA	884947	886421	.	+	.	ID=DR000002;
LG01	maker	CDS	884947	885114	.	+	0	Parent=DR000002;
LG01	maker	CDS	885378	885476	.	+	0	Parent=DR000002;
LG01	maker	CDS	885572	885723	.	+	0	Parent=DR000002;
LG01	maker	CDS	886034	886421	.	+	1	Parent=DR000002;
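To see what a "parent" sequence would cover for these two models, one can compare each mRNA's genomic span with its CDS segments using only the coordinates. This is a minimal sketch, assuming the nine-column layout of the snippet; the helper `summarize` is hypothetical and does not call GFF3toolkit.

```python
from collections import defaultdict

# The snippet from this thread, one tab-separated GFF3 feature per line.
SNIPPET = """\
LG01\tmaker\tmRNA\t883182\t884411\t.\t-\t.\tID=DR000001;Source=MAKER1:DHR000398.1,MAKER2:DH000416.1;
LG01\tmaker\tCDS\t884253\t884411\t.\t-\t0\tParent=DR000001;
LG01\tmaker\tCDS\t883311\t883392\t.\t-\t0\tParent=DR000001;
LG01\tmaker\tCDS\t883182\t883219\t.\t-\t2\tParent=DR000001;
LG01\tmaker\tmRNA\t884947\t886421\t.\t+\t.\tID=DR000002;
LG01\tmaker\tCDS\t884947\t885114\t.\t+\t0\tParent=DR000002;
LG01\tmaker\tCDS\t885378\t885476\t.\t+\t0\tParent=DR000002;
LG01\tmaker\tCDS\t885572\t885723\t.\t+\t0\tParent=DR000002;
LG01\tmaker\tCDS\t886034\t886421\t.\t+\t1\tParent=DR000002;
"""

def summarize(gff3_text):
    """Return {mRNA ID: (span_bp, cds_bp, n_gaps, has_utr)} from coordinates alone."""
    spans, cds = {}, defaultdict(list)
    for line in gff3_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments/directives and non-feature lines
        start, end = int(cols[3]), int(cols[4])  # GFF3 coordinates: 1-based, inclusive
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if cols[2] == "mRNA":
            spans[attrs["ID"]] = (start, end)
        elif cols[2] == "CDS":
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    cds[parent].append((start, end))
    summary = {}
    for mrna_id, (m_start, m_end) in spans.items():
        parts = sorted(cds[mrna_id])
        if not parts:
            continue  # mRNA without CDS children
        cds_bp = sum(e - s + 1 for s, e in parts)
        # A UTR would appear as mRNA sequence outside the outermost CDS segments.
        has_utr = parts[0][0] > m_start or max(e for _, e in parts) < m_end
        # n_gaps counts gaps between CDS segments (introns, if segments are not adjacent).
        summary[mrna_id] = (m_end - m_start + 1, cds_bp, len(parts) - 1, has_utr)
    return summary

print(summarize(SNIPPET))
# {'DR000001': (1230, 279, 2, False), 'DR000002': (1475, 807, 3, False)}
```

For this snippet, each mRNA starts and ends exactly at its outermost CDS (no UTRs are annotated), but there are gaps between CDS segments, so the genomic span of each mRNA is longer than its merged CDS.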

I found a way to get the parent fasta. First, I changed the type of every mRNA feature to gene in the gff3 file. Then I used "gff3_to_fasta -st gene" to get the "gene" fasta (i.e., the mRNA in the original gff3 file). If there is a smoother way, please tell me.
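The renaming step can be scripted rather than edited by hand. This is a minimal sketch; the function name `mrna_to_gene` and the file names in the usage comment are hypothetical. It only rewrites column 3 of mRNA lines and leaves everything else, including the Parent links of the CDS lines, untouched.

```python
import sys

def mrna_to_gene(lines):
    """Yield GFF3 lines with column 3 changed from 'mRNA' to 'gene'.

    Comment/directive lines and all other feature types pass through unchanged.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if not line.startswith("#") and len(cols) == 9 and cols[2] == "mRNA":
            cols[2] = "gene"
            yield "\t".join(cols) + "\n"
        else:
            yield line

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python mrna_to_gene.py in.gff3 > renamed.gff3
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(mrna_to_gene(handle))
```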

mpoelchau commented 2 years ago

@yanzhongsino, when you say that you only got the CDS fasta at first, do you mean that you only got the sequences for the individual CDS segments (e.g., for DR000001 in the example above, three separate fasta sequences, one for each CDS line)? Or did you get one fasta sequence for the entire CDS?
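To make the distinction in this question concrete: "one sequence per CDS line" and "one sequence for the entire CDS" differ in whether the segments, once ordered 5' to 3' and reverse-complemented on the minus strand, are joined. This is a minimal sketch on a made-up toy genome string, not GFF3toolkit's implementation; the names `revcomp` and `cds_pieces` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def cds_pieces(genome, segments, strand):
    """Return (one sequence per CDS line, merged CDS) for a single transcript.

    genome: the chromosome as a str, where GFF3 position 1 is genome[0].
    segments: (start, end) pairs, 1-based and inclusive as in GFF3.
    strand: "+" or "-".
    """
    # Transcript order is 5'->3': ascending coordinates on "+", descending on "-".
    ordered = sorted(segments, reverse=(strand == "-"))
    pieces = []
    for start, end in ordered:
        piece = genome[start - 1:end]
        pieces.append(revcomp(piece) if strand == "-" else piece)
    return pieces, "".join(pieces)

# Toy genome and coordinates, made up for illustration.
genome = "ATGAAACCCGGGTTTTAG"
pieces, merged = cds_pieces(genome, [(1, 3), (10, 12), (16, 18)], "+")
print(pieces)  # ['ATG', 'GGG', 'TAG']
print(merged)  # ATGGGGTAG
```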

yanzhongsino commented 2 years ago

I got one fasta sequence for the entire CDS in the first case.

mpoelchau commented 2 years ago

Thanks for the response @yanzhongsino (and sorry about the delay). I'm curious: are the nucleotide sequences that you got from the different commands the same (mRNA vs. CDS)?
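One reason the two outputs could differ: if the "gene" extraction returns the full genomic span of each renamed mRNA (an assumption about the tool's behavior, not confirmed in this thread), that sequence includes the gaps between CDS segments, while the merged CDS does not. This is a minimal sketch on a made-up toy genome; the names `oriented` and `span_vs_cds` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def oriented(genome, start, end, strand):
    """Slice GFF3 [start, end] (1-based, inclusive) and orient it 5'->3' on strand."""
    piece = genome[start - 1:end]
    return piece.translate(COMPLEMENT)[::-1] if strand == "-" else piece

def span_vs_cds(genome, mrna, cds_segments, strand):
    """Return (full genomic span of the mRNA, spliced CDS), both oriented 5'->3'."""
    span = oriented(genome, mrna[0], mrna[1], strand)
    # Join CDS segments in transcript order: ascending on "+", descending on "-".
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    cds = "".join(oriented(genome, s, e, strand) for s, e in ordered)
    return span, cds

# Toy genome and coordinates, made up for illustration.
genome = "ATGAAACCCGGGTTTTAG"
span, cds = span_vs_cds(genome, (1, 18), [(1, 3), (10, 12), (16, 18)], "+")
print(len(span), len(cds), span == cds)  # 18 9 False
```

Under this assumption, the two strings match only when the CDS segments cover the whole mRNA span without gaps.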