Closed Veldem closed 9 years ago
Without a link to the actual files you are working with, I cannot be sure what the issue is. However, I have a good guess. In most GFF3 files, intron features are not explicitly defined. Rather, the exon sequences are provided, and most scientists just understand that there is an intron between every pair of exons.
However, xtractore
does not know this. So if you want xtractore
to extract intron sequences, you have to provide a GFF3 file that has introns explicitly defined.
If your GFF3 file indeed has no intron features, I would recommend using the canon-gff3
program, also from AEGeAn, to create a new GFF3 file with these features. Please let me know if this solves your problem.
Hi Standage, As you indicated, after taking a glance at it, my ".gff3" file has no intron features (Actually, the ".gff3 files" and their corresponding ".fa" files were downloaded from Ensemble FTP browser). Unfortunately, by using "canon-gff" option, ı could not create the new gff including intron features.
Usage: canon-gff3 [options] gff3file1 [gff3file2 ...]
Options:
-h|--help print this help message and exit
-i|--infer for transcript features lacking an explicitly
declared gene feature as a parent, create this
feature on-they-fly
-o|--outfile: STRING name of file to which GFF3 data will be
written; default is terminal (stdout)
-s|--source: STRING reset the source of each feature to the given
value
-v|--version print version number and exit
Although I read the usage of it, still I could not get the new gff. If you give me an example usage how to use it, it will be great. Many thanks for all your help and quick response. -vahap-
I'm not sure what is unclear about the instructions provided by the usage statement. Did you actually run canon-gff3
on a GFF3 file? What was the output? Did it print a bunch of GFF3 data to the terminal, or did it print a warning/error message? I won't be able to help much without more details.
Hi Standage, Unfortunately, I could not get the ".gff3" file including introns using "canon-gff3" option. Here is the code:
home/Software/AEGeAn-0.14.1/bin/canon-gff3 Oryza_sativa.IRGSP-1.0.29.gff3 -o new_out
it gives an error list;
warning: found no valid mRNAs for gene 'gene1'
.
.
warning: found no valid mRNAs for gene 'gene35679'
and gives a non-sense output file. I thing my code or ".gff3" files is wrong! Yet, i can not solve the problem. Thank you again,
Perhaps your GFF3 file uses transcript
features instead of the more common canonical mRNA
features? That would cause canon-gff3
to fail, and would produce the types of warning messages you're describing.
Try the following commands.
# Count the number of `mRNA` features
grep -c $'\tmRNA\t' Oryza_sativa.IRGSP-1.0.29.gff3
# Count the number of `transcript` features
grep -c $'\ttranscript\t' Oryza_sativa.IRGSP-1.0.29.gff3
If the number of mRNA features is 0 and the number of transcript features is greater than 0, then you need to convert the transcript features into mRNA features. The following command would work.
sed $'s:\ttranscript\t:\tmRNA\t:g' Oryza_sativa.IRGSP-1.0.29.gff3 > Oryza_sativa.mrnas.gff3
Hi Stange, As you indicated, I changed transcript name with mRNA using your code, "canon-gff" option worked and successfully extracted intronic sequences. To verify the result, I also applied a python script to parse ".gff3" fies (script name: extract_intron_gff3_from_gff3.py, obtained from this link; https://github.com/irusri/Extract-intron-from-gff3, following this command; awk '/intron/{print}' out.gff3_introns.gff3 | sort -k 1,1 -k4,2n) Both of analysis gave approximately same results; 1) canon-gff result: 147,696 intronic sequence 2) extract_intron_gff3_from_gff3.py: 148,850 intronic sequence I think that I can use your tool, many thanks again for all your help ! -vahap-
Great!
Dear Standage,
Many thanks for introducing AEGeAn tool to us for genom editing. Now, I am using "xtractore" option for extracting exon, intron and CDS. Exon and CDS sequences were seamlessly extracted using these codes:
But, it does not work for intronic sequences. Many thanks for your help -vahap-