PavelKiryanov closed 9 months ago
Right, your file is poorly formatted: you have genes that share the same ID on different chromosomes, which is not good (e.g. 1-aminocyclopropane-1-carboxylate_synthase.gene1).
I will come back to you with a suggested solution.
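One way to confirm this kind of problem before fixing it is to list every ID that appears on more than one sequence. The sketch below is only an illustration: it assumes a tab-separated GFF3-style file whose ninth column carries `ID=...` attributes, and the function name `find_cross_seq_duplicate_ids` is hypothetical, not part of AGAT.

```shell
# Report feature IDs that occur on more than one sequence (column 1).
# Assumption: GFF3-style "ID=..." attributes in column 9, tab-separated columns.
find_cross_seq_duplicate_ids() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }              # skip comments and incomplete lines
        {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/) {
                    id = substr(attrs[i], 4)
                    key = id SUBSEP $1
                    if (!(key in seen)) {    # count each (ID, sequence) pair once
                        seen[key] = 1
                        nseq[id]++
                    }
                }
            }
        }
        END {
            # Print the ID and the number of distinct sequences it was seen on
            for (id in nseq) if (nseq[id] > 1) print id "\t" nseq[id]
        }
    ' "$1" | sort
}
```

Calling `find_cross_seq_duplicate_ids infile.gff` prints one line per offending ID; no output means no ID is shared between sequences.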
Split your file by sequence ID (e.g. awk '{ file=$1; if( $0 !~ /^#/)print $0 > file}' infile.gff).
Then process each file with agat_sp_manage_IDs.pl to provide new IDs (using e.g. the sequence ID as prefix).
Then you can merge all the files together with a simple cat command.
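The splitting step can also be wrapped in a small function that writes the per-sequence files into their own directory, so they do not get mixed up with other files. This is a sketch under assumptions: the function name `split_by_seqid`, the output directory argument and the `.gff` suffix are illustrative choices, and the input is assumed to be tab-separated.

```shell
# Split a GFF file into one file per sequence ID (column 1), written to outdir.
# Comment lines (starting with "#") and empty lines are skipped.
# Assumption: tab-separated columns; output files are named <seqid>.gff.
split_by_seqid() {
    infile=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -F'\t' -v dir="$outdir" '
        /^#/ || NF == 0 { next }
        { out = dir "/" $1 ".gff"; print > out }
    ' "$infile"
}
```

For example, `split_by_seqid infile.gff split` would create `split/Bpe_Chr1.gff`, `split/Bpe_Chr2.gff` and so on, if those are the sequence names in column 1.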
I've split the file as you said. Now I need to run agat_sp_manage_IDs.pl. I launched it like this: agat_sp_manage_IDs.pl --gff Bpe_Chr1 --prefix [ -o Bpe_Chr1out.gf ]. Is that right? I ended up with the same errors. Sorry, I'm new to bioinformatics...
Yes, run agat_sp_manage_IDs.pl --gff Bpe_Chr1 --prefix chr1 -o Bpe_Chr1out.gff for chr1,
agat_sp_manage_IDs.pl --gff Bpe_Chr2 --prefix chr2 -o Bpe_Chr2out.gff for chr2,
etc.
Then concatenate all the output files together and it should be fine.
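With 14 chromosomes, the per-file commands can be run in a loop rather than typed one by one. The sketch below uses only the agat_sp_manage_IDs.pl flags quoted in this thread (`--gff`, `--prefix`, `-o`). The function name `reid_and_merge`, the directory layout (one `<seqid>.gff` file per sequence) and the `AGAT_MANAGE_IDS` override variable are assumptions made for illustration.

```shell
# Re-ID every per-sequence file with its sequence name as prefix, then merge.
# Assumptions: splitdir holds one <seqid>.gff file per sequence;
# AGAT_MANAGE_IDS (hypothetical variable) can point at another executable.
reid_and_merge() {
    splitdir=$1
    reiddir=$2
    merged=$3
    cmd=${AGAT_MANAGE_IDS:-agat_sp_manage_IDs.pl}
    mkdir -p "$reiddir"
    : > "$merged"                            # start from an empty merged file
    for f in "$splitdir"/*.gff; do
        [ -e "$f" ] || continue              # nothing matched the pattern
        seq=$(basename "$f" .gff)
        "$cmd" --gff "$f" --prefix "$seq" -o "$reiddir/$seq.gff" || return 1
        cat "$reiddir/$seq.gff" >> "$merged"
    done
}
```

A call such as `reid_and_merge split reid merged.gff` would then leave the combined annotation in `merged.gff`. Writing the re-ID'd files to a separate directory keeps a second run from picking up its own output.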
To check, you can run agat_sq_stat_basic.pl on your original file and on your resulting file. You should get similar values.
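As a rough cross-check that does not depend on AGAT, the number of features per type (third column) can be counted in both files and printed side by side. This is a sketch, not a replacement for agat_sq_stat_basic.pl: it only counts features, it lowercases the type names, and the function names are hypothetical.

```shell
# Count features per type (column 3, lowercased) in a tab-separated GFF/GTF file.
count_feature_types() {
    awk -F'\t' '
        /^#/ || NF < 3 { next }
        { n[tolower($3)]++ }
        END { for (t in n) print t "\t" n[t] }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Print: type, count in the first file, count in the second file (0 if absent).
compare_feature_counts() {
    tab=$(printf '\t')
    cnt_a=$(mktemp)
    cnt_b=$(mktemp)
    count_feature_types "$1" > "$cnt_a"
    count_feature_types "$2" > "$cnt_b"
    LC_ALL=C join -t "$tab" -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$cnt_a" "$cnt_b"
    rm -f "$cnt_a" "$cnt_b"
}
```

Running `compare_feature_counts original.gff result.gff` gives one row per feature type. Types present in only one file show a 0 in the other column, which makes dropped or newly created feature types easy to spot.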
After all this, I saw this. Could these errors be due to the fact that I have 14 chromosomes and when moving from number 1 to number 11 they repeated? Is it normal for these files to appear?
Parent "nbis-gene-5223" ; coge_fid 1235352966 ; exon "Bpev01.c0717.g0019.exon1"
2562 cases fixed where L3 features have parent feature(s) missing
------------------------------ done in 0 seconds -------------------------------
--------------------------- Check5: l1 linked to l2 ----------------------------
2700 cases fixed where L2 features have parent features missing
------------------------------ done in 1 seconds -------------------------------
--------------------------- Check6: remove orphan l1 ---------------------------
We remove only those not supposed to be orphan
2661 cases removed where L1 features do not have children (while they are suposed to have children).
------------------------------ done in 0 seconds -------------------------------
------------------------- Check7: all level3 locations -------------------------
------------------------------ done in 7 seconds -------------------------------
------------------------------ Check8: check cds -------------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------
----------------------------- Check9: check exons ------------------------------
No exons created
No exons locations modified
No supernumerary exons removed
183 level2 locations modified
------------------------------ done in 4 seconds -------------------------------
----------------------------- Check10: check utrs ------------------------------
49 UTRs created that were missing
No UTRs locations modified
No supernumerary UTRs removed
------------------------------ done in 3 seconds -------------------------------
------------------------ Check11: all level2 locations -------------------------
No problem found
------------------------------ done in 4 seconds -------------------------------
------------------------ Check12: all level1 locations -------------------------
We fixed 387 wrong level1 location cases
------------------------------ done in 1 seconds -------------------------------
---------------------- Check13: remove identical isoforms ----------------------
Lets remove isoform nbis-mrna-1490
Lets remove isoform nbis-mrna-17
Lets remove isoform nbis-mrna-1860
Lets remove isoform nbis-mrna-2169
Lets remove isoform nbis-mrna-2517
Lets remove isoform nbis-mrna-1009
Lets remove isoform nbis-mrna-864
Lets remove isoform nbis-mrna-1048
Lets remove isoform nbis-mrna-2015
Lets remove isoform nbis-mrna-2075
Lets remove isoform nbis-mrna-1428
Lets remove isoform nbis-mrna-255
Lets remove isoform nbis-mrna-525
Lets remove isoform nbis-mrna-1588
Lets remove isoform nbis-mrna-960
Lets remove isoform nbis-mrna-1342
Lets remove isoform nbis-mrna-1547
Lets remove isoform nbis-mrna-1326
Lets remove isoform nbis-mrna-872
Lets remove isoform nbis-mrna-2553
Lets remove isoform nbis-mrna-2189
Lets remove isoform nbis-mrna-1864
Lets remove isoform nbis-mrna-1070
Lets remove isoform nbis-mrna-2419
Lets remove isoform nbis-mrna-884
Lets remove isoform nbis-mrna-1050
Lets remove isoform nbis-mrna-1046
Lets remove isoform nbis-mrna-1747
Lets remove isoform nbis-mrna-2390
Lets remove isoform nbis-mrna-165
Lets remove isoform nbis-mrna-1753
Lets remove isoform nbis-mrna-2492
Lets remove isoform nbis-mrna-30
33 identical isoforms removed
------------------------------ done in 0 seconds -------------------------------
------ End checks (done in 20 second) ----
my original file information
Type (3rd column)   Number   Size total (kb)   Size mean (bp)
/!\Results are rounding to two decimal places
cds                 129416   30405.40          234.94
exon                136591   37081.36          271.48
five_prime_utr      14165    2100.04           148.26
gene                24698    303892.67         12304.34
mrna                25768    152394.22         5914.09
three_prime_utr     13743    3908.60           284.41
Total               344381   529782.29         1538.36
and my resulting file
Type (3rd column)   Number   Size total (kb)   Size mean (bp)
/!\Results are rounding to two decimal places
cds                 129775   30472.42          234.81
exon                154059   40794.15          264.80
gene                24625    100509.67         4081.61
mrna                25775    125992.66         4888.17
Total               334234   297768.91         890.90
Apparently the counts have changed. Is this bad?
I guess you mixed up the original file and the resulting file, because there were no UTRs in your original file and AGAT added them. Anyway, I would say the result looks good now, except that you lost a few specific features, e.g. non_canonical_five_prime_splice_site, stop_codon_read_through, etc. I do not know if you really need them, but if you want to keep them you should follow these instructions: https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-the-feature-type-is-not-yet-taken-into-account
You may also write to CoGe to tell them it is not normal that unique identifiers are not unique and are used multiple times (the IDs are reset and re-used for each chromosome).
I solved my problem with this file. Thank you very much! And thank you for teaching me bioinformatics!
Hi, I need to convert a GFF file to GTF. The file comes from here: https://genomevolution.org/CoGe/GenomeInfo.pl?gid=35080 After running agat_convert_sp_gff2gtf.pl --gff myfile.gff -o outmyfile.gtf I get a lot of errors with the following entry:
Can you tell why this is happening? Perhaps I'm not creating the GFF file correctly in this menu?
Is it possible to somehow correct a series of errors? Thank you for your work! I really hope you can help!