Open mictadlo opened 4 years ago
Hi Michal,
you can go into the file coyote_tobacco_parameters.cfg and change
print_utr off
to
print_utr on
Best,
Katharina
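A minimal sketch of that edit as a shell function, for anyone who prefers not to change the file by hand. The function name `enable_print_utr` is made up for illustration, and it assumes the parameter file holds one whitespace-separated `key value` pair per line, optionally followed by a `#` comment:

```shell
#!/usr/bin/env bash
# enable_print_utr FILE  (hypothetical helper, not part of AUGUSTUS)
# Switches "print_utr off" to "print_utr on" in a species parameter file
# and keeps the untouched original as FILE.bak.
enable_print_utr() {
    local cfg="$1"
    cp -- "$cfg" "$cfg.bak" || return 1
    # Only the line whose first field is print_utr and whose second field
    # is off gets rewritten; every other line is copied through unchanged.
    awk '$1 == "print_utr" && $2 == "off" { sub(/off/, "on") } { print }' \
        "$cfg.bak" > "$cfg"
}
```

Assuming `AUGUSTUS_CONFIG_PATH` points at the config directory shown in the log, the call would be `enable_print_utr "$AUGUSTUS_CONFIG_PATH/species/coyote_tobacco/coyote_tobacco_parameters.cfg"`.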
On Fri, Aug 21, 2020 at 8:10 AM Michał T. Lorenc notifications@github.com wrote:
Hi, I ran HiSat2 and MarkDuplicate, removed reads with a quality score lower than 40, and finally kept only properly paired reads. Scallop/StringTie and TransDecoder could predict UTRs, but unfortunately Augustus has not found any:
gff-version 3
This output was generated with AUGUSTUS (version 3.3.3).
AUGUSTUS is a gene prediction tool written by M. Stanke (mario.stanke@uni-greifswald.de),
O. Keller, S. König, L. Gerischer, L. Romoth and Katharina Hoff.
Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
Using native and syntenically mapped cDNA alignments to improve de novo gene finding
Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
Sources of extrinsic information: M RM E W
reading in the file split-data//chr15_pilon3.bam.hints.gff ...
Have extrinsic information about 1 sequences (in the specified range).
Initializing the parameters using config directory /work/waterhouse_team/miniconda2/envs/augustus/config/ ...
coyote_tobacco version. Using species specific transition matrix: /work/waterhouse_team/miniconda2/envs/augustus/config/species/coyote_tobacco/coyote_tobacco_trans_shadow_partial_utr.pbl
Looks like split-assembly//chr15_pilon3.fa is in fasta format.
We have hints for 1 sequence and for 1 of the sequences in the input set.
#
----- prediction on sequence number 1 (length = 131193082, name = chr15_pilon3) -----
#
Predicted genes for sequence number 1 on both strands
start gene chr15_pilon3.g1
chr15_pilon3 AUGUSTUS gene 1 1435 0.03 - . ID=chr15_pilon3.g1
chr15_pilon3 AUGUSTUS transcript 1 1435 0.03 - . ID=chr15_pilon3.g1.t1;Parent=chr15_pilon3.g1
chr15_pilon3 AUGUSTUS intron 1 205 0.16 - . Parent=chr15_pilon3.g1.t1
chr15_pilon3 AUGUSTUS intron 475 555 0.52 - . Parent=chr15_pilon3.g1.t1
chr15_pilon3 AUGUSTUS CDS 206 474 0.09 - 1 ID=chr15_pilon3.g1.t1.cds;Parent=chr15_pilon3.g1.t1
chr15_pilon3 AUGUSTUS exon 206 474 . - . Parent=chr15_pilon3.g1.t1
chr15_pilon3 AUGUSTUS CDS 556 1271 0.94 - 0 ID=chr15_pilon3.g1.t1.cds;Parent=chr15_pilon3.g1.t1
chr15_pilon3 AUGUSTUS exon 556 1435 . - . Parent=chr15_pilon3.g1.t1
chr15_pilon3 AUGUSTUS start_codon 1269 1271 . - 0 Parent=chr15_pilon3.g1.t1
chr15_pilon3 AUGUSTUS transcription_start_site 1435 1435 . - . Parent=chr15_pilon3.g1.t1
protein sequence = [MEDEMKSLHENGTYELVNLPKGKRALSNKWIFRIKQDNHTSTPRYKARLVVKGFGQNKGVDFDENFSPVMKMSSIRVV
LGLAVSLDLEVEQMDVKTTFLHTDLVEEIYMEQSEGFVTKRKENYVCKLKKSLYGLKQAPRQWYLKFESVIEEQGYKKTSSDHCVFFQKFSDDDFIIL
LLYVDDMLIVGKNKSRIAILKKQLSKSFAMKDLGPEKKILGIQSTDIEIERSYFYPKSNTLRRAHLLIKKKKEMSRIPYSSVVGSLMYAMVCTRPDIA
HVVGIVSRFLSNPGKKNWDAVKWILRYFKGTADLKLCFGNGKPELVCYTDSDLE]
Evidence for and against this transcript:
% of transcript supported by hints (any source): 0
CDS exons: 0/2
CDS introns: 0/2
5'UTR exons and introns: 0/1
3'UTR exons and introns: 0/0
hint groups fully obeyed: 0
incompatible hint groups: 0
end gene chr15_pilon3.g1
I used the following commands for Augustus
bam2augustus
bam2hints --intronsonly --in=${r1} --out=${r1}.intron-hints.gff
bam2wig $r1 > ${r1}.wig
cat ${r1}.wig | wig2hints.pl --width=10 --margin=10 --minthresh=2 --minscore=4 --prune=0.1 --src=W --type=ep --radius=4.5 --pri=4 --strand="." > ${r1}.ep-hints.gff
cat ${r1}.intron-hints.gff ${r1}.ep-hints.gff > ${r1}.hints.gff
Augustus
cat extrinsic.cfg
[SOURCES]
M RM E W
exonpart 1 .992 M 1 1e+100 RM 1 1 E 1 1 W 1 1.005
intron 1 .34 M 1 1e+100 RM 1 1 E 1 1e5 W 1 1
CDSpart 1 1 0.985 M 1 1e+100 RM 1 1 E 1 1 W 1 1
UTRpart 1 1 0.985 M 1 1e+100 RM 1 1 E 1 1 W 1 1
nonexonpart 1 1 M 1 1e+100 RM 1 1.01 E 1 1 W 1 1
augustus
augustus ${asm} --noInFrameStop=false --genemodel=complete --species=coyote_tobacco --gff3=on --UTR=on --extrinsicCfgFile=$3 --alternatives-from-sampling=false --alternatives-from-evidence=true --uniqueGeneId=true --hintsfile=${hints} --allow_hinted_splicesites=atac > ${2}/${output}.gff3
What did I miss?
Thank you in advance,
Michal
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Gaius-Augustus/Augustus/issues/184, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJMC6JBTYCVHOCU7V43KALLSBYFT3ANCNFSM4QG55UPA .
Hi Katharina, is there a reason why it is turned off by default?
Thank you in advance,
Michal
I have no idea... apparently the person who made the parameter set did not want to see UTRs ;-)
UTRs often cause problems during genbank submission... maybe that's why.
Hi Katharina,
Thank you, it worked. However, I noticed that I got two five_prime_utr and two three_prime_utr features in one gene. Is this normal? If not, how can it be fixed, or is the gene not real and should it be removed?
chr04_pilon_pilon_pilon AUGUSTUS gene 23186 37320 0.01 + . ID=chr04_pilon_pilon_pilon.g3
chr04_pilon_pilon_pilon AUGUSTUS transcript 23186 37320 0.01 + . ID=chr04_pilon_pilon_pilon.g3.t1;Parent=chr04_pilon_pilon_pilon.g3
chr04_pilon_pilon_pilon AUGUSTUS transcription_start_site 23186 23186 . + . Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS five_prime_utr 23186 23288 0.32 + . Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS five_prime_utr 26417 26509 0.43 + . Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS start_codon 26510 26512 . + 0 Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS intron 26883 27834 0.09 + . Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS CDS 26510 26882 0.73 + 0 ID=chr04_pilon_pilon_pilon.g3.t1.cds;Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS CDS 27835 27974 0.22 + 2 ID=chr04_pilon_pilon_pilon.g3.t1.cds;Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS stop_codon 27972 27974 . + 0 Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS three_prime_utr 27975 28133 0.12 + . Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS three_prime_utr 36567 37320 0.53 + . Parent=chr04_pilon_pilon_pilon.g3.t1
chr04_pilon_pilon_pilon AUGUSTUS transcription_end_site 37320 37320 . + . Parent=chr04_pilon_pilon_pilon.g3.t1
Furthermore, I discovered a gene that has a three_prime_utr but is missing a five_prime_utr. Could this be because of --genemodel=partial?
chr04_pilon_pilon_pilon AUGUSTUS gene 2 490 0.21 + . ID=chr04_pilon_pilon_pilon.g1
chr04_pilon_pilon_pilon AUGUSTUS transcript 2 490 0.21 + . ID=chr04_pilon_pilon_pilon.g1.t1;Parent=chr04_pilon_pilon_pilon.g1
chr04_pilon_pilon_pilon AUGUSTUS intron 181 345 0.89 + . Parent=chr04_pilon_pilon_pilon.g1.t1
chr04_pilon_pilon_pilon AUGUSTUS CDS 2 180 0.84 + 1 ID=chr04_pilon_pilon_pilon.g1.t1.cds;Parent=chr04_pilon_pilon_pilon.g1.t1
chr04_pilon_pilon_pilon AUGUSTUS CDS 346 464 0.89 + 2 ID=chr04_pilon_pilon_pilon.g1.t1.cds;Parent=chr04_pilon_pilon_pilon.g1.t1
chr04_pilon_pilon_pilon AUGUSTUS stop_codon 462 464 . + 0 Parent=chr04_pilon_pilon_pilon.g1.t1
chr04_pilon_pilon_pilon AUGUSTUS three_prime_utr 465 490 0.27 + . Parent=chr04_pilon_pilon_pilon.g1.t1
chr04_pilon_pilon_pilon AUGUSTUS transcription_end_site 490 490 . + . Parent=chr04_pilon_pilon_pilon.g1.t1
Thank you in advance,
Michal
In eukaryotes, genes can generally be spliced. This may not only affect the coding parts of exons, but also the untranslated parts. Your output shows spliced UTR features. They are not duplicates. Best, Katharina
Genes need not be predicted with both 3'- and 5'-UTR features. Your particular example shows that there is not much space to the left of the CDS. But you may find similar examples in the middle of larger scaffolds.
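To find such cases across a whole prediction file, something like the sketch below could help. `utr_report` is a made-up helper name, and it relies on the same assumptions about the GFF3 layout as the lines posted in this thread (whitespace-separated columns, no spaces in the attribute column).

```shell
#!/usr/bin/env bash
# utr_report GFF  (hypothetical helper, not part of AUGUSTUS)
# Lists every transcript with a CDS that lacks a 5' UTR, a 3' UTR, or both,
# together with its strand and the outermost CDS coordinates.
utr_report() {
    awk '
        !/^#/ && match($9, /Parent=[^;]+/) {
            tx = substr($9, RSTART + 7, RLENGTH - 7)
            if ($3 == "five_prime_utr")  five[tx] = 1
            if ($3 == "three_prime_utr") three[tx] = 1
            if ($3 == "CDS") {
                strand[tx] = $7
                # "+ 0" forces a numeric comparison of the coordinates.
                if (!(tx in lo) || ($4 + 0) < lo[tx]) lo[tx] = $4 + 0
                if (!(tx in hi) || ($5 + 0) > hi[tx]) hi[tx] = $5 + 0
            }
        }
        END {
            for (tx in lo)
                if (!(tx in five) || !(tx in three))
                    print tx "\t" strand[tx] \
                        "\t5utr=" ((tx in five) ? "yes" : "no") \
                        "\t3utr=" ((tx in three) ? "yes" : "no") \
                        "\tcds=" lo[tx] "-" hi[tx]
        }
    ' "$1" | sort
}
```

For chr04_pilon_pilon_pilon.g1.t1 this would report `5utr=no`, `3utr=yes` and `cds=2-464`: on the plus strand the CDS begins at position 2 of the sequence, so there is essentially no room upstream for a 5' UTR.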