Open aureliendejode opened 5 days ago
Why do you think the number of genes is too low? How many genes do you see in close relatives, and how were these annotated?
aureliendejode @.***> wrote on Wed, 16 Oct 2024 at 18:11:
Hello, I used BRAKER3 with default parameters to annotate 3 anemone genomes. The BUSCO scores of the annotation were lower than those of the genome itself, so I ran BRAKER3 again with the --busco_lineage option, which solved that issue. However, there is still a big difference in the number of proteins among the braker.aa, genemark.aa and augustus.hints.aa files. Is this something that needs to be fixed? (I started running OMArk on braker.aa and the results look fine to me.) If so, it seems to me this might come from TSEBRA; is there perhaps a way to run TSEBRA differently?
Here are the stats for the 2 braker runs:
Before using --busco_lineage
BUSCO version is: 5.6.1
The lineage dataset is: eukaryota_odb10 (Creation date: 2024-01-08, number of genomes: 70, number of BUSCOs: 255)
Summarized benchmarking in BUSCO notation for file /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Entacmea_quesricolor/annotation_BRAKER3/braker/braker.aa
BUSCO was run in mode: proteins
Results:
C:92.5%[S:83.1%,D:9.4%],F:1.6%,M:5.9%,n:255
236 Complete BUSCOs (C)
212 Complete and single-copy BUSCOs (S)
24 Complete and duplicated BUSCOs (D)
4 Fragmented BUSCOs (F)
15 Missing BUSCOs (M)
255 Total BUSCO groups searched
-rw-r--r-- 1 adejode bmtitus 18M 14 oct. 16:33 Augustus/augustus.hints.aa
-rw-r--r-- 1 adejode bmtitus 10M 14 oct. 16:35 braker.aa
-rw-r--r-- 1 adejode bmtitus 19M 15 oct. 14:19 GeneMark-ETP/genemark.aa
grep -c ">" braker.aa Augustus/augustus.hints.aa GeneMark-ETP/genemark.aa
braker.aa:19761
Augustus/augustus.hints.aa:38767
GeneMark-ETP/genemark.aa:36201
BUSCO version is: 5.6.1
The lineage dataset is: metazoa_odb10 (Creation date: 2024-01-08, number of genomes: 65, number of BUSCOs: 954)
Summarized benchmarking in BUSCO notation for file /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Entacmea_quesricolor/annotation_BRAKER3/braker/braker.aa
BUSCO was run in mode: proteins
Results:
C:91.7%[S:83.6%,D:8.1%],F:1.3%,M:7.0%,n:954
875 Complete BUSCOs (C)
798 Complete and single-copy BUSCOs (S)
77 Complete and duplicated BUSCOs (D)
12 Fragmented BUSCOs (F)
67 Missing BUSCOs (M)
954 Total BUSCO groups searched
After using --busco_lineage
BUSCO version is: 5.6.1
The lineage dataset is: eukaryota_odb10 (Creation date: 2024-01-08, number of genomes: 70, number of BUSCOs: 255)
Summarized benchmarking in BUSCO notation for file /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Entacmea_quesricolor/annotation_BRAKER3/braker_busco_lineage/braker/braker.aa
BUSCO was run in mode: proteins
Results:
C:97.3%[S:72.2%,D:25.1%],F:0.8%,M:1.9%,n:255
248 Complete BUSCOs (C)
184 Complete and single-copy BUSCOs (S)
64 Complete and duplicated BUSCOs (D)
2 Fragmented BUSCOs (F)
5 Missing BUSCOs (M)
255 Total BUSCO groups searched
Dependencies and versions:
hmmsearch: 3.1
busco: 5.6.1
BUSCO version is: 5.6.1
The lineage dataset is: metazoa_odb10 (Creation date: 2024-01-08, number of genomes: 65, number of BUSCOs: 954)
Summarized benchmarking in BUSCO notation for file /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Entacmea_quesricolor/annotation_BRAKER3/braker_busco_lineage/braker/braker.aa
BUSCO was run in mode: proteins
Results:
C:97.5%[S:69.5%,D:28.0%],F:0.6%,M:1.9%,n:954
930 Complete BUSCOs (C)
663 Complete and single-copy BUSCOs (S)
267 Complete and duplicated BUSCOs (D)
6 Fragmented BUSCOs (F)
18 Missing BUSCOs (M)
954 Total BUSCO groups searched
-rw-r--r-- 1 adejode bmtitus 18M 16 oct. 10:04 Augustus/augustus.hints.aa
-rw-r--r-- 1 adejode bmtitus 11M 16 oct. 10:07 braker.aa
-rw-r--r-- 1 adejode bmtitus 19M 16 oct. 10:52 GeneMark-ETP/genemark.aa
grep -c ">" braker.aa Augustus/augustus.hints.aa GeneMark-ETP/genemark.aa
braker.aa:20454
Augustus/augustus.hints.aa:38756
GeneMark-ETP/genemark.aa:36206
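For anyone who wants to reproduce the comparison without grep, here is a minimal Python sketch. The helper names `count_fasta_records` and `relative_size` are made up for illustration and are not part of BRAKER or TSEBRA. The counts plugged in at the bottom are the ones reported above for the run with --busco_lineage. Note that these are protein sequences, so a gene with several isoforms is counted several times.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines.

    Roughly equivalent to `grep -c ">" file`, but only lines that
    start with '>' are counted.
    """
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def relative_size(combined_count, source_count):
    """Size of the combined set relative to one source prediction set."""
    return combined_count / source_count


# Counts reported in this thread (run with --busco_lineage).
# To recount from disk, call count_fasta_records("braker.aa") etc.
reported = {
    "braker.aa": 20454,
    "Augustus/augustus.hints.aa": 38756,
    "GeneMark-ETP/genemark.aa": 36206,
}

for name in ("Augustus/augustus.hints.aa", "GeneMark-ETP/genemark.aa"):
    ratio = relative_size(reported["braker.aa"], reported[name])
    print(f"braker.aa has {ratio:.1%} as many proteins as {name}")
```

The ratio only compares set sizes. It does not say which individual predictions were kept in braker.aa.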
View this issue on GitHub: https://github.com/Gaius-Augustus/BRAKER/issues/876
There are not many genomes from close relatives, but as an example, Nematostella vectensis has ~19 000 protein-coding genes and ~38 000 genes, and its annotation was produced with the NCBI Eukaryotic Genome Annotation Pipeline.
I am actually not sure the number of genes is too low. I was just wondering whether the difference in the number of sequences (among braker.aa, genemark.aa and augustus.hints.aa) is something to be concerned about, especially since the BRAKER file is considerably smaller (GTF files) and contains far fewer proteins than the AUGUSTUS and GeneMark ones.
-rw-r--r-- 1 adejode bmtitus 89M 14 oct. 11:31 GeneMark-ETP/genemark.gtf
-rw-r--r-- 1 adejode bmtitus 67M 14 oct. 16:33 Augustus/augustus.hints.gtf
-rw-r--r-- 1 adejode bmtitus 46M 14 oct. 16:34 braker.gtf
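File size is only a rough proxy for what a GTF file contains, so it may help to count genes and transcripts directly. The sketch below assumes that feature lines carry `gene_id "..."` and `transcript_id "..."` attributes in the ninth column, as in standard GTF. Lines without these attributes are simply skipped. The function name `count_genes_and_transcripts` is hypothetical, and the three example lines are toy data, not taken from the files above.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_genes_and_transcripts(gtf_lines):
    """Return (number of distinct gene_ids, number of distinct transcript_ids)."""
    genes, transcripts = set(), set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = fields[8]
        gene = GENE_ID.search(attributes)
        transcript = TRANSCRIPT_ID.search(attributes)
        if gene:
            genes.add(gene.group(1))
        if transcript:
            transcripts.add(transcript.group(1))
    return len(genes), len(transcripts)


# Toy records: two genes, one of which has two transcripts.
example = [
    'chr1\tAUGUSTUS\tCDS\t100\t200\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";\n',
    'chr1\tAUGUSTUS\tCDS\t100\t260\t.\t+\t0\ttranscript_id "g1.t2"; gene_id "g1";\n',
    'chr1\tAUGUSTUS\tCDS\t900\t990\t.\t-\t0\ttranscript_id "g2.t1"; gene_id "g2";\n',
]
n_genes, n_transcripts = count_genes_and_transcripts(example)
print(f"{n_genes} genes, {n_transcripts} transcripts")
```

For a real file, pass an open file handle, for example `count_genes_and_transcripts(open("braker.gtf"))`. Comparing gene counts rather than protein counts separates "fewer genes" from "fewer isoforms per gene".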
The difference between Augustus, GeneMark-ETP and BRAKER alone is not a strong indication that anything is wrong. A lack of evidence, on the other hand, would lead to overly strict filtering, but we have no numbers to estimate this. I would not worry about it if nothing important is missing. You say the OMArk scores are good and one remote relative has similar numbers, so it may be OK.
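One way to put numbers on "nothing important is missing" is to compare the BUSCO summaries quoted earlier in the thread. The sketch below parses the one-line BUSCO notation and prints the change per category for the metazoa_odb10 results before and after --busco_lineage. The regular expression and the `parse_busco` helper are illustrative assumptions, not part of BUSCO or BRAKER.

```python
import re

BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Parse a one-line BUSCO summary into percentages plus the group count n."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    result = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result


# metazoa_odb10 summaries for braker.aa, copied from the original post.
before = parse_busco("C:91.7%[S:83.6%,D:8.1%],F:1.3%,M:7.0%,n:954")
after = parse_busco("C:97.5%[S:69.5%,D:28.0%],F:0.6%,M:1.9%,n:954")

for key in ("C", "S", "D", "F", "M"):
    change = after[key] - before[key]
    print(f"{key}: {before[key]:.1f}% -> {after[key]:.1f}% ({change:+.1f})")
```

With these inputs, missing BUSCOs drop from 7.0% to 1.9%, while the duplicated fraction rises from 8.1% to 28.0%.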
Great, thanks for your insights on this!