Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

missing BUSCOs in BRAKER #784

Closed zzbbf123 closed 3 months ago

zzbbf123 commented 3 months ago

Hi:

I ran BRAKER3 v3.0.7 with the option --busco_lineage embryophyta_odb10 to minimize the number of missing BUSCOs. The resulting log file "best_by_compleasm.log" is shown below:

BRAKER is missing 2.35 BUSCOs.
GeneMark is missing 0.62 BUSCOs.
Augustus is missing 0.19 BUSCOs.
All BUSCOs present in augustus.hints.gtf and genemark.gtf will be added to the braker.gtf gene set.
Attempted to merge additional BUSCOs onto braker.gtf but there are no BUSCOs to be added.
The BRAKER gene set /data/braker.gtf is the best one. It lacks 2.35% BUSCOs.
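
The comparison reported in this log can be made explicit with a short sketch. It assumes the line format seen in the excerpt above ("<name> is missing <number> BUSCOs."), which is inferred from this one log only; the function names parse_missing and rank_by_missing are hypothetical helpers, not part of BRAKER.

```python
import re

# Log excerpt copied from the post above.
LOG = """\
BRAKER is missing 2.35 BUSCOs.
GeneMark is missing 0.62 BUSCOs.
Augustus is missing 0.19 BUSCOs.
"""

# Assumed line format, inferred only from this excerpt:
# "<name> is missing <number> BUSCOs."
MISSING_RE = re.compile(
    r"^(\w+) is missing ([0-9]+(?:\.[0-9]+)?) BUSCOs\.", re.MULTILINE
)


def parse_missing(log_text):
    """Return {gene set name: missing BUSCO percentage} from the log text."""
    return {name: float(value) for name, value in MISSING_RE.findall(log_text)}


def rank_by_missing(missing):
    """Sort gene set names from fewest to most missing BUSCOs."""
    return sorted(missing, key=missing.get)


missing = parse_missing(LOG)
ranking = rank_by_missing(missing)
print(ranking)  # ['Augustus', 'GeneMark', 'BRAKER']
```

Ranked this way, braker.gtf has the most missing BUSCOs of the three gene sets, which is what makes the "best one" line in the log surprising.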

As I understand it, the log reports braker.gtf as the best gene set, with 2.35% of BUSCOs missing, even though GeneMark (0.62%) and Augustus (0.19%) miss fewer. The program attempted to reduce the missing BUSCOs in the final BRAKER output by re-running TSEBRA, but this attempt failed because TSEBRA did not identify complete or duplicated BUSCOs without frame shifts (as mentioned in issue #660). As a result, braker.gtf remains the best option, and its share of missing BUSCOs is still 2.35%.

  1. Why are GeneMark and Augustus able to predict more BUSCOs than BRAKER?
  2. The additional BUSCOs predicted by GeneMark and Augustus may not be reliable, but they still carry information. If I want to integrate them into the BRAKER output, is there a command in BRAKER that I can use?

Best

KatharinaHoff commented 3 months ago

This sounds like a familiar issue. Is your sif file older than 21 days? If so, please rebuild it from the latest container. I fixed a problem 21 days ago; it was not in BRAKER but in TSEBRA.
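
One way to check the age of a local image is sketched below. It is only an approximation: the file name braker3.sif is a hypothetical example, is_older_than is a hypothetical helper, and the modification time reflects when the file was built or copied locally, not when the container was published. The 21-day threshold is relative to the date of this comment, so adjust it to the time elapsed since then.

```python
import os
import time

SECONDS_PER_DAY = 86400


def is_older_than(path, days, now=None):
    """Return True if the file's modification time is more than `days` days ago.

    `now` is a Unix timestamp; it defaults to the current time and is
    exposed mainly so the check can be reproduced for a fixed date.
    """
    now = time.time() if now is None else now
    age_days = (now - os.path.getmtime(path)) / SECONDS_PER_DAY
    return age_days > days


# Hypothetical image path; replace it with the location of your own sif file.
SIF_PATH = "braker3.sif"

if os.path.exists(SIF_PATH):
    if is_older_than(SIF_PATH, days=21):
        print(f"{SIF_PATH} may predate the TSEBRA fix; consider rebuilding it.")
    else:
        print(f"{SIF_PATH} was modified within the last 21 days.")
```

If the image turns out to be older, rebuild it following the container instructions in the BRAKER README.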

zzbbf123 commented 3 months ago

Yes, you are right. After the update, the result looks very good, with incredibly high BUSCO completeness:

        --------------------------------------------------
        |Results from dataset embryophyta_odb10           |
        --------------------------------------------------
        |C:99.5%[S:76.9%,D:22.6%],F:0.3%,M:0.2%,n:1614    |
        |1606   Complete BUSCOs (C)                       |
        |1241   Complete and single-copy BUSCOs (S)       |
        |365    Complete and duplicated BUSCOs (D)        |
        |5      Fragmented BUSCOs (F)                     |
        |3      Missing BUSCOs (M)                        |
        |1614   Total BUSCO groups searched               |
        --------------------------------------------------
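
As a sanity check, the percentages in the summary line can be recomputed from the counts in the table. The sketch below uses only the numbers shown above; the helper names parse_summary and percentages are hypothetical, and the one-decimal rounding is an assumption that happens to reproduce the reported values.

```python
import re

# Summary line and counts copied from the table above.
SUMMARY = "C:99.5%[S:76.9%,D:22.6%],F:0.3%,M:0.2%,n:1614"
COUNTS = {"C": 1606, "S": 1241, "D": 365, "F": 5, "M": 3}


def parse_summary(line):
    """Extract the percentage fields and the total n from a summary line."""
    fields = {key: float(val) for key, val in re.findall(r"([CSDFM]):([0-9.]+)%", line)}
    fields["n"] = int(re.search(r"n:(\d+)", line).group(1))
    return fields


def percentages(counts, n):
    """Convert raw BUSCO counts to percentages rounded to one decimal place."""
    return {key: round(100 * count / n, 1) for key, count in counts.items()}


reported = parse_summary(SUMMARY)
n = reported.pop("n")
recomputed = percentages(COUNTS, n)

# Complete = single-copy + duplicated; complete + fragmented + missing = total.
assert COUNTS["S"] + COUNTS["D"] == COUNTS["C"]
assert COUNTS["C"] + COUNTS["F"] + COUNTS["M"] == n
assert recomputed == reported

print(recomputed)
```

The counts are internally consistent: 1241 + 365 = 1606 complete BUSCOs, and 1606 + 5 + 3 = 1614 groups searched.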

I don't have any further questions. Please close this issue, thanks.

Best