Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

GeneMark version may be incompatible, resulting in train.f.gb format error #437

Open inspirewind opened 3 years ago

inspirewind commented 3 years ago

Hi dear developers, BRAKER is a powerful pipeline, but it often runs into compatibility issues. When I run BRAKER with the options --genome merge_fix.fna --esmode --min_contig=10000 --cores 8 --AUGUSTUS_ab_initio --softmasking, it fails with an error reported in gbFilterEtraining.stderr. I searched the existing issues and found that this may be caused by an error in the train.f.gb file. The BRAKER README currently requires GeneMark-ES/ET/EP 4.64_lic, but the version available for download is 4.68. Because WSL only provides Linux kernel 4, I can only use the latest compiled GeneMark. How can I solve this problem?

braker.log gbFilterEtraining.stderr.gz

Best, Rui Li

tomasbruna commented 3 years ago

Hi Rui Li,

GeneMark 4.68 should be compatible; the problem is probably something else. Did you try running test8.sh? It tests BRAKER with the --esmode flag.

@KatharinaHoff, have you ever encountered this kind of error (in gbFilterEtraining.stderr.gz) during AUGUSTUS training?

Tomas

inspirewind commented 3 years ago

Hi Tomas, Thanks for your help.

I ran test8 with GeneMark 4.68 and everything seemed to work. I saved the train.f.gb file generated by test8; it is a very neat GenBank-format file (screenshot: https://user-images.githubusercontent.com/37958868/138543547-a34f289a-17bb-4746-8e3a-640a68456a4e.png).

But the train.f.gb file generated from my own data looks like this (screenshot: https://user-images.githubusercontent.com/37958868/138543686-36121721-377c-426a-a7ac-628111bdc072.png). Its format looks very messy, with a lot of strange spaces. This may be what caused the gbFilterEtraining error when running AUGUSTUS etraining.

Rui Li

KatharinaHoff commented 3 years ago

That looks super weird. I have never seen this happen before. There appear to be rather random line breaks in the sequence. Are there line breaks in the original genome sequence at these positions? Was the input sequence possibly stored in an encoding other than UTF-8?
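The two questions above (unexpected line breaks, unexpected encoding) can be checked directly on the raw bytes of the genome file. The following is a minimal sketch, not part of BRAKER: the function name `summarize_lines` and the command-line usage are hypothetical, and it assumes the FASTA file can be read line by line in binary mode.

```python
import sys


def summarize_lines(lines):
    """Count line-ending styles and non-ASCII lines in an iterable of byte lines."""
    counts = {"crlf": 0, "lf": 0, "stray_cr": 0, "non_ascii_lines": 0}
    for line in lines:
        if line.endswith(b"\r\n"):
            # Windows-style line ending
            counts["crlf"] += 1
            body = line[:-2]
        elif line.endswith(b"\n"):
            # Unix-style line ending
            counts["lf"] += 1
            body = line[:-1]
        else:
            # Last line without a trailing newline
            body = line
        # Carriage returns that are not part of a CRLF pair
        counts["stray_cr"] += body.count(b"\r")
        # Bytes above 0x7F hint at a non-ASCII encoding or corruption
        if not line.isascii():
            counts["non_ascii_lines"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Binary mode keeps "\r" visible; text mode would silently translate it.
    with open(sys.argv[1], "rb") as handle:
        print(summarize_lines(handle))
```

Run on the input genome (for example `merge_fix.fna` from the command above), any nonzero `crlf`, `stray_cr`, or `non_ascii_lines` count would point to the file itself rather than to GeneMark.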

inspirewind commented 3 years ago

Hi Katharina, My fault. The random line breaks appeared because I used Windows in one step of processing the file, so Windows added annoying CRLF line breaks. After fixing the line endings, everything works well. Perhaps the BRAKER pipeline could detect CRLF in the .fna file and warn users who are as confused as I was as early as possible.

Rui Li
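Until such a check exists in the pipeline, the conversion can be done before BRAKER is started. The sketch below is one possible way to do it and is not BRAKER code: the names `to_unix_newlines` and `convert_file` are hypothetical, and it only rewrites CRLF line endings, leaving all other bytes untouched.

```python
import sys


def to_unix_newlines(lines):
    """Yield each byte line with a trailing CRLF replaced by LF."""
    for line in lines:
        if line.endswith(b"\r\n"):
            yield line[:-2] + b"\n"
        else:
            yield line


def convert_file(src_path, dst_path):
    """Write a copy of src_path with Unix line endings; return the number of lines changed."""
    changed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for original in src:
            # Reuse the generator on a single line so both paths share one rule
            fixed = next(to_unix_newlines([original]))
            if fixed != original:
                changed += 1
            dst.write(fixed)
    return changed


if __name__ == "__main__" and len(sys.argv) > 2:
    n = convert_file(sys.argv[1], sys.argv[2])
    print(f"converted {n} CRLF line endings")
```

Writing to a separate output file keeps the original genome intact, so the two files can be compared if anything else looks wrong afterwards.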

KatharinaHoff commented 3 years ago

We will keep this issue open. It's not high on my list of priorities to fix this, though (since it can be easily avoided). But I agree that if we find the time, we should make the script safe.
