Gaius-Augustus / GALBA

GALBA is a pipeline for fully automated prediction of protein coding gene structures with AUGUSTUS in novel eukaryotic genomes for the scenario where high quality proteins from one or several closely related species are available.

Run miniprot only once #25

Closed tomasbruna closed 1 year ago

tomasbruna commented 1 year ago

Tested on test1.sh.

Also tested on large protein inputs: the "miniprothint gtf" always matched the one miniprot would produce natively.

tomasbruna commented 1 year ago

Ahh, the previous implementation would not work if multiple separate protein files were given as input. That's fixed.

The current implementation might still run into issues if the uptodate( [ $prot_seq_files[$i] ], [$prot_hintsfile] ) check is triggered. I'll look into this more.

tomasbruna commented 1 year ago

Also, I realized IDs might clash if miniprot is run separately on multiple files (that's related to the original implementation as well). https://github.com/tomasbruna/miniprothint/commit/a38f3002f60bb47953988630bbeb26f04134ae9c should take care of that.
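To illustrate the clash: if two separate runs each number their records from 1, identical IDs appear once the outputs are merged. Below is a minimal Python sketch, assuming IDs of the form `MP000001` and a made-up `f{index}_` prefix scheme. It is not taken from the linked miniprothint commit, whose actual fix may look different; it only shows one way such a clash can be avoided.

```python
def namespace_ids(runs):
    """Merge per-file ID lists, prefixing each ID with its run index.

    `runs` is a list of lists of record IDs, one inner list per protein file.
    The `f{index}_` prefix is a hypothetical scheme chosen for illustration.
    """
    merged = []
    for index, ids in enumerate(runs, start=1):
        merged.extend(f"f{index}_{record_id}" for record_id in ids)
    return merged


# Two separate runs that both start numbering at 1 (assumed ID format).
run_a = ["MP000001", "MP000002"]
run_b = ["MP000001"]

naive = run_a + run_b
unique = namespace_ids([run_a, run_b])

print(len(naive) != len(set(naive)))    # True: the naive merge has a duplicate ID
print(len(unique) == len(set(unique)))  # True: the prefixed IDs are distinct
```

Running the aligner once on a single combined input avoids the problem at the source, since all records then share one numbering.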

KatharinaHoff commented 1 year ago

The uptodate stuff does not work in all places, anyway. I should go through the code and remove it... In the early days of BRAKER, when we had a lot of crashes, we had the idea that users would continue a failed run after fixing BRAKER... but as I said: it's buggy either way.
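For context, a check of this kind typically compares file modification times: a step is skipped only if all of its outputs exist and none is older than any input. Below is a minimal Python sketch under that assumption. `is_up_to_date` and the file names in the usage note are hypothetical, and the actual Perl `uptodate` helper may behave differently.

```python
import os


def is_up_to_date(inputs, outputs):
    """Return True if every output exists and is at least as new as every input."""
    # A missing output (or no outputs at all) always forces a re-run.
    if not outputs or not all(os.path.exists(path) for path in outputs):
        return False
    # A missing input means the step cannot be validated, so re-run as well.
    if not all(os.path.exists(path) for path in inputs):
        return False
    if not inputs:
        return True
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return oldest_output >= newest_input
```

A call such as `is_up_to_date(["proteins.fa"], ["hints.gff"])` would then decide whether a step can be skipped. A timestamp comparison like this cannot tell a complete output from a partial file left behind by a crashed run, which is one reason resuming failed runs this way tends to be fragile.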


KatharinaHoff commented 1 year ago

Our HPC is going into maintenance tomorrow morning. I will run the tests on our end before merging, so the merge might take 2-3 days from now (depending on when our HPC awakens from its upgrade).

Thank you so much for all the work that you put into this, Tomas!!! This is really good.

tomasbruna commented 1 year ago

> The uptodate stuff does not work in all places, anyway. I should go through the code and remove it...

I was hoping you'd say that :), it seems like a big headache to maintain it.

KatharinaHoff commented 1 year ago

If we at some point fully port to Python, then Snakemake (or Nextflow) would take care of this... the timeline for this is very uncertain on our end, though. I will remove uptodate from the code for now.
