NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0
43 stars 18 forks source link

Add Augustus training extra steps #6

Closed mahesh-panchal closed 4 years ago

mahesh-panchal commented 4 years ago

Added extra steps for Augustus training steps. (#2 )

It needs testing since I don't have an appropriate test data set.

We should make providing a test dataset a prerequisite for adding enhancements.

mahesh-panchal commented 4 years ago

OK. Fixed Augustus issue. If you want a parameter to write species to /projects/references/augustus/config/species/ as well as to the current directory, we can add that too.

e.g. params.maker_species_publishdir

mahesh-panchal commented 4 years ago

Added params.maker_species_publishdir. Please update the README too please.

Juke34 commented 4 years ago

Why augustus_training_species option is a list? The readme says [ 'species1', 'species2' ] but only one species/profile could anyway be trained at a time. Any reason for that?

params.maker_species_publishdirparam should be empty by default, and in such case the step skipped. The current behaviour is a failure:

[40/1fc849] process > augustus_training_dataset:augustus_training (test_asecodes_parviclava)                                                                           [  0%] 0 of 1
Oops .. something went wrong
Error executing process > 'augustus_training_dataset:augustus_training (test_asecodes_parviclava)'
Caused by:
  /path

If cannot be fixed we should add a troubleshooting section for explaining it. Personally I would prefer we remove this last step rather having a failure message. It is not a big deal to move the result folder.

LucileSol commented 4 years ago

I had no problem with the

params.maker_species_publishdir

I put this path /projects/references/augustus/config/species/ because it is where maker will then check the path in our cluster. what do you mean by current behavior, when it is empty? I would prefer to not move the results folder as we can forget to do it or not do it properly but I understand it if you want to use this pipeline outside of our own cluster.

Juke34 commented 4 years ago

what do you mean by current behaviour

Actually letting the option like it is params.maker_species_publishdir = '/path/to/shared/maker/folder/' gives the error I mentioned.

By default it would be better to be params.maker_species_publishdir = undef and in such case skip this step.

mahesh-panchal commented 4 years ago

I'll repeat what I said before. The current version of nextflow has a bug where if the enabled for publishDir evaluates to false, it still tries to write the directory. This will be fixed in the next release. I use dummy paths in the pipeline to better explain what the values should be. You should use common sense to change them. The expected behaviour once the bug is fixed, is that if that folder does not exist, it will not write it there. For now, use undef if it fixes that behaviour, but after Nextflow is updated, you should not have to worry about it.

mahesh-panchal commented 4 years ago

I'll modify the augustus_training_species parameter. It was never clear to me what this actually supposed to do.

Juke34 commented 4 years ago

Thanks I didn’t get everything the first time :) It not the good place to talk about it but I also suggested as a general mechanism instead to use grep to get the parameters, the use of an utility script that should get the parameters with the meta information (comments that explain what they do and if they are optional or not). But II’m not sure we can have comments in a params.config file.