cerebis / meta-sweeper

Parametric sweep of simulated microbial communities and metagenomic sequencing.
GNU General Public License v3.0

Nextflow resume is broken by the NamedValue helper class #32

Closed cerebis closed 8 years ago

cerebis commented 8 years ago

To make the Groovy scripts that implement the sweep more legible, I have written a Helper.groovy class which imports some static classes.

One in particular (NamedValue) permits the naming of values, so that rows in the sweep can be sliced by name rather than by index. Access by name is robust against unintended side effects that change a particular value's index, and it is much more legible (contextual) in workflows.
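As a rough illustration of the by-name idea (not the actual Helper.groovy code), the sketch below uses plain Java, which Groovy can also consume. The nested NamedValue class, the pick helper and the variable names "seed", "coverage" and "community" are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

class SweepRowDemo {
    // Hypothetical minimal stand-in for the NamedValue helper.
    static final class NamedValue {
        final String name;
        final Object value;

        NamedValue(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    // Look a value up by name; its position in the row is irrelevant.
    static Object pick(List<NamedValue> row, String name) {
        for (NamedValue nv : row) {
            if (nv.name.equals(name)) {
                return nv.value;
            }
        }
        throw new NoSuchElementException("no value named " + name);
    }

    public static void main(String[] args) {
        List<NamedValue> row = Arrays.asList(
                new NamedValue("seed", 1234),
                new NamedValue("coverage", 10));
        // Adding a sweep variable at the front shifts every index...
        List<NamedValue> wider = Arrays.asList(
                new NamedValue("community", "example"),
                new NamedValue("seed", 1234),
                new NamedValue("coverage", 10));
        // ...but the by-name lookup still finds the same value in both rows.
        System.out.println(pick(row, "coverage") + " " + pick(wider, "coverage"));
    }
}
```

An index-based lookup such as row.get(1) would silently return the seed instead of the coverage once the extra variable is added, which is the kind of side effect by-name access guards against.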

However, this currently breaks Nextflow resume.

Processes can be directed to store hashes of output channel values and then check for their pre-existence before regenerating them on subsequent runs.

Hashes of NamedValue must be derived from the contained object, so that this happens as intended.
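A minimal sketch of what "derived from the contained object" could look like, written in plain Java. The field names are assumptions, and this is not the code from Helper.groovy. The point is only that the default identity-based Object.hashCode() typically differs between JVM runs, whereas delegating to the wrapped value gives a repeatable hash whenever that value's own hash is repeatable (String, Integer and similar).

```java
import java.util.Objects;

// Hypothetical stand-in for the NamedValue helper; field names are assumptions.
final class NamedValue {
    final String name;
    final Object value;

    NamedValue(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    // Delegate to the wrapped value instead of inheriting the
    // identity-based Object.hashCode().
    @Override
    public int hashCode() {
        return Objects.hashCode(value);
    }

    // Keep equals consistent with hashCode: equal objects share a value,
    // and therefore share a hash.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof NamedValue)) {
            return false;
        }
        NamedValue that = (NamedValue) other;
        return Objects.equals(name, that.name) && Objects.equals(value, that.value);
    }
}

class NamedValueHashDemo {
    public static void main(String[] args) {
        NamedValue a = new NamedValue("seed", 1234);
        NamedValue b = new NamedValue("seed", 1234);
        // Two distinct instances wrapping the same value hash identically.
        System.out.println(a.hashCode() == b.hashCode());
    }
}
```

Whether the name should also contribute to the hash is a design choice; the sketch follows the wording above and uses only the contained value.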

cerebis commented 8 years ago

Synopsis of underlying issue and the solution.

Regular (non-deep) Nextflow caches operate only on the state of the variables held by output Channels, while deep caching extends to file contents (for File-like types) and results in the storage of additional data (somewhere -- not investigated as yet).

For now, I have limited this to fixing non-deep caching.

Variable state is serialized to persistent storage using the Kryo library and recorded per-task in the file named command.val. Kryo replaces Java serialisation and provides support for many common types often implemented as extensions of Serializer. Nextflow has implemented a number of these and this is all found in the class KryoHelper, which attempts to follow a singleton pattern when handling the Kryo "session".

Our custom classes, Key and NamedValue, are not supported by default. Not because they require custom serializers, mind you, but because they must be registered with the singleton Kryo instance. After registration and without providing a Serializer, Kryo will simply employ its own Serializers or those registered by Nextflow (FileSerializer).

One gotcha: serialized objects require a zero-argument constructor to be reconstituted (deserialized). Therefore, if a class has implemented only constructors that take arguments, a zero-arg constructor must be explicitly added. For those classes not implementing any constructor (Key), an implicit zero-arg constructor already exists.

To avoid an unintended bug later, we have implemented explicit zero-arg constructors for both Key and NamedValue.
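The constructor gotcha can be reproduced without Kryo. The sketch below uses plain Java reflection as a stand-in, on the assumption that a deserializer creates the object through its zero-arg constructor before filling in the fields. All class names here are hypothetical.

```java
import java.lang.reflect.Constructor;

// No constructor declared: the compiler supplies an implicit zero-arg one.
class Key {
}

// Only a constructor with arguments: no zero-arg constructor exists.
class ArgsOnly {
    final Object value;

    ArgsOnly(Object value) {
        this.value = value;
    }
}

// Explicit zero-arg constructor alongside the one that takes arguments.
class WithZeroArg {
    Object value;

    WithZeroArg() {
    }

    WithZeroArg(Object value) {
        this.value = value;
    }
}

class ZeroArgDemo {
    // Returns true if the class can be instantiated via a zero-arg constructor.
    static boolean canInstantiate(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here when no zero-arg constructor exists.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Key: " + canInstantiate(Key.class));
        System.out.println("ArgsOnly: " + canInstantiate(ArgsOnly.class));
        System.out.println("WithZeroArg: " + canInstantiate(WithZeroArg.class));
    }
}
```

The Key case also shows why an explicit zero-arg constructor is the safer choice: adding any constructor with arguments later would remove the implicit one and quietly turn Key into the ArgsOnly case.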