cerebis / meta-sweeper

Parametric sweep of simulated microbial communities and metagenomic sequencing.
GNU General Public License v3.0

Cached custom objects rely on weak hash codes #33

Closed cerebis closed 7 years ago

cerebis commented 8 years ago

During checks for cached tasks, Nextflow tests the identity of deserialized objects using a hash. Our custom classes (Key, NamedValue) fall back to the regular Java hashCode, whereas Nextflow prefers com.google.common.hash. Currently this works, but it would be ideal to have proper support and at least get rid of the warning in the log files.

i.e.

DEBUG nextflow.util.CacheHelper - [WARN] Unknown hashing type: class mzd.Helper$Key

cerebis commented 8 years ago

hashCode methods were slightly improved in commit 6f5637298f44ebe3d59c600be2c6353e465d49e3
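
For readers unfamiliar with the underlying problem: a class that does not override hashCode inherits the identity-based default, so two deserialized copies of the same logical key will generally hash differently. The sketch below is not the code from the commit above. It is a minimal illustration, and the field names (name, index) and the contentDigest helper are assumptions made up for the example. It shows a content-based equals/hashCode pair plus a stable SHA-256 digest built from the JDK only, standing in for what a com.google.common.hash based approach would provide.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Objects;

// Hypothetical stand-in for mzd.Helper$Key; the fields are assumptions.
final class Key {
    private final String name;
    private final int index;

    Key(String name, int index) {
        this.name = Objects.requireNonNull(name);
        this.index = index;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Key)) return false;
        Key other = (Key) o;
        return index == other.index && name.equals(other.name);
    }

    // Derived from field contents only, so equal keys hash equally
    // even when they are separate (e.g. deserialized) instances.
    @Override
    public int hashCode() {
        return Objects.hash(name, index);
    }

    // Hex-encoded SHA-256 over the field contents; a wider, stable
    // alternative to the 32-bit hashCode (illustrative helper).
    String contentDigest() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(name.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between fields
            md.update(Integer.toString(index).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

Two separately constructed keys with the same field values compare equal and produce the same hashCode and digest, which is the property cache lookups depend on.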

cerebis commented 8 years ago

To address this problem, Nextflow would need to be refactored to allow the registration of type-specific hash methods, analogous to how serialisation is accomplished. Currently, Nextflow enumerates the supported types as a hardcoded chain of conditional tests.

https://github.com/cerebis/nextflow/blob/master/subprojects/nxf-commons/src/main/nextflow/util/CacheHelper.java
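
To make the suggestion concrete, here is a rough sketch of what a registry of type-specific hash methods could look like. This is not Nextflow's API: HashRegistry, register, encode and isRegistered are hypothetical names, and the fallback to hashCode only mimics the "unknown hashing type" situation described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical registry; none of these names exist in Nextflow.
final class HashRegistry {
    private final Map<Class<?>, Function<Object, byte[]>> encoders = new LinkedHashMap<>();

    // Associate a type with a function that turns an instance into
    // the bytes that should be fed to the hasher.
    <T> void register(Class<T> type, Function<? super T, byte[]> encoder) {
        Objects.requireNonNull(encoder);
        encoders.put(type, obj -> encoder.apply(type.cast(obj)));
    }

    boolean isRegistered(Class<?> type) {
        return encoders.containsKey(type);
    }

    // Look up the encoder by exact class instead of walking a
    // hardcoded chain of instanceof tests.
    byte[] encode(Object value) {
        Objects.requireNonNull(value);
        Function<Object, byte[]> encoder = encoders.get(value.getClass());
        if (encoder == null) {
            // Unknown type: fall back to hashCode, as happens today.
            return Integer.toString(value.hashCode()).getBytes(StandardCharsets.UTF_8);
        }
        return encoder.apply(value);
    }
}
```

A pipeline could then call something like `registry.register(Key.class, k -> ...)` once at start-up, so custom classes no longer end up in the fallback branch. Lookup here is by exact class; a real design would also need to decide how subclasses and interfaces are resolved.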

cerebis commented 8 years ago

Cached tasks might still have a bug. Repeated calls on a wider sweep re-run a few tasks that appear to have already completed without error. This might, however, be a problem with the sweep itself.

cerebis commented 8 years ago

This nondeterministic repeat invocation of a subset of HiCMap processes is likely caused by ordering not being preserved in the process's prerequisites.

It is at this point in the workflow that two output channels are joined by a Helper method. The resulting join might be correct in pairing rows but may not emit the rows in the same order.

This would then mean task_n might not map to the same number between repeated calls.
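
One way to rule this out would be to make the join's emission order independent of arrival order. The sketch below is not the Helper method from this repository. Channels are simplified to plain lists of rows, and OrderedJoin with its (key, value) row layout is an assumption made for illustration. It pairs rows by key and then sorts the result by key, so the n-th emitted row is the same on every run.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; rows are {key, value} string pairs.
final class OrderedJoin {
    // Inner-joins left and right on row[0], emitting {key, leftValue, rightValue}.
    // If a key repeats on the right, the last row for that key wins.
    static List<String[]> join(List<String[]> left, List<String[]> right) {
        Map<String, String> rightByKey = new HashMap<>();
        for (String[] row : right) {
            rightByKey.put(row[0], row[1]);
        }
        List<String[]> joined = new ArrayList<>();
        for (String[] row : left) {
            String match = rightByKey.get(row[0]);
            if (match != null) {
                joined.add(new String[] {row[0], row[1], match});
            }
        }
        // Sort by key so output order does not depend on the order
        // in which rows happened to arrive on either input.
        joined.sort(Comparator.comparing((String[] r) -> r[0]));
        return joined;
    }
}
```

Sorting requires collecting the full join before emitting anything, which gives up streaming behaviour. That trade-off may or may not be acceptable for a large sweep.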

cerebis commented 8 years ago

Non-determinism bug has been moved to issue #34.

cerebis commented 7 years ago

As NamedValue has been removed, this should not be a factor. Requesting deep caching on files will now be properly dealt with.