CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License
481 stars 190 forks source link

Tests don't pass in released 1.1.1 #469

Closed TyberiusPrime closed 6 months ago

TyberiusPrime commented 3 years ago

Download the umi-tools 1.1.1 release, unzip, install in fresh venv, add pyyaml and nose, run tests.

14-16 of 58 tests fail. (it varies a bit - RNG related?). full log: umitools_fail.txt.

What's up with that?

TomSmithCGAT commented 3 years ago

Can you try setting the python hashseed first (export PYTHONHASHSEED=0).

Since python 3.3, the hashing used in e.g dictionary keys is non-determininistic and are 'salted' with a unpredictable random values: https://docs.python.org/3.4/reference/datamodel.html#object.__hash__. I understand this is prevent DOS attacks.

TyberiusPrime commented 3 years ago

Yes that does allow the tests to pass.

This could be done when calling umi-tools from the tests. You're forking out to bash in line 122 of test_umi_tools.py anyway, might as well pass that environment variable?

Since python 3.6, dicts remember their insertion order, and iter in that order, and that's part of the API since 3.7 .

Set's still have no defined order though.

All the failing tests are actually in dedup & group ('whitelist_scrb' in the output non-withstanding), and it's not just the output being in a different order.

The actual issue is in network.breadth_first_search which changes order depending on the actual hash values. PR fixing incoming.

TomSmithCGAT commented 6 months ago

Closing due to inactivity. There's a (long) outstanding PR for making UMI-tools determininstic (#550) following this issue. One day I might even get that merged...