ISA-tools / isa-api

ISA tools API
https://isa-tools.org
Other
40 stars 37 forks source link

test_load_config_metagenome_seq fails with latest commit #149

Closed althonos closed 7 years ago

althonos commented 7 years ago

Hi !

Running locally coverage run --source=isatools setup.py test fails on one test with the following error message:

======================================================================
FAIL: test_load_config_metagenome_seq (tests.test_isatab_configurator.TestIsaTabConfigurator)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/isa-api/tests/test_isatab_configurator.py", line 43, in test_load_config_metagenome_seq
    self.assertEqual(len(config_dict), 30)
AssertionError: 33 != 30

----------------------------------------------------------------------

did you forget to update tests with new values (as it seems the configuration loader loads too much values), or is this a bug happening only on my machine ?

djcomlab commented 7 years ago

Hiya, I can't seem to reproduce this error if I manually run that test test_load_config_metagenome_seq() in my PyCharm, and also in the latest release it has built successfully in Travis.

What Python version are you running? We are currently developing the API for Python 3.4+ only.

djcomlab commented 7 years ago

Ah also @althonos can you confirm which branch you're on and if you're also up-to-date with the repo?

althonos commented 7 years ago

Hiya David, I'm under Python 3.5.2 on Linux (CPython implementation), and using the latest commit of the master branch of ISA-Tools.

I encounter this issue whenever i launch "python3 setup.py test", i'll try running only the right test.

djcomlab commented 7 years ago

Hmm, really weird, the tests pass for me using same command, but using Python 3.4.3 on OSX Yosemite.

I did get some error briefly when I ran it first time just now, as some of the requirements were not up to date. Can you try doing pip3 install -r requirements-tests.txt --upgrade and run the test suite again?

althonos commented 7 years ago

I just checked and all my requirements are up-to-date, this is odd...

althonos commented 7 years ago

Running python3 -m unittest tests.test_isatab_configurator comes okay for both 2 tests...

So the issue happens only when running python3 setup.py test : maybe a path issue ? I really don't know what's causing that >_<

djcomlab commented 7 years ago

Very odd.

So in the test, what config_dict = configurator.load(self._config_dir) does is load each XML configuration from the specified directory into a dict. In this case from isa-api/tests/data/configs/xml/isaconfig-default_v2015-07-02. There should be exactly 30 XML configurations present, hence self.assertEqual(len(config_dict), 30).

It shouldn't be loading any more than what's there!

I will try find time to investigate on a Linux VM with the version of Python you are using also.

djcomlab commented 7 years ago

Can you add a print line to the test, like print(config_dict.keys()) to line 43 and copy-paste the output into a comment here?

It should look like:

dict_keys([('metabolite profiling', 'mass spectrometry'), ('transcription profiling', 'real time PCR'), ('[Sample]', ''), ('histone modification profiling', 'nucleotide sequencing'), ('cell counting', 'flow cytometry'), ('copy number variation profiling', 'DNA microarray'), ('DNA methylation profiling', 'nucleotide sequencing'), ('histology', ''), ('transcription factor binding site identification', 'nucleotide sequencing'), ('genome sequencing', 'nucleotide sequencing'), ('loss of heterozygosity profiling', 'DNA microarray'), ('transcription factor binding site identification', 'DNA microarray'), ('protein expression profiling', 'mass spectrometry'), ('clinical chemistry analysis', ''), ('environmental gene survey', 'nucleotide sequencing'), ('DNA methylation profiling', 'DNA microarray'), ('protein-DNA binding site identification', 'DNA microarray'), ('protein-protein interaction detection', 'protein microarray'), ('hematology', ''), ('protein expression profiling', 'gel electrophoresis'), ('cell sorting', 'flow cytometry'), ('SNP analysis', 'DNA microarray'), ('protein expression profiling', 'protein microarray'), ('metagenome sequencing', 'nucleotide sequencing'), ('transcription profiling', 'nucleotide sequencing'), ('protein-DNA binding site identification', 'nucleotide sequencing'), ('protein identification', 'mass spectrometry'), ('transcription profiling', 'DNA microarray'), ('metabolite profiling', 'NMR spectroscopy'), ('[investigation]', '')])

althonos commented 7 years ago

I get the following but only with python3 setup.py test: dict_keys([('protein-DNA binding site identification', 'DNA microarray'), ('copy number variation profiling', 'DNA microarray'), ('histone modification profiling', 'nucleotide sequencing'), ('cell sorting', 'flow cytometry'), ('histology', ''), ('metabolite profiling', 'NMR spectroscopy'), ('transcription factor binding site identification', 'nucleotide sequencing'), ('protein expression profiling', 'protein microarray'), ('DNA methylation profiling', 'nucleotide sequencing'), ('transcription profiling', 'real time PCR'), ('protein-protein interaction detection', 'protein microarray'), ('protein expression profiling', 'gel electrophoresis'), ('transcription profiling', 'nucleotide sequencing'), ('environmental gene survey', 'nucleotide sequencing'), ('chromatin modification profiling', 'nucleotide sequencing'), ('cell counting', 'flow cytometry'), ('SNP analysis', 'DNA microarray'), ('exome sequencing', 'nucleotide sequencing'), ('protein identification', 'mass spectrometry'), ('genome sequencing', 'nucleotide sequencing'), ('loss of heterozygosity profiling', 'DNA microarray'), ('transcription profiling', 'DNA microarray'), ('transcription factor binding site identification', 'DNA microarray'), ('metagenome sequencing', 'nucleotide sequencing'), ('metabolite profiling', 'mass spectrometry'), ('hematology', ''), ('[investigation]', ''), ('protein-RNA binding site identification', 'nucleotide sequencing'), ('[Sample]', ''), ('clinical chemistry analysis', ''), ('protein expression profiling', 'mass spectrometry'), ('DNA methylation profiling', 'DNA microarray'), ('protein-DNA binding site identification', 'nucleotide sequencing')])

When checking against the "normal" one, only I extract the 3 following:

('chromatin modification profiling', 'nucleotide sequencing')
('exome sequencing', 'nucleotide sequencing')
('protein-RNA binding site identification', 'nucleotide sequencing')
djcomlab commented 7 years ago

So this indicates to me there are somehow extra files being picked up by the test.

In the test, the configurator.load() should pick up files from utils.DEFAULT2015_XML_CONFIGS_DATA_DIR to load into the dict.

DEFAULT2015_XML_CONFIGS_DATA_DIR should correspond to isa-api/tests/data/configs/xml/isaconfig-default_v2015-07-02.

Can you see if somehow there are extra XML files chromatinmodification_seq.xml, exome_seq.xml, and protein_rna_binding_ident_seq.xml in the above directory? In the master branch and latest commit they should not be there.

Although that doesn't really explain why running the tests one way works and another way doesn't...

djcomlab commented 7 years ago

Oh and perhaps try add print(os.listdir(self._config_dir)) next to the other print you added before to see if the test code is getting the correct file list.

althonos commented 7 years ago

I just tried on another Linux/Py3.5 machine and had no issues whatsoever, so my own computer seems to be at fault here. Wait just a bit until I do the tests :cry:

djcomlab commented 7 years ago

The only way it knows to print out:

('chromatin modification profiling', 'nucleotide sequencing')
('exome sequencing', 'nucleotide sequencing')
('protein-RNA binding site identification', 'nucleotide sequencing')

is if it is reading corresponding XML files with this information in it!

Can you try a clean clone from Github of the project? Maybe there are some hidden files (the way git works with multiple branching) that maybe your versions of Linux/Python are somehow exposing to coverage.

althonos commented 7 years ago

So I did the following:

    def test_load_config_metagenome_seq(self):
        from isatools.io import isatab_configurator as configurator
        config_dict = configurator.load(self._config_dir)
        print(os.listdir(self._config_dir))
        self.assertEqual(len(config_dict), 30)
        self.assertEqual(config_dict[('metagenome sequencing', 'nucleotide seque$
                         .table_name,'metagenome_seq')

And I got the following: ['transcription_seq.xml', 'transcription_rtpcr.xml', 'transcription_micro.xml', 'tfbsident_seq.xml', 'tfbsident_micro.xml', 'studySample.xml', 'snpanalysis_micro.xml', 'proteinident_ms.xml', 'protein_expression_ms.xml', 'protein_expression_micro.xml', 'protein_expression_ge.xml', 'protein_dna_binding_ident_seq.xml', 'protein_dna_binding_ident_micro.xml', 'ppi_detection_micro.xml', 'metagenome_seq.xml', 'metaboliteprofiling_nmr.xml', 'metaboliteprofiling_ms.xml', 'investigation.xml', 'histonemodification_seq.xml', 'histology.xml', 'heterozygosity_micro.xml', 'hematology.xml', 'genome_seq.xml', 'envgen_survey_seq.xml', 'dnamethylation_seq.xml', 'dnamethylation_micro.xml', 'copynumvariation_micro.xml', 'clinical_chemistry.xml', 'cellsorting_flowcyt.xml', 'cellcount_flowcytometry.xml'] wich is of length 30...

So configurator.load seems to be going out of the directory it should stay in in my case.

althonos commented 7 years ago

That's actually a clean clone, I've cloned it three times and got this bug everytime...

But I DO have all of those 3 "additional" files in the tests/data/configs/xml/isaconfig-seq_v2016-08-39-SRA1.5-august2014mod/

which the test shouldn't parse, but may be parsing anyway

djcomlab commented 7 years ago

Yes that's right, it shouldn't be going outside into this directory, so I really am not sure what is happening here!

althonos commented 7 years ago

Finally found it.

Cause

Because config_dict is marked as a global, it keeps its content during the full unittest run - not emptying between each tests. So if test_validate_test_data.py gets run before test_isatab_configurator, then it will contain 33 keys, because it will have loaded the sra configuration xml beforehand. If after that, new configs are loaded, they will overwrite the values under same keys, but the 3 "extra" files will stay inside the global config_dictionary.

Workaround

djcomlab commented 7 years ago

Hi, thanks for this, this makes sense. The test should check for exactly 30 keys, it's not supposed to be any more than this.

I had assumed that if doingfrom isatools.io import isatab_configurator in two separate places, isatab_configurator would be reloaded (and the globals reset) but I guess this doesn't happen in this case.

For the fix, I'll remove the use of the global config_dict. The other global variables in this module are auto-generated code so really shouldn't be touched if they are not causing any unexpected behaviour.

djcomlab commented 7 years ago

I've just pushed a fix, do you want to try it out on your setup that the bug came up (Py 3.5.2 on Linux)?

althonos commented 7 years ago

Fix works :) Both commands succeeded: python -m unittest discover python setup.py test

djcomlab commented 7 years ago

Great! Thanks for the report and the suggested fixes!