frejanordsiek / hdf5storage

Python package to read and write a wide range of Python types to/from HDF5 formatted files. Can read/write data to the HDF5 based Matlab v7.3 MAT files.
BSD 2-Clause "Simplified" License

test_write_readback.TestPythonMatlabFormat fails due to clash in random filenames #104

Closed: drew-parsons closed this issue 3 years ago

drew-parsons commented 3 years ago

Test failure was reported against the Debian build at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=971806

test_write_readback.TestMatlabFormat.test_dtype_structured_with_offsets_titles fails occasionally:

ERROR: test_write_readback.TestMatlabFormat.test_dtype_structured_with_offsets_titles
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/tmp/autopkgtest-lxc.bkohktuu/downtmp/autopkgtest_tmp/tests/test_write_readback.py", line 862, in test_dtype_structured_with_offsets_titles
    np.dtype(desc).itemsize + random.randint(1, 100)
ValueError: name already used as a name or title

The test does not always fail. The failure appears to involve a clash between randomly generated filenames, which suggests the test is not robust against the same random number being generated twice.

If the number comes from random.randint(1, 100), you might expect a filename clash, and hence a test failure, roughly 1% of the time. I'm not certain this is the root problem, since a numpy ValueError is also involved. But if the randomly generated filename is the cause, some validation should be added to ensure the same filename is not generated twice. Alternatively, use the tempfile API, which already ensures file uniqueness: https://docs.python.org/3/library/tempfile.html
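For illustration, here is a minimal sketch of the tempfile approach, assuming the tests only need a fresh file path to write to. The helper name make_unique_test_file and the ".h5" suffix are hypothetical and are not taken from the hdf5storage test suite.

```python
import os
import tempfile


def make_unique_test_file(suffix=".h5"):
    """Create an empty, uniquely named file and return its path.

    Hypothetical helper for illustration only. tempfile.mkstemp creates
    the file atomically, so two calls never return the same path.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the caller reopens the file
    return path


first = make_unique_test_file()
second = make_unique_test_file()
try:
    # The two paths differ even though no random number was drawn by hand.
    print(first != second)
finally:
    # mkstemp does not clean up after itself, so remove the files explicitly.
    os.remove(first)
    os.remove(second)
```

Since mkstemp leaves cleanup to the caller, a test would normally remove the file in its teardown step, or use tempfile.TemporaryDirectory to remove everything at once.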

frejanordsiek commented 3 years ago

Fixed in commit 1444936 in the main branch. It had nothing to do with files, but instead had to do with numpy requiring titles in dtypes to be unique. This is now fixed.
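To illustrate the kind of clash described here, the following is a rough sketch of how random field names and titles can be drawn so that no label is ever repeated. It is not the code from commit 1444936. The function random_names_and_titles and its parameters are hypothetical, and it uses only the standard library.

```python
import random
import string


def random_names_and_titles(n_fields, length=8, rng=random):
    """Return (names, titles): two lists of n_fields random strings.

    Hypothetical helper for illustration only. No string is repeated
    within or across the two lists, so a name can never collide with
    another name or with any title.
    """
    alphabet = string.ascii_lowercase
    needed = 2 * n_fields
    if len(alphabet) ** length < needed:
        raise ValueError("not enough distinct labels of this length")

    seen = set()
    labels = []
    while len(labels) < needed:
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if candidate not in seen:  # redraw on any repeat
            seen.add(candidate)
            labels.append(candidate)
    return labels[:n_fields], labels[n_fields:]


names, titles = random_names_and_titles(5)
# A dictionary in the form np.dtype accepts; the formats are arbitrary.
desc = {"names": names, "titles": titles, "formats": ["<i4"] * 5}
print(len(set(names) | set(titles)))
```

Because every label is checked against one shared set, a description built this way should not trigger the "name already used as a name or title" error from the traceback, however many times the test is repeated.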

The test isn't in the 0.1.x branch at all, so no commit is needed there. Given that Debian is packaging the 0.1.x branch, I have no clue why this error is turning up in the Debian packaging when running the tests. The unit test in question covers functionality added to the main branch only, not the legacy 0.1.x branch (specifically, the marshalling of numpy.dtype objects). Frankly, it is no surprise that things go wrong when code from both branches is mixed.

drew-parsons commented 3 years ago

Thanks Freja, commit 1444936 seems to be working fine for us.

The Debian code wasn't mixed; it was a snapshot of the main branch from June 2020. I took the snapshot because it fixed some documentation issues that were causing the old release to fail to build.