anmartinezs / polnet

Generates synthetic datasets for Cryo-Electron Tomography

Add support for using a random seed for repeatability #16

Closed: kephale closed this 1 month ago

kephale commented 2 months ago

This adds a random seed to all_features2 and propagates the resulting changes through the code that depends on it.

It also adds a test showing that the results are identical for the same random seed and differ for different seeds.
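For readers unfamiliar with the pattern, here is a minimal sketch of seed-based repeatability and the kind of same-seed/different-seed check described above. It assumes NumPy's Generator API; generate_features is a hypothetical stand-in for all_features2, and how polnet itself draws its random numbers is not shown in this thread.

```python
import numpy as np


def generate_features(n_points: int, seed=None) -> np.ndarray:
    """Hypothetical stand-in for a seeded generator such as all_features2.

    All randomness is drawn from one local Generator built from `seed`,
    so the same seed always reproduces the same output.
    """
    rng = np.random.default_rng(seed)
    # Random positions inside a unit cube, standing in for particle placement
    positions = rng.uniform(0.0, 1.0, size=(n_points, 3))
    # Random rotation angles in degrees, one triple per particle
    angles = rng.uniform(0.0, 360.0, size=(n_points, 3))
    return np.hstack([positions, angles])


def test_seed_repeatability():
    a = generate_features(10, seed=42)
    b = generate_features(10, seed=42)
    c = generate_features(10, seed=43)
    assert np.array_equal(a, b)      # same seed -> identical results
    assert not np.array_equal(a, c)  # different seed -> different results
```

Passing one Generator through the code keeps the results independent of any other code that touches the global np.random state. If parts of a pipeline still call np.random.* or Python's random module directly, those global generators would also need seeding (np.random.seed, random.seed) for runs to repeat.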

kephale commented 2 months ago

Current output:

======================================================================
FAIL: test_repeatability (__main__.TestAllFeatures2Repeatability)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/kharrington/git/anmartinezs/polnet/tests/test_repeatability.py", line 116, in test_repeatability
    self.assertEqual(hashes1, hashes2, "Key output files are not identical between runs")
AssertionError: {'tom[16 chars]c': '734d4fd6d885a7857d2aa6c8d024af39', 'tomos[230 chars]7e6'} != {'tom[16 chars]c': '0e29104d55f8ac4f2186df77bd8df8cd', 'tomos[230 chars]02d'}
  {'tomos/poly_den_0.vtp': 'd021f5e9f13952d8d5df43a551ce7cab',
   'tomos/poly_skel_0.vtp': '0530b5fa41f6d6a102d7234b653c2f6c',
-  'tomos/tomo_den_0.mrc': '734d4fd6d885a7857d2aa6c8d024af39',
-  'tomos/tomo_lbls_0.mrc': 'b133b6d87b336f82c430fd70cf43ff60',
-  'tomos_motif_list.csv': 'ea6ec9517f678e747a5bb5f75ecff7e6'}
+  'tomos/tomo_den_0.mrc': '0e29104d55f8ac4f2186df77bd8df8cd',
+  'tomos/tomo_lbls_0.mrc': '1304d2534f288bcb8b6e013987c0b34d',
+  'tomos_motif_list.csv': '4d4258aa2eb3f5f317414f749e3f602d'} : Key output files are not identical between runs
anmartinezs commented 2 months ago

Hi kephale,

It's nice that you're making progress with repeatability. I am not experienced with this kind of test, but I guess you're checking whether the output files are exactly the same between runs. However, when these files are written to disk, some run-specific information, such as the time, may be added to the file headers. That would make the files differ byte for byte even though the data they contain are the same.
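A minimal sketch of the effect described above, using a made-up toy format (an 8-byte timestamp header followed by the data) rather than the real MRC layout: two files with identical data but different header timestamps produce different whole-file digests. MD5 is used here because the 32-hex-digit hashes in the failure output look like MD5, which is an assumption.

```python
import hashlib
import struct


def write_toy_file(payload: bytes, timestamp: float) -> bytes:
    """Hypothetical format: an 8-byte little-endian timestamp, then the payload.

    This only illustrates the idea; it is not the MRC header layout.
    """
    return struct.pack("<d", timestamp) + payload


def md5_hex(blob: bytes) -> str:
    """Hex digest of the whole file content, header included."""
    return hashlib.md5(blob).hexdigest()


payload = bytes(range(16))  # identical "data" in both runs
run1 = write_toy_file(payload, 1700000000.0)
run2 = write_toy_file(payload, 1700000042.0)

print(md5_hex(run1) == md5_hex(run2))  # False: whole-file hashes differ
print(run1[8:] == run2[8:])            # True: the data after the header match
```

Comparing only the decoded data, not the raw bytes, sidesteps this kind of header noise.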

Best

kephale commented 2 months ago

@anmartinezs Aha, you got it! The test now compares the array contents of the MRC files and checks the CSV outputs while ignoring columns that are expected to differ (e.g. the MRC file paths, which point to run1 and run2 respectively).
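A sketch of content-based checks along these lines, not the code in this PR. It assumes the mrcfile package for reading MRC data (imported lazily, so the rest runs without it) and the standard csv module for the motif list. The comma delimiter and the column name tomo_path in the example are hypothetical.

```python
import csv

import numpy as np


def arrays_match(a: np.ndarray, b: np.ndarray) -> bool:
    """Compare voxel data only: same shape, dtype and values; headers are ignored."""
    return a.shape == b.shape and a.dtype == b.dtype and np.array_equal(a, b)


def load_mrc_data(path):
    """Read only the data block of an MRC file (requires the mrcfile package)."""
    import mrcfile  # lazy import so the rest of this sketch runs without it

    with mrcfile.open(path, permissive=True) as mrc:
        return mrc.data.copy()  # copy so the array outlives the open file


def normalized_rows(lines, ignore_columns, delimiter=","):
    """Parse CSV lines into dicts, dropping columns expected to differ between runs."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    return [
        {key: value for key, value in row.items() if key not in ignore_columns}
        for row in reader
    ]


def csv_files_match(path_a, path_b, ignore_columns, delimiter=","):
    """True if two CSV files agree on every column except those in ignore_columns."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a = normalized_rows(fa, ignore_columns, delimiter)
        rows_b = normalized_rows(fb, ignore_columns, delimiter)
    return rows_a == rows_b
```

In a test, one would call something like arrays_match(load_mrc_data(p1), load_mrc_data(p2)) for each tomogram and csv_files_match(...) for the motif list. Exact equality assumes the seeded runs are bitwise deterministic. If floating-point results could vary slightly (e.g. from multithreaded reductions), np.allclose would be the looser alternative.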

The test passes now!

kephale commented 1 month ago

@Fran-AM Great, sorry for the slow response! Thank you for catching those bugs. I've fixed them now.