Integration tests for MACE descriptor workflow and descriptor calculation for different input types

niketagrawal commented 5 months ago

Closes #23
The code changes implement a test for verifying whether OBeLiX produces the expected descriptor values for a given set of xyz files generated by MACE.
Due to a potential bug in MACE, the test fails when it is parametrized with multiple input and expected-output combinations: it always fails for the input that comes second in the parametrization list. When MACE is run sequentially for two inputs, the XYZ files it produces change if the order of the inputs is swapped.
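One way to pin down this kind of order dependence is to write the XYZ files of each run to its own folder and compare them byte for byte. The following is only a minimal sketch using the standard library: the helper name differing_xyz_files, the folder layout and the file names are hypothetical, and the files are created by hand here rather than by MACE.

```python
import filecmp
import os
import tempfile


def differing_xyz_files(run_a, run_b):
    """Return names of .xyz files present in both folders whose contents differ."""
    common = sorted(
        name
        for name in os.listdir(run_a)
        if name.endswith(".xyz") and os.path.isfile(os.path.join(run_b, name))
    )
    # shallow=False forces a byte-by-byte comparison instead of comparing os.stat data.
    _match, mismatch, errors = filecmp.cmpfiles(run_a, run_b, common, shallow=False)
    return sorted(mismatch + errors)


def demo():
    # Hand-written stand-ins for the output of two runs (hypothetical data, not MACE output).
    contents = {
        "ligand_1.xyz": ("1\n\nH 0.0 0.0 0.0\n", "1\n\nH 0.0 0.0 0.0\n"),
        "ligand_2.xyz": ("1\n\nH 0.0 0.0 0.0\n", "1\n\nH 0.0 0.0 0.1\n"),
    }
    with tempfile.TemporaryDirectory() as run_a, tempfile.TemporaryDirectory() as run_b:
        for name, (text_a, text_b) in contents.items():
            with open(os.path.join(run_a, name), "w") as handle:
                handle.write(text_a)
            with open(os.path.join(run_b, name), "w") as handle:
                handle.write(text_b)
        return differing_xyz_files(run_a, run_b)


DIFFERENCES = demo()
print(DIFFERENCES)
```

In a real check, run_a and run_b would be the output folders of two MACE runs that differ only in the order of their inputs.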
The original contents of tests/ are moved to obelix/example_workflow/ to differentiate an example workflow from tests.

Running the tests locally

Execute the command pytest -v from the root of the obelix repository.

niketagrawal commented 4 months ago

The changes for the integration test of descriptor calculation for different input types have been merged into this branch.
Closes #25

akalikadien commented 3 months ago

When running the tests on a native Linux PC, we encountered an interesting bug. The .xyz files were loaded in a different order by the calculate_morfeus_descriptors function than when the same code was run on WSL. This caused the descriptors' rows to be in a different order in the resulting csv files (compare the filename_tud column of descriptors.csv with that of descriptors_pristine_xyz.csv). As a result, the tests failed on the native Linux PC.
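The effect can be reproduced in isolation: two csv files with the same rows in a different order differ as text, but match once the rows are sorted on a key column. This is only an illustrative sketch; the helper rows_sorted_by, the descriptor_a column and all values are made up, and only the filename_tud column name comes from the files mentioned above.

```python
import csv
import io


def rows_sorted_by(csv_text, key_column):
    """Parse csv text and return its rows as dicts, sorted on one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(reader, key=lambda row: row[key_column])


# Made-up data: identical rows, written in a different order.
linux_csv = "filename_tud,descriptor_a\ncomplex_SP_1,2.0\ncomplex_SP_0,1.0\n"
wsl_csv = "filename_tud,descriptor_a\ncomplex_SP_0,1.0\ncomplex_SP_1,2.0\n"

same_text = linux_csv == wsl_csv
same_rows = rows_sorted_by(linux_csv, "filename_tud") == rows_sorted_by(wsl_csv, "filename_tud")

print("identical as text:", same_text)
print("identical after sorting rows:", same_rows)
```

Sorting at comparison time only hides the difference in the test; it does not make the generated csv itself reproducible.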

To confirm that the order was different, we also printed the progress of the descriptor calculation. The output showed that [Rh+]_1-Naphthyl-DIPAMP_SP_1 was read first and [Rh+]_1-Naphthyl-DIPAMP_SP_0 second:

tests/test_descriptor_calculator/test_descriptor_calculator.py::TestDescriptorCalculation::test_filename_values[pristine-xyz] FAILED
tests/test_mace_descriptor_workflow/test_mace_descriptor_workflow.py::TestMaceDescriptorWorkflow::test_descriptor_values Workflow is initializing. Converting your dict. input to variables.

Reading MACE inputs
Preparing the folder structure
0
0

Calculating descriptors for:  [Rh+]_1-Naphthyl-DIPAMP_SP_1 ...

Calculating descriptors for:  [Rh+]_1-Naphthyl-DIPAMP_SP_0 ...
FAILED

A potential fix is being tested. It likely involves sorting the list of files in the calculate_morfeus_descriptors function of descriptor_calculator.py.

niketagrawal commented 3 months ago

Thanks @akalikadien for catching this.

The descriptor calculator receives the list of file paths to process through the following lines of code.

    if self.output_type.lower() == 'xyz':
        complexes_to_calc_descriptors = glob.glob(os.path.join(self.path_to_workflow, '*.xyz'))
        dictionary_for_properties = {}

glob.glob does not guarantee any particular ordering of its results; the order depends on the underlying file system, so it can differ between machines. This leads to differences in the row order of the output csv.

This can be fixed by sorting the list of file paths returned to the descriptor calculator.

    if self.output_type.lower() == "xyz":
        complexes_to_calc_descriptors = sorted(
            glob.glob(os.path.join(self.path_to_workflow, "*.xyz"))
        )

I applied the same sorting logic to the CREST and DFT log file scenarios as well.
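To illustrate the sorting fix outside of OBeLiX, here is a small self-contained sketch. The helper list_xyz_files and the file names are hypothetical; the only thing it shares with the snippet above is the sorted(glob.glob(...)) pattern.

```python
import glob
import os
import tempfile


def list_xyz_files(folder):
    """Return the .xyz paths in `folder` in a reproducible, sorted order."""
    # glob.glob gives no ordering guarantee, so the result is sorted explicitly.
    return sorted(glob.glob(os.path.join(folder, "*.xyz")))


def demo():
    with tempfile.TemporaryDirectory() as folder:
        # Create the files in reverse order, plus one file that should be ignored.
        for name in ("complex_SP_1.xyz", "complex_SP_0.xyz", "notes.txt"):
            with open(os.path.join(folder, name), "w") as handle:
                handle.write("")
        return [os.path.basename(path) for path in list_xyz_files(folder)]


DEMO_ORDER = demo()
print(DEMO_ORDER)
```

Note that sorted orders the paths as plain strings, so a name ending in _10 sorts before one ending in _2; that is still deterministic, which is all the tests need.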

EPiCs-group / obelix

Integration tests for MACE descriptor workflow and descriptor calculation for different input types #27