EPiCs-group / obelix

An automated workflow for generation & analysis of bidentate ligand containing complexes
https://epics-group.github.io/obelix/
GNU General Public License v3.0

Integration tests for MACE descriptor workflow and descriptor calculation for different input types #27

Closed niketagrawal closed 1 month ago

niketagrawal commented 3 months ago

Running the tests locally

Execute the command pytest -v from the root of the obelix repository

niketagrawal commented 2 months ago
akalikadien commented 1 month ago

When running the tests on a native Linux PC, we encountered an interesting bug. The .xyz files were loaded in a different order by the calculate_morfeus_descriptors function than when the same code was run on WSL. As a result, the descriptor rows appeared in a different order in the resulting csv files (compare the filename_tud column of descriptors.csv with that of descriptors_pristine_xyz.csv), which caused the tests to fail on the native Linux PC.

To confirm that the order was different, we also printed the progress of the descriptor calculation, which showed that [Rh+]_1-Naphthyl-DIPAMP_SP_1 was being read first and [Rh+]_1-Naphthyl-DIPAMP_SP_0 afterwards:

tests/test_descriptor_calculator/test_descriptor_calculator.py::TestDescriptorCalculation::test_filename_values[pristine-xyz] FAILED
tests/test_mace_descriptor_workflow/test_mace_descriptor_workflow.py::TestMaceDescriptorWorkflow::test_descriptor_values Workflow is initializing. Converting your dict. input to variables.

Reading MACE inputs
Preparing the folder structure
0
0

Calculating descriptors for:  [Rh+]_1-Naphthyl-DIPAMP_SP_1 ...

Calculating descriptors for:  [Rh+]_1-Naphthyl-DIPAMP_SP_0 ...
FAILED

A potential fix is being tested. It likely involves sorting the list of files in the calculate_morfeus_descriptors function of descriptor_calculator.py.
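
The row-order mismatch described above can be reproduced in miniature. The sketch below is not taken from the obelix test suite: the file names, the example_descriptor column and its values are made up for illustration, and only the filename_tud column name comes from this thread. It shows that two tables with identical rows compare as different when the rows arrive in a different order, and as equal once both are sorted on filename_tud.

```python
import csv
import io

# Made-up stand-ins for the csv output produced on two different machines.
CSV_WSL = """filename_tud,example_descriptor
complex_SP_0,1.0
complex_SP_1,2.0
"""
CSV_LINUX = """filename_tud,example_descriptor
complex_SP_1,2.0
complex_SP_0,1.0
"""


def read_rows(text):
    """Parse csv text into a list of dicts, one per row, in file order."""
    return list(csv.DictReader(io.StringIO(text)))


def sort_by_filename(rows):
    """Return the rows ordered by the filename_tud column."""
    return sorted(rows, key=lambda row: row["filename_tud"])


rows_wsl = read_rows(CSV_WSL)
rows_linux = read_rows(CSV_LINUX)

# Same content, different row order: a row-by-row comparison fails.
same_in_file_order = rows_wsl == rows_linux
# After sorting on the key column, the two tables are identical.
same_after_sorting = sort_by_filename(rows_wsl) == sort_by_filename(rows_linux)

print(same_in_file_order, same_after_sorting)  # False True
```

Sorting at comparison time would only hide the difference inside the tests. Sorting the input file list, as proposed above, also makes the generated csv itself reproducible.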

niketagrawal commented 1 month ago

Thanks @akalikadien for catching this.

The descriptor calculator receives the list of file paths to process through the following lines of code.

    if self.output_type.lower() == 'xyz':
        complexes_to_calc_descriptors = glob.glob(os.path.join(self.path_to_workflow, '*.xyz'))
        dictionary_for_properties = {}

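glob.glob does not guarantee any particular order for the paths it returns. The order follows the underlying directory listing, which can differ between filesystems and operating systems, and this leads to a different row order in the output csv.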

This can be fixed by sorting the list of file paths returned to the descriptor calculator.

    if self.output_type.lower() == "xyz":
        complexes_to_calc_descriptors = sorted(
            glob.glob(os.path.join(self.path_to_workflow, "*.xyz"))
        )

I applied the same sorting to the CREST and DFT log file cases as well.
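
For reference, here is a minimal, self-contained sketch of the same idea outside obelix. The helper name list_xyz_files and the file names are hypothetical. Only the sorted(glob.glob(...)) pattern is taken from the fix above.

```python
import glob
import os
import tempfile


def list_xyz_files(folder):
    """Return all .xyz paths in `folder` in lexicographic order."""
    # glob.glob makes no ordering guarantee, so sorted() removes the
    # dependence on the order of the directory listing.
    return sorted(glob.glob(os.path.join(folder, "*.xyz")))


with tempfile.TemporaryDirectory() as tmp:
    # Create the files in "reverse" order on purpose; notes.txt must be ignored.
    for name in ("complex_SP_1.xyz", "complex_SP_0.xyz", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    names = [os.path.basename(path) for path in list_xyz_files(tmp)]

print(names)  # ['complex_SP_0.xyz', 'complex_SP_1.xyz']
```

Note that sorted() orders strings lexicographically, so a name ending in _SP_10 would sort before one ending in _SP_2. That is still deterministic across machines, which is what the tests need, but it is not a numeric ordering.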