MolecularAI / aizynthtrain

Tools to train synthesis prediction models
Apache License 2.0

Help with model validation step #6

Closed ebrowndev closed 9 months ago

ebrowndev commented 11 months ago

Hi all, I'm having trouble getting aizynthtrain to complete a training run. Almost all of the pipeline executes successfully, but the model validation step near the end fails and I can't figure out why. Below is the output from a failed training session:

Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
2023-09-26 17:15:45.511 Workflow starting (run-id 1695748545509451):
2023-09-26 17:15:45.519 [1695748545509451/start/1 (pid 1729)] Task is starting.
2023-09-26 17:15:48.008 [1695748545509451/start/1 (pid 1729)] Task finished successfully.
2023-09-26 17:15:48.014 [1695748545509451/create_template_metadata/2 (pid 1733)] Task is starting.
2023-09-26 17:15:50.470 [1695748545509451/create_template_metadata/2 (pid 1733)] Task finished successfully.
2023-09-26 17:15:50.476 [1695748545509451/featurization_setup/3 (pid 1737)] Task is starting.
2023-09-26 17:15:52.907 [1695748545509451/featurization_setup/3 (pid 1737)] Foreach yields 1 child steps.
2023-09-26 17:15:52.908 [1695748545509451/featurization_setup/3 (pid 1737)] Task finished successfully.
2023-09-26 17:15:52.914 [1695748545509451/featurization/4 (pid 1741)] Task is starting.
2023-09-26 17:15:55.273 [1695748545509451/featurization/4 (pid 1741)] Task finished successfully.
2023-09-26 17:15:55.280 [1695748545509451/featurization_join/5 (pid 1746)] Task is starting.
2023-09-26 17:15:57.784 [1695748545509451/featurization_join/5 (pid 1746)] Task finished successfully.
2023-09-26 17:15:57.791 [1695748545509451/split_data/6 (pid 1750)] Task is starting.
2023-09-26 17:16:00.292 [1695748545509451/split_data/6 (pid 1750)] Task finished successfully.
2023-09-26 17:16:00.300 [1695748545509451/model_training/7 (pid 1754)] Task is starting.
2023-09-26 17:16:02.822 [1695748545509451/model_training/7 (pid 1754)] Task finished successfully.
2023-09-26 17:16:02.829 [1695748545509451/model_validation/8 (pid 1758)] Task is starting.
2023-09-26 17:16:05.840 [1695748545509451/model_validation/8 (pid 1758)] /home/dmurphy/aiz/ZZZ
2023-09-26 17:16:05.840 [1695748545509451/model_validation/8 (pid 1758)] uspto_unique_templates.csv.gz
2023-09-26 17:16:05.842 [1695748545509451/model_validation/8 (pid 1758)] failed:
2023-09-26 17:16:05.844 [1695748545509451/model_validation/8 (pid 1758)] Internal error
2023-09-26 17:16:05.844 [1695748545509451/model_validation/8 (pid 1758)] Traceback (most recent call last):
2023-09-26 17:16:05.845 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/cli.py", line 1171, in main
2023-09-26 17:16:05.845 [1695748545509451/model_validation/8 (pid 1758)]     start(auto_envvar_prefix="METAFLOW", obj=state)
2023-09-26 17:16:05.845 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 829, in __call__
2023-09-26 17:16:05.845 [1695748545509451/model_validation/8 (pid 1758)]     return self.main(*args, **kwargs)
2023-09-26 17:16:05.845 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 782, in main
2023-09-26 17:16:05.845 [1695748545509451/model_validation/8 (pid 1758)]     rv = self.invoke(ctx)
2023-09-26 17:16:05.845 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke
2023-09-26 17:16:05.846 [1695748545509451/model_validation/8 (pid 1758)]     return _process_result(sub_ctx.command.invoke(sub_ctx))
2023-09-26 17:16:05.846 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke
2023-09-26 17:16:05.846 [1695748545509451/model_validation/8 (pid 1758)]     return ctx.invoke(self.callback, **ctx.params)
2023-09-26 17:16:05.846 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke
2023-09-26 17:16:05.846 [1695748545509451/model_validation/8 (pid 1758)]     return callback(*args, **kwargs)
2023-09-26 17:16:05.846 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/_vendor/click/decorators.py", line 21, in new_func
2023-09-26 17:16:05.846 [1695748545509451/model_validation/8 (pid 1758)]     return f(get_current_context(), *args, **kwargs)
2023-09-26 17:16:05.846 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/cli.py", line 580, in step
2023-09-26 17:16:05.847 [1695748545509451/model_validation/8 (pid 1758)]     task.run_step(
2023-09-26 17:16:05.847 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/task.py", line 587, in run_step
2023-09-26 17:16:05.847 [1695748545509451/model_validation/8 (pid 1758)]     self._exec_step_function(step_func)
2023-09-26 17:16:05.847 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/metaflow/task.py", line 61, in _exec_step_function
2023-09-26 17:16:05.847 [1695748545509451/model_validation/8 (pid 1758)]     step_function()
2023-09-26 17:16:05.847 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/aiz/aizynthtrain/aizynthtrain/pipelines/expansion_model_pipeline.py", line 99, in model_validation
2023-09-26 17:16:05.847 [1695748545509451/model_validation/8 (pid 1758)]     eval_one_step([self.config_path])
2023-09-26 17:16:05.848 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/aiz/aizynthtrain/aizynthtrain/modelling/expansion_policy/eval_one_step.py", line 194, in main
2023-09-26 17:16:05.848 [1695748545509451/model_validation/8 (pid 1758)]     expander_output = _run_expander(ref_reactions_path, config_path, config)
2023-09-26 17:16:05.848 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/aiz/aizynthtrain/aizynthtrain/modelling/expansion_policy/eval_one_step.py", line 143, in _run_expander
2023-09-26 17:16:05.848 [1695748545509451/model_validation/8 (pid 1758)]     expander = AiZynthExpander(configfile=config_path)
2023-09-26 17:16:05.848 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/aizynthfinder/aizynthfinder.py", line 249, in __init__
2023-09-26 17:16:05.848 [1695748545509451/model_validation/8 (pid 1758)]     self.config = Configuration.from_file(configfile)
2023-09-26 17:16:05.848 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/aizynthfinder/context/config.py", line 116, in from_file
2023-09-26 17:16:05.848 [1695748545509451/model_validation/8 (pid 1758)]     return Configuration.from_dict(_config)
2023-09-26 17:16:05.849 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/aizynthfinder/context/config.py", line 96, in from_dict
2023-09-26 17:16:05.849 [1695748545509451/model_validation/8 (pid 1758)]     config_obj.expansion_policy.load_from_config(src_copy.get("policy", {}))
2023-09-26 17:16:05.850 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/aizynthfinder/context/policy/policies.py", line 112, in load_from_config
2023-09-26 17:16:05.851 [1695748545509451/model_validation/8 (pid 1758)]     strategy = TemplateBasedExpansionStrategy(
2023-09-26 17:16:05.851 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/aizynthfinder/context/policy/expansion_strategies.py", line 94, in __init__
2023-09-26 17:16:05.851 [1695748545509451/model_validation/8 (pid 1758)]     self.templates: pd.DataFrame = pd.read_hdf(templatefile, "table")
2023-09-26 17:16:05.851 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/pandas/io/pytables.py", line 416, in read_hdf
2023-09-26 17:16:05.851 [1695748545509451/model_validation/8 (pid 1758)]     store = HDFStore(path_or_buf, mode=mode, errors=errors, **kwargs)
2023-09-26 17:16:06.440 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/pandas/io/pytables.py", line 578, in __init__
2023-09-26 17:16:06.441 [1695748545509451/model_validation/8 (pid 1758)]     self.open(mode=mode, **kwargs)
2023-09-26 17:16:06.441 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/pandas/io/pytables.py", line 737, in open
2023-09-26 17:16:06.441 [1695748545509451/model_validation/8 (pid 1758)]     self._handle = tables.open_file(self._path, self._mode, **kwargs)
2023-09-26 17:16:06.441 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/tables/file.py", line 300, in open_file
2023-09-26 17:16:06.441 [1695748545509451/model_validation/8 (pid 1758)]     return File(filename, mode, title, root_uep, filters, **kwargs)
2023-09-26 17:16:06.441 [1695748545509451/model_validation/8 (pid 1758)]   File "/home/dmurphy/anaconda3/envs/aizynthtrain/lib/python3.9/site-packages/tables/file.py", line 752, in __init__
2023-09-26 17:16:06.441 [1695748545509451/model_validation/8 (pid 1758)]     self._g_new(filename, mode, **params)
2023-09-26 17:16:06.442 [1695748545509451/model_validation/8 (pid 1758)]   File "tables/hdf5extension.pyx", line 486, in tables.hdf5extension.File._g_new
2023-09-26 17:16:06.442 [1695748545509451/model_validation/8 (pid 1758)] tables.exceptions.HDF5ExtError: HDF5 error back trace
2023-09-26 17:16:06.442 [1695748545509451/model_validation/8 (pid 1758)]
2023-09-26 17:16:06.442 [1695748545509451/model_validation/8 (pid 1758)]   File "H5F.c", line 620, in H5Fopen
2023-09-26 17:16:06.442 [1695748545509451/model_validation/8 (pid 1758)]     unable to open file
2023-09-26 17:16:06.442 [1695748545509451/model_validation/8 (pid 1758)]   File "H5VLcallback.c", line 3522, in H5VL_file_open
2023-09-26 17:16:06.442 [1695748545509451/model_validation/8 (pid 1758)]     open failed
2023-09-26 17:16:06.442 [1695748545509451/model_validation/8 (pid 1758)]   File "H5VLcallback.c", line 3351, in H5VL__file_open
2023-09-26 17:16:06.443 [1695748545509451/model_validation/8 (pid 1758)]     open failed
2023-09-26 17:16:06.443 [1695748545509451/model_validation/8 (pid 1758)]   File "H5VLnative_file.c", line 97, in H5VL__native_file_open
2023-09-26 17:16:06.443 [1695748545509451/model_validation/8 (pid 1758)]     unable to open file
2023-09-26 17:16:06.443 [1695748545509451/model_validation/8 (pid 1758)]   File "H5Fint.c", line 1990, in H5F_open
2023-09-26 17:16:06.443 [1695748545509451/model_validation/8 (pid 1758)]     unable to read superblock
2023-09-26 17:16:06.443 [1695748545509451/model_validation/8 (pid 1758)]   File "H5Fsuper.c", line 405, in H5F__super_read
2023-09-26 17:16:06.443 [1695748545509451/model_validation/8 (pid 1758)]     file signature not found
2023-09-26 17:16:06.443 [1695748545509451/model_validation/8 (pid 1758)]
2023-09-26 17:16:06.444 [1695748545509451/model_validation/8 (pid 1758)] End of HDF5 error back trace
2023-09-26 17:16:06.444 [1695748545509451/model_validation/8 (pid 1758)]
2023-09-26 17:16:06.444 [1695748545509451/model_validation/8 (pid 1758)] Unable to open/create file 'uspto_unique_templates.csv.gz'
2023-09-26 17:16:06.444 [1695748545509451/model_validation/8 (pid 1758)]
2023-09-26 17:16:06.444 [1695748545509451/model_validation/8 (pid 1758)] Task failed.
2023-09-26 17:16:06.444 Workflow failed.
2023-09-26 17:16:06.445 Terminating 0 active tasks...
2023-09-26 17:16:06.445 Flushing logs...
Step failure: Step model_validation (task-id 8) failed.

After looking into it more, it appears that the file being opened (uspto_unique_templates.csv.gz) is expected to be an HDF5 file but is not one: when unzipped, it is simply a CSV file. Is it possible that this file contains invalid data that causes a conversion to fail? When examining the CSV, I notice that most of the entries look like SMARTS strings, but some entries look corrupted.
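One way to confirm what kind of file is actually on disk is to look at its leading bytes. The following is a minimal sketch, not part of aizynthtrain or aizynthfinder: the helper name `sniff_format` is hypothetical, and it only checks offset 0, whereas an HDF5 file written with a user block may carry its signature at a later offset.

```python
import gzip
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature at the start of an HDF5 file
GZIP_MAGIC = b"\x1f\x8b"               # 2-byte magic number at the start of a gzip stream


def sniff_format(path):
    """Return 'hdf5', 'gzip', or 'unknown' based on the file's first bytes.

    Hypothetical helper for illustration; only offset 0 is inspected.
    """
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


if __name__ == "__main__":
    # Demonstrate on a throwaway gzipped CSV rather than a real template file.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.csv.gz")
        with gzip.open(sample, "wt") as handle:
            handle.write("col_a,col_b\n1,2\n")
        print(sample, "->", sniff_format(sample))
```

If the real template file is reported as gzip, that would be consistent with the "file signature not found" message in the traceback: the HDF5 reader is being pointed at a file that is not HDF5.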

Here is an example of what I suspect may be an invalid entry in the file: H0:4] 0002f97a4d38f973095fc7d64488966c9d7b321beb0ea6c38dd61aefac810c0f 0.0 Unrecognized 7
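To check whether rows like this are really malformed, rather than just the tail of a long line, one option is to count the fields on every row and compare against the header. This is a rough sketch using only the standard library; the function name `find_irregular_rows` is hypothetical, and the tab delimiter is an assumption that should be adjusted to match the actual file.

```python
import csv
import gzip


def find_irregular_rows(path, delimiter="\t", limit=10):
    """Return up to `limit` (line_number, field_count) pairs for rows whose
    field count differs from the header row.

    Hypothetical diagnostic helper; the default delimiter is an assumption.
    """
    irregular = []
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return irregular  # empty file, nothing to compare
        expected = len(header)
        for row in reader:
            if len(row) != expected:
                # line_num is the number of physical lines read so far
                irregular.append((reader.line_num, len(row)))
                if len(irregular) >= limit:
                    break
    return irregular
```

An empty result would suggest that the rows are structurally consistent and that the odd-looking entries are a display effect, not corruption.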

Do you have any advice, or is there more information I can provide that would be helpful? Thanks for any help!

SGenheden commented 11 months ago

Could it be that you have an old version of aizynthfinder in your environment? How did you set up your environment? This part of aizynthfinder should read a gzipped CSV file, not an HDF5 file.

You can try installing the latest aizynthfinder version in your environment with pip.
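Before upgrading, it may help to see which version is currently installed in the active environment. A small sketch using the standard library's `importlib.metadata` follows; the helper name `installed_version` is hypothetical, and the upgrade command in the comment is the usual pip invocation, not something specific to this project.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("aizynthfinder")
    print("aizynthfinder:", found if found is not None else "not installed")
    # To upgrade inside the same environment, the usual command is:
    #   python -m pip install --upgrade aizynthfinder
```

Running this with the same interpreter that runs the pipeline matters, since a different conda environment may hold a different aizynthfinder version.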

SGenheden commented 9 months ago

Closing due to inactivity