MolecularAI / aizynthtrain

Tools to train synthesis prediction models
Apache License 2.0

FileNotFoundError: [Errno 2] No such file or directory: 'validated_reactions.csv.38' #21

Closed LyuboKotop closed 6 days ago

LyuboKotop commented 3 weeks ago

Hey, I am currently trying to train a model with my own dataset. So I pre-processed my data with the rxnutils and rxnmapper pipelines and I got my data_mapped.csv file that I would use for training.

I then updated the template_pipeline_config.yml file with the path to my dataset and ran the aizynthtrain template pipeline with:

python -m aizynthtrain.pipelines.template_pipeline run --config template_pipeline_config.yml --max-workers 32 --max-num-splits 200

aizynthtrain seems to work correctly, as I get multiple outputs with "Running..." and "Task finished successfully", but I always end up with the error "FileNotFoundError: [Errno 2] No such file or directory: 'validated_reactions.csv.38'".

I also tried running the template pipeline on the USPTO data from the tutorial (even putting it in the same directory as the template_pipeline config file) and it still comes up with the same error.

I looked into it and I believe the issue might be coming from when the aizynthtrain calls rxnutils, specifically in the batch_utils.py file within the combine_csv_batches function. After adding in some print statements, it seems that we get inside the combine_csv_batches function but none of the print statements within the _write_csv function produce any outputs. I believe the _write_csv function does not get called for some reason.

I would greatly appreciate it if you could help out with this one.


SGenheden commented 3 weeks ago

Hello,

My best guess is that the validation step broke down for batch 38 and hence it did not produce the necessary CSV file. You should have some record of the error message from this step in the log, but it is easy to miss and often hard to interpret.
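To see which batch outputs are actually missing before digging through the log, something like the following could be run in the folder where the pipeline was started. This is only a sketch: the function name is made up for illustration, and it assumes the batch files are named `<base name>.<index>` with indices running from 0 to nbatches - 1, which matches the 'validated_reactions.csv.38' name in the error but is not confirmed from the rxnutils source.

```python
from pathlib import Path


def missing_batch_files(base_name: str, nbatches: int, folder: str = ".") -> list[int]:
    """Return the batch indices whose '<base_name>.<idx>' file does not exist.

    Assumption: batch files are numbered 0 .. nbatches - 1.
    """
    root = Path(folder)
    return [
        idx
        for idx in range(nbatches)
        if not (root / f"{base_name}.{idx}").is_file()
    ]


if __name__ == "__main__":
    # 200 is the default number of batches mentioned later in this thread.
    print(missing_batch_files("validated_reactions.csv", 200))
```

If only one or two indices come back, those are the batches worth re-running by hand. If nearly all of them are missing, the problem is more likely the working directory or an earlier step.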

This is a bit cumbersome to debug. The easiest approach is to run the validation step manually, for batch 38 only.

First, figure out the batch start and end indices. The easiest way is likely to add a print statement in aizynthtrain/pipelines/template_pipeline.py at line 57, printing idx, start, end.

Second, run

python -m rxnutils.pipeline.runner --pipeline aizynthtrain/pipelines/data/reaction_validation_pipeline.yaml --data imported_data.py --output temp.csv --max-workers 1 --batch START END --no-intermediates

where imported_data.py is the CSV produced in the step before. "START" and "END" are the batch start and end indices that you got from the print-statement.

This should give you a clear error message.
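If several batches need checking, the command can be assembled in Python rather than retyped each time. The sketch below only reuses the flags shown in the command above. The helper name is hypothetical, and the data file is left as a parameter because its default name depends on the import step (later in this thread it turns out to be imported_reactions.csv).

```python
import shlex
import sys


def validation_command(
    data_file: str,
    start: int,
    end: int,
    pipeline: str = "aizynthtrain/pipelines/data/reaction_validation_pipeline.yaml",
    output: str = "temp.csv",
) -> list[str]:
    """Build the argument list for validating a single batch with rxnutils."""
    return [
        sys.executable, "-m", "rxnutils.pipeline.runner",
        "--pipeline", pipeline,
        "--data", data_file,
        "--output", output,
        "--max-workers", "1",
        "--batch", str(start), str(end),
        "--no-intermediates",
    ]


if __name__ == "__main__":
    cmd = validation_command("imported_reactions.csv", 38, 39)
    # Print a copy-pasteable command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` directly, which avoids shell quoting problems with paths that contain spaces.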

LyuboKotop commented 2 weeks ago

So I added the print statement and ran the pipeline with:

python -m aizynthtrain.pipelines.template_pipeline run --config template_pipeline_config.yml --max-workers 32 --max-num-splits 200

Last print came as Start: 38, End: 39.

So I then ran:

python -m rxnutils.pipeline.runner --pipeline reaction_validation_pipeline.yaml --data imported_data.py --output temp.csv --max-workers 1 --batch 38 39 --no-intermediates

But got error:

FileNotFoundError: [Errno 2] No such file or directory: 'imported_data.py'.

I am a bit unsure where the "imported_data.py" file should come from.

SGenheden commented 2 weeks ago

This should be generated from the previous step. But I gave you the wrong standard name, it should be called imported_reactions.csv and it should be available in your folder.

LyuboKotop commented 2 weeks ago

Still cannot find this file:

FileNotFoundError: [Errno 2] No such file or directory: 'imported_reactions.csv'


SGenheden commented 2 weeks ago

Are you running the pipeline from this folder? When you ran python -m aizynthtrain.pipelines.template_pipeline, did you run it in this folder? That pipeline should have taken your custom data and imported it into a format that is compatible with the rest of the pipeline. This import should have created a CSV file that is by default called imported_reactions.csv. If this file was not created, I wonder how the pipeline could run at all, and how you are importing your custom data. Please provide more details.

LyuboKotop commented 2 weeks ago

I ran the pipeline inside the folder that contains the template_pipeline_config file (aizynthtrain/configs/uspto), which is not this folder.

LyuboKotop commented 2 weeks ago

I tried running the pipeline in the folder you suggested (aizynthtrain/pipelines/data), but I got an error since the template_pipeline_config.yml file was not in this folder. Therefore, I copied the config file into the folder and reran the pipeline.

This time the pipeline ran for longer than usual, but it led to a new error:

2024-08-28 16:39:26.210 [1724863064098243/reaction_selection/205 (pid 217306)] failed: 2024-08-28 16:39:26.217 [1724863064098243/reaction_selection/205 (pid 217306)] Internal error 2024-08-28 16:39:26.218 [1724863064098243/reaction_selection/205 (pid 217306)] Traceback (most recent call last): 2024-08-28 16:39:26.218 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 1057, in main 2024-08-28 16:39:26.219 [1724863064098243/reaction_selection/205 (pid 217306)] start(auto_envvar_prefix="METAFLOW", obj=state) 2024-08-28 16:39:26.219 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/tracing/init.py", line 27, in wrapper_func 2024-08-28 16:39:26.219 [1724863064098243/reaction_selection/205 (pid 217306)] return func(args, kwargs) 2024-08-28 16:39:26.219 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 829, in call 2024-08-28 16:39:26.220 [1724863064098243/reaction_selection/205 (pid 217306)] return self.main(args, kwargs) 2024-08-28 16:39:26.220 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 782, in main 2024-08-28 16:39:26.220 [1724863064098243/reaction_selection/205 (pid 217306)] rv = self.invoke(ctx) 2024-08-28 16:39:26.220 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke 2024-08-28 16:39:26.220 [1724863064098243/reaction_selection/205 (pid 217306)] return _process_result(sub_ctx.command.invoke(sub_ctx)) 2024-08-28 16:39:26.221 [1724863064098243/reaction_selection/205 (pid 217306)] File 
"/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke 2024-08-28 16:39:26.221 [1724863064098243/reaction_selection/205 (pid 217306)] return ctx.invoke(self.callback, ctx.params) 2024-08-28 16:39:26.221 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke 2024-08-28 16:39:26.221 [1724863064098243/reaction_selection/205 (pid 217306)] return callback(args, kwargs) 2024-08-28 16:39:26.876 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/decorators.py", line 21, in new_func 2024-08-28 16:39:26.876 [1724863064098243/reaction_selection/205 (pid 217306)] return f(get_current_context(), args, kwargs) 2024-08-28 16:39:26.877 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 452, in step 2024-08-28 16:39:26.877 [1724863064098243/reaction_selection/205 (pid 217306)] task.run_step( 2024-08-28 16:39:26.877 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 608, in run_step 2024-08-28 16:39:26.877 [1724863064098243/reaction_selection/205 (pid 217306)] self._exec_step_function(step_func) 2024-08-28 16:39:26.877 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 59, in _exec_step_function 2024-08-28 16:39:26.877 [1724863064098243/reaction_selection/205 (pid 217306)] step_function() 2024-08-28 16:39:26.878 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/aizynthtrain/aizynthtrain/pipelines/template_pipeline.py", line 97, in reaction_selection 
2024-08-28 16:39:26.878 [1724863064098243/reaction_selection/205 (pid 217306)] selection_runner( 2024-08-28 16:39:26.878 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/aizynthtrain/aizynthtrain/utils/reporting.py", line 57, in main 2024-08-28 16:39:26.878 [1724863064098243/reaction_selection/205 (pid 217306)] create_html_report_from_notebook( 2024-08-28 16:39:26.878 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/aizynthtrain/aizynthtrain/utils/reporting.py", line 29, in create_html_report_from_notebook 2024-08-28 16:39:26.878 [1724863064098243/reaction_selection/205 (pid 217306)] papermill.execute_notebook( 2024-08-28 16:39:26.879 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/papermill/execute.py", line 131, in execute_notebook 2024-08-28 16:39:26.879 [1724863064098243/reaction_selection/205 (pid 217306)] raise_for_execution_errors(nb, output_path) 2024-08-28 16:39:26.879 [1724863064098243/reaction_selection/205 (pid 217306)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/papermill/execute.py", line 251, in raise_for_execution_errors 2024-08-28 16:39:26.879 [1724863064098243/reaction_selection/205 (pid 217306)] raise error 2024-08-28 16:39:26.879 [1724863064098243/reaction_selection/205 (pid 217306)] papermill.exceptions.PapermillExecutionError: 2024-08-28 16:39:26.880 [1724863064098243/reaction_selection/205 (pid 217306)] --------------------------------------------------------------------------- 2024-08-28 16:39:26.880 [1724863064098243/reaction_selection/205 (pid 217306)] Exception encountered at "In [14]": 2024-08-28 16:39:26.880 [1724863064098243/reaction_selection/205 (pid 217306)] --------------------------------------------------------------------------- 2024-08-28 16:39:26.880 [1724863064098243/reaction_selection/205 (pid 217306)] AttributeError Traceback (most recent call last) 2024-08-28 
16:39:26.880 [1724863064098243/reaction_selection/205 (pid 217306)] Cell In[14], line 1 2024-08-28 16:39:26.881 [1724863064098243/reaction_selection/205 (pid 217306)] ----> 1 info = data["id"].str.extract(r"_P(?P\d)$", expand=False) 2024-08-28 16:39:26.881 [1724863064098243/reaction_selection/205 (pid 217306)] 2 prod_val = info[~info.isna()].astype(int) 2024-08-28 16:39:26.881 [1724863064098243/reaction_selection/205 (pid 217306)] 3 sib_sel = np.zeros(len(data)).astype("bool") 2024-08-28 16:39:26.881 [1724863064098243/reaction_selection/205 (pid 217306)] 2024-08-28 16:39:26.881 [1724863064098243/reaction_selection/205 (pid 217306)] File ~/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/core/generic.py:5902, in NDFrame.getattr(self, name) 2024-08-28 16:39:26.882 [1724863064098243/reaction_selection/205 (pid 217306)] 5895 if ( 2024-08-28 16:39:26.882 [1724863064098243/reaction_selection/205 (pid 217306)] 5896 name not in self._internal_names_set 2024-08-28 16:39:26.882 [1724863064098243/reaction_selection/205 (pid 217306)] 5897 and name not in self._metadata 2024-08-28 16:39:26.882 [1724863064098243/reaction_selection/205 (pid 217306)] 5898 and name not in self._accessors 2024-08-28 16:39:26.882 [1724863064098243/reaction_selection/205 (pid 217306)] 5899 and self._info_axis._can_hold_identifiers_and_holds_name(name) 2024-08-28 16:39:26.883 [1724863064098243/reaction_selection/205 (pid 217306)] 5900 ): 2024-08-28 16:39:26.883 [1724863064098243/reaction_selection/205 (pid 217306)] 5901 return self[name] 2024-08-28 16:39:26.883 [1724863064098243/reaction_selection/205 (pid 217306)] -> 5902 return object.getattribute(self, name) 2024-08-28 16:39:26.883 [1724863064098243/reaction_selection/205 (pid 217306)] 2024-08-28 16:39:26.883 [1724863064098243/reaction_selection/205 (pid 217306)] File ~/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/core/accessor.py:182, in CachedAccessor.get(self, obj, cls) 2024-08-28 16:39:26.884 
[1724863064098243/reaction_selection/205 (pid 217306)] 179 if obj is None: 2024-08-28 16:39:26.884 [1724863064098243/reaction_selection/205 (pid 217306)] 180 # we're accessing the attribute of the class, i.e., Dataset.geo 2024-08-28 16:39:26.884 [1724863064098243/reaction_selection/205 (pid 217306)] 181 return self._accessor 2024-08-28 16:39:26.884 [1724863064098243/reaction_selection/205 (pid 217306)] --> 182 accessor_obj = self._accessor(obj) 2024-08-28 16:39:26.884 [1724863064098243/reaction_selection/205 (pid 217306)] 183 # Replace the property with the accessor object. Inspired by: 2024-08-28 16:39:26.884 [1724863064098243/reaction_selection/205 (pid 217306)] 184 # https://www.pydanny.com/cached-property.html 2024-08-28 16:39:26.885 [1724863064098243/reaction_selection/205 (pid 217306)] 185 # We need to use object.setattr because we overwrite setattr on 2024-08-28 16:39:26.885 [1724863064098243/reaction_selection/205 (pid 217306)] 186 # NDFrame 2024-08-28 16:39:26.885 [1724863064098243/reaction_selection/205 (pid 217306)] 187 object.setattr(obj, self._name, accessor_obj) 2024-08-28 16:39:26.885 [1724863064098243/reaction_selection/205 (pid 217306)] 2024-08-28 16:39:26.885 [1724863064098243/reaction_selection/205 (pid 217306)] File ~/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/core/strings/accessor.py:181, in StringMethods.init(self, data) 2024-08-28 16:39:26.885 [1724863064098243/reaction_selection/205 (pid 217306)] 178 def init(self, data) -> None: 2024-08-28 16:39:26.886 [1724863064098243/reactionselection/205 (pid 217306)] 179 from pandas.core.arrays.string import StringDtype 2024-08-28 16:39:26.886 [1724863064098243/reaction_selection/205 (pid 217306)] --> 181 self._inferred_dtype = self._validate(data) 2024-08-28 16:39:26.886 [1724863064098243/reaction_selection/205 (pid 217306)] 182 self._is_categorical = is_categorical_dtype(data.dtype) 2024-08-28 16:39:26.886 [1724863064098243/reaction_selection/205 (pid 217306)] 183 
self._is_string = isinstance(data.dtype, StringDtype) 2024-08-28 16:39:26.886 [1724863064098243/reaction_selection/205 (pid 217306)] 2024-08-28 16:39:26.886 [1724863064098243/reaction_selection/205 (pid 217306)] File ~/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/core/strings/accessor.py:235, in StringMethods._validate(data) 2024-08-28 16:39:26.887 [1724863064098243/reaction_selection/205 (pid 217306)] 232 inferred_dtype = lib.infer_dtype(values, skipna=True) 2024-08-28 16:39:26.887 [1724863064098243/reaction_selection/205 (pid 217306)] 234 if inferred_dtype not in allowed_types: 2024-08-28 16:39:26.887 [1724863064098243/reaction_selection/205 (pid 217306)] --> 235 raise AttributeError("Can only use .str accessor with string values!") 2024-08-28 16:39:26.887 [1724863064098243/reaction_selection/205 (pid 217306)] 236 return inferred_dtype 2024-08-28 16:39:26.887 [1724863064098243/reaction_selection/205 (pid 217306)] 2024-08-28 16:39:26.888 [1724863064098243/reaction_selection/205 (pid 217306)] AttributeError: Can only use .str accessor with string values! 2024-08-28 16:39:26.888 [1724863064098243/reaction_selection/205 (pid 217306)] 2024-08-28 16:39:26.888 [1724863064098243/reaction_selection/205 (pid 217306)] 2024-08-28 16:39:26.889 [1724863064098243/reaction_selection/205 (pid 217306)] Task failed.

SGenheden commented 2 weeks ago

Ok. This error is easier to understand:

It comes from trying to execute this line

2024-08-28 16:39:26.881 [1724863064098243/reaction_selection/205 (pid 217306)] ----> 1 info = data["id"].str.extract(r"_P(?P<product_no>\d)$", expand=False)

The code assumes that all of your IDs in the ID-column are strings. If they are not, this will lead to an error.
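One way to make sure the IDs are read as strings is to make them non-numeric in the input file, since integer-looking values are typically parsed as integers when the file is loaded. The following is a minimal sketch using only the standard library. The function name, the "rxn-" prefix, the "id" column name and the tab delimiter are all assumptions for illustration, so adjust them to your own file.

```python
import csv


def prefix_numeric_ids(src, dst, id_column="id", prefix="rxn-", delimiter="\t"):
    """Copy src to dst, prefixing integer-looking IDs so they stay strings.

    Returns the number of rows whose ID was changed.
    """
    changed = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter=delimiter)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter=delimiter)
        writer.writeheader()
        for row in reader:
            value = (row[id_column] or "").strip()
            # "123" or "-7" would be inferred as an integer; "rxn-123" cannot be.
            if value.lstrip("+-").isdigit():
                row[id_column] = f"{prefix}{value}"
                changed += 1
            writer.writerow(row)
    return changed
```

Calling `prefix_numeric_ids("data_mapped.csv", "data_mapped_str_ids.csv")` would write a copy with IDs such as rxn-1, rxn-2 and leave IDs that already contain letters untouched.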

I am curious to see that your earlier error appears to have vanished when you re-ran the pipeline.

My intention with the previous comment was that you should not execute the full template extraction-pipeline but rather run python -m rxnutils.pipeline.runner to check an individual step in the pipeline.

LyuboKotop commented 2 weeks ago

I ran the rxnutils.pipeline.runner in the aizynthtrain/pipelines/data folder.

I originally ran the full template extraction-pipeline in the aizynthtrain/configs/uspto folder that led to the original error (validated_reactions.csv), while now when I run the full template pipeline in the aizynthtrain/pipelines/data folder I get the ID error.

LyuboKotop commented 2 weeks ago

Ok, so I changed my values in the "ID" column to be strings and this got rid of the str error and the pipeline was able to run a bit further (I run it inside the aizynthtrain/pipelines/data folder).

A new error has occurred now:

2024-08-29 10:25:55.398 [1724927051903642/template_extraction_join/242 (pid 311757)] failed: 2024-08-29 10:25:55.404 [1724927051903642/template_extraction_join/242 (pid 311757)] Internal error 2024-08-29 10:25:55.405 [1724927051903642/template_extraction_join/242 (pid 311757)] Traceback (most recent call last): 2024-08-29 10:25:55.405 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 1057, in main 2024-08-29 10:25:55.405 [1724927051903642/template_extraction_join/242 (pid 311757)] start(auto_envvar_prefix="METAFLOW", obj=state) 2024-08-29 10:25:55.406 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/tracing/init.py", line 27, in wrapper_func 2024-08-29 10:25:55.406 [1724927051903642/template_extraction_join/242 (pid 311757)] return func(args, kwargs) 2024-08-29 10:25:55.407 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 829, in call 2024-08-29 10:25:55.407 [1724927051903642/template_extraction_join/242 (pid 311757)] return self.main(args, kwargs) 2024-08-29 10:25:55.801 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 782, in main 2024-08-29 10:25:55.801 [1724927051903642/template_extraction_join/242 (pid 311757)] rv = self.invoke(ctx) 2024-08-29 10:25:55.801 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke 2024-08-29 10:25:55.802 [1724927051903642/template_extraction_join/242 (pid 311757)] return _process_result(sub_ctx.command.invoke(sub_ctx)) 2024-08-29 10:25:55.802 
[1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke 2024-08-29 10:25:55.802 [1724927051903642/template_extraction_join/242 (pid 311757)] return ctx.invoke(self.callback, ctx.params) 2024-08-29 10:25:55.802 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke 2024-08-29 10:25:55.803 [1724927051903642/template_extraction_join/242 (pid 311757)] return callback(args, kwargs) 2024-08-29 10:25:55.803 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/decorators.py", line 21, in new_func 2024-08-29 10:25:55.803 [1724927051903642/template_extraction_join/242 (pid 311757)] return f(get_current_context(), args, kwargs) 2024-08-29 10:25:55.803 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 452, in step 2024-08-29 10:25:55.804 [1724927051903642/template_extraction_join/242 (pid 311757)] task.run_step( 2024-08-29 10:25:55.804 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 606, in run_step 2024-08-29 10:25:55.804 [1724927051903642/template_extraction_join/242 (pid 311757)] self._exec_step_function(step_func, input_obj) 2024-08-29 10:25:55.804 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 61, in _exec_step_function 2024-08-29 10:25:55.805 [1724927051903642/template_extraction_join/242 (pid 311757)] step_function(input_obj) 2024-08-29 10:25:55.805 
[1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/aizynthtrain/aizynthtrain/pipelines/template_pipeline.py", line 154, in template_extraction_join 2024-08-29 10:25:55.805 [1724927051903642/template_extraction_join/242 (pid 311757)] self._combine_batches(self.config.unvalidated_templates_path) 2024-08-29 10:25:55.805 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/aizynthtrain/aizynthtrain/pipelines/template_pipeline.py", line 241, in _combine_batches 2024-08-29 10:25:55.806 [1724927051903642/template_extraction_join/242 (pid 311757)] combine_csv_batches(filename, self.config.nbatches) 2024-08-29 10:25:55.806 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/rxnutils/data/batch_utils.py", line 66, in combine_csv_batches 2024-08-29 10:25:55.806 [1724927051903642/template_extraction_join/242 (pid 311757)] combine_batches(filename, nbatches, _read_csv, _write_csv, _combine_csv) 2024-08-29 10:25:55.806 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/rxnutils/data/batch_utils.py", line 29, in combine_batches 2024-08-29 10:25:55.807 [1724927051903642/template_extraction_join/242 (pid 311757)] temp_data, filename2 = read_func(filename, idx) 2024-08-29 10:25:55.807 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/rxnutils/data/batch_utils.py", line 54, in _read_csv 2024-08-29 10:25:55.807 [1724927051903642/template_extraction_join/242 (pid 311757)] return pd.read_csv(filename2, sep="\t"), filename2 2024-08-29 10:25:55.807 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper 2024-08-29 10:25:55.808 
[1724927051903642/template_extraction_join/242 (pid 311757)] return func(args, kwargs) 2024-08-29 10:25:55.808 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper 2024-08-29 10:25:55.808 [1724927051903642/template_extraction_join/242 (pid 311757)] return func(args, kwargs) 2024-08-29 10:25:55.808 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv 2024-08-29 10:25:55.809 [1724927051903642/template_extraction_join/242 (pid 311757)] return _read(filepath_or_buffer, kwds) 2024-08-29 10:25:55.809 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read 2024-08-29 10:25:55.809 [1724927051903642/template_extraction_join/242 (pid 311757)] parser = TextFileReader(filepath_or_buffer, kwds) 2024-08-29 10:25:55.809 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in init 2024-08-29 10:25:55.810 [1724927051903642/template_extraction_join/242 (pid 311757)] self._engine = self._make_engine(f, self.engine) 2024-08-29 10:25:55.810 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine 2024-08-29 10:25:55.810 [1724927051903642/template_extraction_join/242 (pid 311757)] self.handles = get_handle( 2024-08-29 10:25:55.810 [1724927051903642/template_extraction_join/242 (pid 311757)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/io/common.py", line 856, in get_handle 2024-08-29 10:25:55.811 
[1724927051903642/template_extraction_join/242 (pid 311757)] handle = open( 2024-08-29 10:25:55.811 [1724927051903642/template_extraction_join/242 (pid 311757)] FileNotFoundError: [Errno 2] No such file or directory: 'reaction_templates_unvalidated.csv.35'

It seems that a new csv is missing now.

SGenheden commented 2 weeks ago

So what happens if you run the suggested command

python -m rxnutils.pipeline.runner --pipeline aizynthtrain/pipelines/data/reaction_validation_pipeline.yaml --data imported_reactions.csv --output temp.csv --max-workers 1 --batch START END --no-intermediates

in the same folder?

LyuboKotop commented 2 weeks ago

(aizynthtrain) kotop@DESKTOP-S2DEI0D:~/aizynthtrain/aizynthtrain/pipelines/data$ python -m rxnutils.pipeline.runner --pipeline reaction_validation_pipeline.yaml --data imported_reactions.csv --output temp.csv --max-workers 1 --batch 38 39 --no-intermediates
Running isotope_info (extract and remove isotope information from reactions)
Running remove_unsanitizable (removing molecules that is not sanitizable by RDKit)
Running reagents2reactants (putting all reagents to reactants)
Running reactants2reagents (putting all non-reacting reactants as reagents)
Running remove_extra_atom_mapping (removing atom maps in reactants and reagents not in products)
Running neutralize_molecules (neutralize molecules using RDKit neutralizer)
Running remove_unsanitizable (removing molecules that is not sanitizable by RDKit)
Running remove_unchanged_products (Remove unchanged products)
Running count_components (counting reactants, reagents, products and mapped versions of these)
Running pseudo_reaction_hash (calculate hash based on InChI key of components)
Running count_elements (calculate the occurence of elements in the reactants)
Running productsize (number of heavy atoms in product)
Running product_atommapping_stats (count number of number of unmapped and widow product atoms)
Running hasunmappedradicalatom (detect if there is an unmapped radical in the reaction SMILES)
Running unsanitizablereactants (detect if there is unsanitizable reactants)
Running maxrings (maximum number of rings)
Running ringnumberchange (ring change based on number of rings)
Running ringbondmade (ring change based on ring bond made)
Running ringmadesize (largest ring made)
Running cgr_created (flag if a CGR can be created for the reaction)
Running cgr_dynamic_bonds (number of dynamic bonds in the CGR)

Seems to be working fine. This is for START END 38 39.

SGenheden commented 2 weeks ago

It is a bit worrying that it failed on another batch this time. I also see that you have very small batches. How many data points do you have?

I remember that we had some issues when the number of data points was low compared to the number of batches requested, and I thought we had fixed this. But it is perhaps worth trying with the number of batches set to something like 50. You can do that in the YAML file you are providing to the pipeline:

file_prefix: SOMETHING
nbatches: 50

LyuboKotop commented 2 weeks ago

So my dataset consists of about 1000 reactions. Maybe the dataset is too small?

SGenheden commented 2 weeks ago

It should work. But let's try with 20 or 50 batches. The default of 200 comes from my dataset size which is on the order of millions.
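To get a feel for how thin the batches become with a small dataset, a rough calculation like the one below can help when picking nbatches. It is only a heuristic sketch: the even-split rule and the minimum of 10 rows per batch are assumptions made here, not the way rxnutils actually partitions the file, and later pipeline steps may work on fewer rows than the input because reactions get filtered out along the way.

```python
def count_data_rows(path: str) -> int:
    """Count the lines in a text file, excluding one header line."""
    with open(path, newline="") as handle:
        return max(sum(1 for _ in handle) - 1, 0)


def batch_sizes(nrows: int, nbatches: int) -> list[int]:
    """Even split: the first `nrows % nbatches` batches get one extra row."""
    size, extra = divmod(nrows, nbatches)
    return [size + (1 if idx < extra else 0) for idx in range(nbatches)]


def largest_safe_nbatches(nrows: int, min_rows_per_batch: int = 10, upper: int = 200) -> int:
    """Largest batch count (capped at `upper`) that keeps every batch at or above the minimum."""
    return max(1, min(upper, nrows // min_rows_per_batch))


if __name__ == "__main__":
    nrows = 1000  # roughly the dataset size discussed in this thread
    print(min(batch_sizes(nrows, 200)))                        # rows in the smallest batch
    print(largest_safe_nbatches(nrows, min_rows_per_batch=50))
```

With about 1000 reactions, 200 batches leaves only 5 rows per batch, so a single filtering step could plausibly leave some batches with very little data. Whether that is what actually breaks the pipeline is not established here.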

LyuboKotop commented 2 weeks ago

Ok, 50 batches did not work. But it seems that 20 batches did the trick:

2024-08-29 13:10:52.661 [1724937000866286/template_validation_join/69 (pid 323145)] RxnSmilesClean ... TemplateGivesOtherReactants
2024-08-29 13:10:52.662 [1724937000866286/template_validation_join/69 (pid 323145)] 0 [O:1]=[C:2]([N:3]1[CH2:4][CH2:5][C:6]2([CH2:7]... ... False
2024-08-29 13:10:52.662 [1724937000866286/template_validation_join/69 (pid 323145)] 1 [CH3:1][CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][c:1... ... False
2024-08-29 13:10:52.663 [1724937000866286/template_validation_join/69 (pid 323145)] 0 [CH3:1]N:2[CH2:4][CH2:5][CH2:6][C:7... ... False
2024-08-29 13:10:52.663 [1724937000866286/template_validation_join/69 (pid 323145)] 1 [CH3:1]C:2[O:4][c:5]1[cH:6][cH:7][cH... ... False
2024-08-29 13:10:52.664 [1724937000866286/template_validation_join/69 (pid 323145)] 0 [O:1]=[C:2]([NH:3][CH2:4][CH2:5][CH2:6][c:17]1... ... False
2024-08-29 13:10:52.664 [1724937000866286/template_validation_join/69 (pid 323145)]
2024-08-29 13:10:52.664 [1724937000866286/template_validation_join/69 (pid 323145)] [5 rows x 13 columns]
2024-08-29 13:10:52.665 [1724937000866286/template_validation_join/69 (pid 323145)] LYUBOMIR: Successfully wrote batch file: reaction_templates_validated.csv
2024-08-29 13:10:53.085 [1724937000866286/template_validation_join/69 (pid 323145)] LYUBOMIR: Going out of combine_csv
2024-08-29 14:10:53.086 [1724937000866286/template_validation_join/69 (pid 323145)] Task finished successfully.
2024-08-29 14:10:53.091 [1724937000866286/template_selection/70 (pid 323211)] Task is starting.
Executing: 100%|██████████| 16/16 [00:03<00:00, 4.46cell/s]/70 (pid 323211)] Executing: 0%| | 0/16 [00:00<?, ?cell/s]
2024-08-29 14:11:00.249 [1724937000866286/template_selection/70 (pid 323211)] Task finished successfully.
2024-08-29 14:11:00.254 [1724937000866286/end/71 (pid 323333)] Task is starting.
2024-08-29 13:11:02.759 [1724937000866286/end/71 (pid 323333)] Report on extracted reaction is located here: reaction_selection_report.html
2024-08-29 13:11:03.146 [1724937000866286/end/71 (pid 323333)] Report on extracted templates is located here: template_selection_report.html
2024-08-29 14:11:03.148 [1724937000866286/end/71 (pid 323333)] Task finished successfully.
2024-08-29 14:11:03.148 Done!

SGenheden commented 2 weeks ago

Ok. Annoying error, hard to debug. How many reactions were left from the selection? And how many templates were produced in the end? Are these acceptable numbers?

LyuboKotop commented 2 weeks ago

Total number of extracted reactions = 950
Total number of extracted templates = 35 (1.00%)

SGenheden commented 2 weeks ago

This sounds alright to me. The percentage of extracted unique templates is about what we have for USPTO or our internal data.

LyuboKotop commented 2 weeks ago

Ok, so it was just the nbatches setting (I ran it in the original folder and it worked, so it doesn't matter which folder is used as long as it contains the config.yml).

The most annoying part is that I remember trying to reduce nbatches myself in the config file, but maybe I didn't go as low as 20 :'(

LyuboKotop commented 2 weeks ago

Would I need to set nbatches to 20 for the expansion pipeline too?

LyuboKotop commented 2 weeks ago

I tried running the expansion model pipeline inside the folder where the template pipeline files were generated and got the following error:

2024-08-29 18:52:00.820 [1724957504598758/create_template_metadata/2 (pid 331309)] failed: 2024-08-29 18:52:00.827 [1724957504598758/create_template_metadata/2 (pid 331309)] Internal error 2024-08-29 18:52:00.828 [1724957504598758/create_template_metadata/2 (pid 331309)] Traceback (most recent call last): 2024-08-29 18:52:00.828 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 1057, in main 2024-08-29 18:52:00.828 [1724957504598758/create_template_metadata/2 (pid 331309)] start(auto_envvar_prefix="METAFLOW", obj=state) 2024-08-29 18:52:00.828 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/tracing/init.py", line 27, in wrapper_func 2024-08-29 18:52:00.829 [1724957504598758/create_template_metadata/2 (pid 331309)] return func(args, kwargs) 2024-08-29 18:52:02.342 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 829, in call 2024-08-29 18:52:02.342 [1724957504598758/create_template_metadata/2 (pid 331309)] return self.main(args, kwargs) 2024-08-29 18:52:02.343 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 782, in main 2024-08-29 18:52:02.343 [1724957504598758/create_template_metadata/2 (pid 331309)] rv = self.invoke(ctx) 2024-08-29 18:52:02.343 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke 2024-08-29 18:52:02.343 [1724957504598758/create_template_metadata/2 (pid 331309)] return _process_result(sub_ctx.command.invoke(sub_ctx)) 2024-08-29 18:52:02.343 
[1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke 2024-08-29 18:52:02.344 [1724957504598758/create_template_metadata/2 (pid 331309)] return ctx.invoke(self.callback, ctx.params) 2024-08-29 18:52:02.344 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke 2024-08-29 18:52:02.344 [1724957504598758/create_template_metadata/2 (pid 331309)] return callback(args, kwargs) 2024-08-29 18:52:02.344 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/decorators.py", line 21, in new_func 2024-08-29 18:52:02.344 [1724957504598758/create_template_metadata/2 (pid 331309)] return f(get_current_context(), args, kwargs) 2024-08-29 18:52:02.345 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 452, in step 2024-08-29 18:52:02.345 [1724957504598758/create_template_metadata/2 (pid 331309)] task.run_step( 2024-08-29 18:52:02.345 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 608, in run_step 2024-08-29 18:52:02.345 [1724957504598758/create_template_metadata/2 (pid 331309)] self._exec_step_function(step_func) 2024-08-29 18:52:02.345 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 59, in _exec_step_function 2024-08-29 18:52:02.346 [1724957504598758/create_template_metadata/2 (pid 331309)] step_function() 2024-08-29 18:52:02.346 [1724957504598758/create_template_metadata/2 (pid 331309)] File 
"/home/kotop/aizynthtrain/aizynthtrain/pipelines/expansion_model_pipeline.py", line 37, in create_template_metadata 2024-08-29 18:52:02.346 [1724957504598758/create_template_metadata/2 (pid 331309)] create_template_lib([self.config_path]) 2024-08-29 18:52:02.346 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/aizynthtrain/aizynthtrain/modelling/expansion_policy/create_template_lib.py", line 203, in main 2024-08-29 18:52:02.346 [1724957504598758/create_template_metadata/2 (pid 331309)] _save_split_indices(dataset, config) 2024-08-29 18:52:02.347 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/aizynthtrain/aizynthtrain/modelling/expansion_policy/create_template_lib.py", line 140, in _save_split_indices 2024-08-29 18:52:02.347 [1724957504598758/create_template_metadata/2 (pid 331309)] reaction_hashes = extract_route_reactions(config.routes_to_exclude) 2024-08-29 18:52:02.347 [1724957504598758/create_template_metadata/2 (pid 331309)] File "/home/kotop/aizynthtrain/aizynthtrain/utils/data_utils.py", line 29, in extract_route_reactions 2024-08-29 18:52:02.347 [1724957504598758/create_template_metadata/2 (pid 331309)] with open(filename, "r") as fileobj: 2024-08-29 18:52:02.347 [1724957504598758/create_template_metadata/2 (pid 331309)] FileNotFoundError: [Errno 2] No such file or directory: 'ref_routes_n1.json' 2024-08-29 18:52:02.348 [1724957504598758/create_template_metadata/2 (pid 331309)] 2024-08-29 18:52:02.348 [1724957504598758/create_template_metadata/2 (pid 331309)] Task failed. 2024-08-29 19:52:02.349 Workflow failed. 2024-08-29 19:52:02.349 Terminating 0 active tasks... 2024-08-29 19:52:02.349 Flushing logs... Step failure: Step create_template_metadata (task-id 2) failed.

LyuboKotop commented 2 weeks ago

I removed the routes_to_exclude: statement from the expansion model config yml file, which led to another batch error, so I added the nbatches: 20 statement here too. After doing this, there was another error (FYI: despite the error, I was able to generate the uspto_keras_model.hdf5 trained Keras model and the uspto_unique_templates.csv.gz template library for AiZynthFinder):

2024-08-29 19:08:13.773 [1724958364809134/model_validation/28 (pid 344216)] Traceback (most recent call last): 2024-08-29 19:08:15.269 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/bin/aizynthcli", line 8, in 2024-08-29 19:08:15.270 [1724958364809134/model_validation/28 (pid 344216)] sys.exit(main()) 2024-08-29 19:08:15.270 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/aizynthfinder/interfaces/aizynthcli.py", line 375, in main 2024-08-29 19:08:15.270 [1724958364809134/model_validation/28 (pid 344216)] finder = AiZynthFinder(configfile=args.config) 2024-08-29 19:08:15.270 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/aizynthfinder/aizynthfinder.py", line 71, in init 2024-08-29 19:08:15.271 [1724958364809134/model_validation/28 (pid 344216)] self.config = Configuration.from_file(configfile) 2024-08-29 19:08:15.271 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/aizynthfinder/context/config.py", line 141, in from_file 2024-08-29 19:08:15.271 [1724958364809134/model_validation/28 (pid 344216)] return Configuration.from_dict(_config) 2024-08-29 19:08:15.271 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/aizynthfinder/context/config.py", line 112, in from_dict 2024-08-29 19:08:15.272 [1724958364809134/model_validation/28 (pid 344216)] config_obj.stock.load_from_config(stock_config) 2024-08-29 19:08:15.272 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/aizynthfinder/context/stock/stock.py", line 201, in load_from_config 2024-08-29 19:08:15.272 [1724958364809134/model_validation/28 (pid 344216)] obj = cls(kwargs) 
2024-08-29 19:08:15.272 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/aizynthfinder/context/stock/queries.py", line 121, in init 2024-08-29 19:08:15.273 [1724958364809134/model_validation/28 (pid 344216)] stock_df: pd.DataFrame = pd.read_hdf(path, key="table") 2024-08-29 19:08:15.273 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/io/pytables.py", line 414, in read_hdf 2024-08-29 19:08:15.273 [1724958364809134/model_validation/28 (pid 344216)] raise FileNotFoundError(f"File {path_or_buf} does not exist") 2024-08-29 19:08:15.273 [1724958364809134/model_validation/28 (pid 344216)] FileNotFoundError: File stock_for_eval_find.hdf5 does not exist 2024-08-29 19:08:15.274 [1724958364809134/model_validation/28 (pid 344216)] failed: 2024-08-29 19:08:15.276 [1724958364809134/model_validation/28 (pid 344216)] Internal error 2024-08-29 19:08:15.277 [1724958364809134/model_validation/28 (pid 344216)] Traceback (most recent call last): 2024-08-29 19:08:15.277 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 1057, in main 2024-08-29 19:08:15.277 [1724958364809134/model_validation/28 (pid 344216)] start(auto_envvar_prefix="METAFLOW", obj=state) 2024-08-29 19:08:15.278 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/tracing/init.py", line 27, in wrapper_func 2024-08-29 19:08:15.278 [1724958364809134/model_validation/28 (pid 344216)] return func(args, kwargs) 2024-08-29 19:08:16.788 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 829, in call 2024-08-29 19:08:16.789 [1724958364809134/model_validation/28 (pid 
344216)] return self.main(args, kwargs) 2024-08-29 19:08:16.790 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 782, in main 2024-08-29 19:08:16.790 [1724958364809134/model_validation/28 (pid 344216)] rv = self.invoke(ctx) 2024-08-29 19:08:16.791 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke 2024-08-29 19:08:16.791 [1724958364809134/model_validation/28 (pid 344216)] return _process_result(sub_ctx.command.invoke(sub_ctx)) 2024-08-29 19:08:16.791 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke 2024-08-29 19:08:16.792 [1724958364809134/model_validation/28 (pid 344216)] return ctx.invoke(self.callback, ctx.params) 2024-08-29 19:08:16.792 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke 2024-08-29 19:08:16.792 [1724958364809134/model_validation/28 (pid 344216)] return callback(args, kwargs) 2024-08-29 19:08:16.793 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/decorators.py", line 21, in new_func 2024-08-29 19:08:16.793 [1724958364809134/model_validation/28 (pid 344216)] return f(get_current_context(), args, kwargs) 2024-08-29 19:08:16.794 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 452, in step 2024-08-29 19:08:16.794 [1724958364809134/model_validation/28 (pid 344216)] task.run_step( 2024-08-29 19:08:16.794 [1724958364809134/model_validation/28 (pid 
344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 608, in run_step 2024-08-29 19:08:16.795 [1724958364809134/model_validation/28 (pid 344216)] self._exec_step_function(step_func) 2024-08-29 19:08:16.795 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 59, in _exec_step_function 2024-08-29 19:08:16.795 [1724958364809134/model_validation/28 (pid 344216)] step_function() 2024-08-29 19:08:16.796 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/aizynthtrain/aizynthtrain/pipelines/expansion_model_pipeline.py", line 101, in model_validation 2024-08-29 19:08:16.796 [1724958364809134/model_validation/28 (pid 344216)] eval_multi_step([self.config_path]) 2024-08-29 19:08:16.796 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/aizynthtrain/aizynthtrain/modelling/expansion_policy/eval_multi_step.py", line 127, in main 2024-08-29 19:08:16.797 [1724958364809134/model_validation/28 (pid 344216)] stats = _eval_finding(config, finder_config_path) 2024-08-29 19:08:16.797 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/aizynthtrain/aizynthtrain/modelling/expansion_policy/eval_multi_step.py", line 40, in _eval_finding 2024-08-29 19:08:16.797 [1724958364809134/model_validation/28 (pid 344216)] finder_output = pd.read_hdf(output_path, "table") 2024-08-29 19:08:16.798 [1724958364809134/model_validation/28 (pid 344216)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/pandas/io/pytables.py", line 414, in read_hdf 2024-08-29 19:08:16.798 [1724958364809134/model_validation/28 (pid 344216)] raise FileNotFoundError(f"File {path_or_buf} does not exist") 2024-08-29 19:08:16.799 [1724958364809134/model_validation/28 (pid 344216)] FileNotFoundError: File uspto_model_validation_finder_output_finding.hdf5 does not exist 2024-08-29 19:08:16.799 
[1724958364809134/model_validation/28 (pid 344216)] 2024-08-29 19:08:16.800 [1724958364809134/model_validation/28 (pid 344216)] Task failed. 2024-08-29 20:08:16.801 Workflow failed. 2024-08-29 20:08:16.801 Terminating 0 active tasks... 2024-08-29 20:08:16.801 Flushing logs... Step failure: Step model_validation (task-id 28) failed.

Maybe something to do with the model evaluation statements inside the config file (possibly stock_for_finding: stock_for_eval_find.hdf5)? I am not entirely sure what purpose those serve, maybe I do not need them for my particular dataset and should omit them?
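For anyone hitting the same FileNotFoundError, a pre-flight check along these lines can show which evaluation files are missing before the pipeline is started. This is only a sketch: the four key names come from the config in this thread, while the helper name missing_eval_files and the plain dict standing in for the parsed expansion_model_evaluation section are assumptions for illustration, not part of aizynthtrain.

```python
from pathlib import Path

# Key names as they appear in the expansion_model_evaluation section of this thread
EVAL_KEYS = ("stock_for_finding", "target_smiles", "stock_for_recovery", "reference_routes")


def missing_eval_files(eval_config: dict, folder: str = ".") -> dict:
    """Return {key: filename} for every configured evaluation file that is not on disk.

    Hypothetical helper: keys that are absent, None or empty are skipped,
    because no file is being requested for them.
    """
    missing = {}
    for key in EVAL_KEYS:
        value = eval_config.get(key)
        if value and not (Path(folder) / value).is_file():
            missing[key] = value
    return missing


if __name__ == "__main__":
    # File names taken from the config discussed in this thread
    section = {
        "file_prefix": "uspto",
        "stock_for_finding": "stock_for_eval_find.hdf5",
        "target_smiles": "smiles_for_eval.txt",
        "stock_for_recovery": "stock_for_eval_recov.txt",
        "reference_routes": "routes_for_eval.json",
    }
    for key, name in missing_eval_files(section).items():
        print(f"{key}: {name} not found")
```

Running it in the folder that holds the config file lists every key whose file is absent, which is quicker than waiting for the model_validation step to fail.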

SGenheden commented 2 weeks ago

Yes, if you do

expansion_model_evaluation:
  file_prefix: uspto
  stock_for_finding:
  target_smiles:
  stock_for_recovery:
  reference_routes:

it will not do the multistep evaluation.

Otherwise you can download the files from here: https://zenodo.org/records/7341155

LyuboKotop commented 1 week ago

2024-09-03 09:39:10.590 [1725356268009387/model_validation/9 (pid 349115)] failed: 2024-09-03 09:39:10.599 [1725356268009387/model_validation/9 (pid 349115)] Internal error 2024-09-03 09:39:10.600 [1725356268009387/model_validation/9 (pid 349115)] Traceback (most recent call last): 2024-09-03 09:39:10.600 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 1057, in main 2024-09-03 09:39:10.601 [1725356268009387/model_validation/9 (pid 349115)] start(auto_envvar_prefix="METAFLOW", obj=state) 2024-09-03 09:39:10.601 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/tracing/init.py", line 27, in wrapper_func 2024-09-03 09:39:10.601 [1725356268009387/model_validation/9 (pid 349115)] return func(args, kwargs) 2024-09-03 09:39:12.129 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 829, in call 2024-09-03 09:39:12.130 [1725356268009387/model_validation/9 (pid 349115)] return self.main(args, kwargs) 2024-09-03 09:39:12.130 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 782, in main 2024-09-03 09:39:12.130 [1725356268009387/model_validation/9 (pid 349115)] rv = self.invoke(ctx) 2024-09-03 09:39:12.130 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke 2024-09-03 09:39:12.131 [1725356268009387/model_validation/9 (pid 349115)] return _process_result(sub_ctx.command.invoke(sub_ctx)) 2024-09-03 09:39:12.131 [1725356268009387/model_validation/9 (pid 349115)] File 
"/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke 2024-09-03 09:39:12.131 [1725356268009387/model_validation/9 (pid 349115)] return ctx.invoke(self.callback, ctx.params) 2024-09-03 09:39:12.131 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke 2024-09-03 09:39:12.131 [1725356268009387/model_validation/9 (pid 349115)] return callback(args, kwargs) 2024-09-03 09:39:12.132 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/decorators.py", line 21, in new_func 2024-09-03 09:39:12.132 [1725356268009387/model_validation/9 (pid 349115)] return f(get_current_context(), args, kwargs) 2024-09-03 09:39:12.132 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 452, in step 2024-09-03 09:39:12.132 [1725356268009387/model_validation/9 (pid 349115)] task.run_step( 2024-09-03 09:39:12.133 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 608, in run_step 2024-09-03 09:39:12.133 [1725356268009387/model_validation/9 (pid 349115)] self._exec_step_function(step_func) 2024-09-03 09:39:12.133 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/miniconda3/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 59, in _exec_step_function 2024-09-03 09:39:12.133 [1725356268009387/model_validation/9 (pid 349115)] step_function() 2024-09-03 09:39:12.133 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/aizynthtrain/aizynthtrain/pipelines/expansion_model_pipeline.py", line 100, in model_validation 2024-09-03 09:39:12.134 
[1725356268009387/model_validation/9 (pid 349115)] eval_one_step([self.config_path]) 2024-09-03 09:39:12.134 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/aizynthtrain/aizynthtrain/modelling/expansion_policy/eval_one_step.py", line 184, in main 2024-09-03 09:39:12.134 [1725356268009387/model_validation/9 (pid 349115)] config: ExpansionModelEvaluationConfig = load_config( 2024-09-03 09:39:12.134 [1725356268009387/model_validation/9 (pid 349115)] File "/home/kotop/aizynthtrain/aizynthtrain/utils/configs.py", line 29, in load_config 2024-09-03 09:39:12.134 [1725356268009387/modelvalidation/9 (pid 349115)] config = class(dict_.get(class_key, {})) 2024-09-03 09:39:12.135 [1725356268009387/model_validation/9 (pid 349115)] File "pydantic/main.py", line 341, in pydantic.main.BaseModel.init 2024-09-03 09:39:12.135 [1725356268009387/model_validation/9 (pid 349115)] pydantic.error_wrappers.ValidationError: 4 validation errors for ExpansionModelEvaluationConfig 2024-09-03 09:39:12.135 [1725356268009387/model_validation/9 (pid 349115)] stock_for_finding 2024-09-03 09:39:12.135 [1725356268009387/model_validation/9 (pid 349115)] none is not an allowed value (type=type_error.none.not_allowed) 2024-09-03 09:39:12.136 [1725356268009387/model_validation/9 (pid 349115)] stock_for_recovery 2024-09-03 09:39:12.136 [1725356268009387/model_validation/9 (pid 349115)] none is not an allowed value (type=type_error.none.not_allowed) 2024-09-03 09:39:12.136 [1725356268009387/model_validation/9 (pid 349115)] reference_routes 2024-09-03 09:39:12.136 [1725356268009387/model_validation/9 (pid 349115)] none is not an allowed value (type=type_error.none.not_allowed) 2024-09-03 09:39:12.136 [1725356268009387/model_validation/9 (pid 349115)] target_smiles 2024-09-03 09:39:12.137 [1725356268009387/model_validation/9 (pid 349115)] none is not an allowed value (type=type_error.none.not_allowed) 2024-09-03 09:39:12.137 [1725356268009387/model_validation/9 (pid 349115)] 2024-09-03 
09:39:12.137 [1725356268009387/model_validation/9 (pid 349115)] Task failed.

It seems that they can't be left blank. They can be completely omitted though.
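The traceback is consistent with how YAML treats a key that has nothing after the colon: it is loaded as null, which becomes None in Python, and the pydantic model rejects None for these fields. The sketch below only illustrates the three cases that come up in this thread (omitted, blank, empty string). It uses a plain dict in place of the parsed YAML section, and describe_eval_settings is a hypothetical helper, not an aizynthtrain function.

```python
# Key names as they appear in the expansion_model_evaluation section of this thread
EVAL_KEYS = ("stock_for_finding", "target_smiles", "stock_for_recovery", "reference_routes")


def describe_eval_settings(section: dict) -> dict:
    """Classify each evaluation key as omitted, blank (None), empty string, or set."""
    states = {}
    for key in EVAL_KEYS:
        if key not in section:
            states[key] = "omitted"
        elif section[key] is None:
            # What a YAML line such as 'stock_for_finding:' turns into
            states[key] = "blank (None)"
        elif section[key] == "":
            # What a YAML line such as 'stock_for_finding: ""' turns into
            states[key] = "empty string"
        else:
            states[key] = "set"
    return states


if __name__ == "__main__":
    section = {
        "file_prefix": "uspto",
        "stock_for_finding": None,
        "target_smiles": "",
        "stock_for_recovery": "stock_for_eval_recov.txt",
    }
    for key, state in describe_eval_settings(section).items():
        print(f"{key}: {state}")
```

Only the "blank (None)" case matches the four validation errors shown above.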

LyuboKotop commented 1 week ago

I can't really find the evaluation files in the link you provided. Are they named differently? This is what I am after:

  stock_for_finding: stock_for_eval_find.hdf5
  target_smiles: smiles_for_eval.txt
  stock_for_recovery: stock_for_eval_recov.txt
  reference_routes: routes_for_eval.json

I only managed to find the ref_routes.

SGenheden commented 1 week ago

Ok. My bad, try leaving them as empty strings:

expansion_model_evaluation:
  file_prefix: uspto
  stock_for_finding: ""
  target_smiles: ""
  stock_for_recovery: ""
  reference_routes: ""

Alternatively, I have attached three of the files here: routes_for_eval.json, stock_for_eval_recov.txt and smiles_for_eval.txt.

routes_for_eval.json and stock_for_eval_recov.txt are subsets of PaRoutes.

smiles_for_eval.txt is a subset of ChEMBL

The stock for finding is the ZINC stock that can be downloaded from here: https://figshare.com/articles/dataset/AiZynthFinder_a_fast_robust_and_flexible_open-source_software_for_retrosynthetic_planning/12334577?file=23086469

LyuboKotop commented 1 week ago

Ok thanks.

What would be the downside of leaving them as empty strings? I guess we wouldn't be able to evaluate the performance properly? Although, I believe that even if I use the files you provided, the evaluation will still not be relevant, since my model has been trained on a specific type of reactions which would probably not be present in your files anyway. I guess the only solution would be to prepare custom evaluation files myself with relevant reactions.

Also, as a conclusion for this thread, do you know why higher nbatches values lead to improper execution of the pipelines in terms of missing validated_reactions csv files?
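A quick way to see which batch outputs are missing after a failed run is a check like the one below. It is a sketch under stated assumptions: the file naming (base filename plus .<index>) is inferred from the validated_reactions.csv.38 in the error message, the zero-based indexing is a guess, and missing_batches is a hypothetical helper rather than anything provided by aizynthtrain or rxnutils.

```python
from pathlib import Path


def missing_batches(base_name: str, nbatches: int, folder: str = ".") -> list[int]:
    """Return the batch indices whose '<base_name>.<index>' file does not exist.

    Assumes batch files are numbered from 0 to nbatches - 1.
    """
    root = Path(folder)
    return [
        idx
        for idx in range(nbatches)
        if not (root / f"{base_name}.{idx}").is_file()
    ]


if __name__ == "__main__":
    # nbatches should match the value in the pipeline config; 200 is just an example
    print(missing_batches("validated_reactions.csv", nbatches=200))
```

If the list is not empty, the log of the corresponding validation task is the first place to look.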

SGenheden commented 1 week ago

Regarding evaluation: this was introduced as a quick way to assess your models, to give you an indication of performance when you do frequent retraining. If you are just doing a one-off training, I would suggest evaluating on a larger dataset "manually".

Regarding the batch number issue: I don't have a good answer now. We have to look into this.