julianharty closed this issue 1 month ago.
The df is created in data/source_code_hosting_platform_dfs after running the script utils/initial_data_preparation.py. Changed the argparse flag to look for the df in that subfolder and the script now works.

Integrated data-preparation automation into the src/github_repo_request_local.py script: it checks for the existence of the input DataFrame before reading it. If the DataFrame doesn't exist, it executes the utils/initial_data_preparation.py script to perform the data preparation.
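The automation described above could look roughly like the following sketch. The function name `load_or_prepare` is hypothetical, and the actual flag names and paths used in src/github_repo_request_local.py may differ:

```python
import logging
import subprocess
import sys
from pathlib import Path

import pandas as pd

logger = logging.getLogger(__name__)


def load_or_prepare(csv_file_path: Path, prep_script: Path) -> pd.DataFrame:
    """Load the input DataFrame; if the CSV is missing, run the
    data-preparation script first to generate it."""
    if not csv_file_path.exists():
        logger.info(
            "Input CSV %s not found; running %s to generate it.",
            csv_file_path,
            prep_script,
        )
        # Run the preparation script with the current interpreter and
        # fail loudly if it exits non-zero.
        subprocess.run([sys.executable, str(prep_script)], check=True)
    return pd.read_csv(csv_file_path)
```

This keeps the "prepare if missing" decision in one place, so the rest of the script can assume the DataFrame exists.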
Related to the PR #76
I have changed the logic back for when no CSV is found: a log message now asks the user to run the script utils/initial_data_preparation.py first.
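The reverted behaviour could be sketched as follows (hypothetical function name `require_input_csv`; the actual wording and exit handling in the script may differ):

```python
import logging
import sys
from pathlib import Path

logger = logging.getLogger(__name__)


def require_input_csv(csv_file_path: Path) -> None:
    """Exit with a helpful message if the input CSV has not been
    prepared yet, instead of continuing and crashing later."""
    if not csv_file_path.exists():
        logger.error(
            "Input CSV %s not found. Please run "
            "`python utils/initial_data_preparation.py` first.",
            csv_file_path,
        )
        sys.exit(1)
```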
Context
After various updates to the codebase to improve the processing, there's a path that causes the local script to fail: when the csv_file_path doesn't exist. The error is reported as:

The relevant logic is here (at commit https://github.com/commercetest/nlnet/commit/63d716accacc6801c635332a1c28ae88fef0efa2).

The final else statement in this code snippet reports an 'error' (which isn't necessarily an error from the user's perspective) and then execution continues; however, the DataFrame doesn't exist, so the program exits with the runtime error.
As this script should be able to run when none of the intermediate/working files exist (assuming python utils/initial_data_preparation.py has been run, which it has been), let's enhance the final else so that it creates a suitable DataFrame, presumably using data/original.csv, which was created by the initial data preparation script.
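The proposed enhancement to the final else could look something like this minimal sketch. The function name `load_dataframe` and the fallback handling are assumptions, not the repository's actual code:

```python
from pathlib import Path

import pandas as pd


def load_dataframe(
    csv_file_path: Path,
    fallback_csv: Path = Path("data/original.csv"),
) -> pd.DataFrame:
    """Return the working DataFrame, falling back to the original CSV
    (produced by utils/initial_data_preparation.py) when the
    intermediate/working file does not exist."""
    if csv_file_path.exists():
        return pd.read_csv(csv_file_path)
    # Fallback path: the intermediate file is missing, so rebuild the
    # DataFrame from the original data. Any column renaming or filtering
    # the real script performs would also need to happen here.
    return pd.read_csv(fallback_csv)
```

With this in place the script no longer logs an 'error' and then crashes; it simply starts from the original data when no intermediate file is present.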