Closed jdddog closed 8 months ago
Attention: 32 lines in your changes are missing coverage. Please review.
Comparison is base (31bc1e2) 94.51% compared to head (b260abd) 94.17%.
Thanks Jamie. I like the change to the directory structure for the workflow. Just minor nit comments on the PR.
Perhaps we could change the schema and SQL folder functions to something like this? Alternatively, we could just check whether the workflow folder exists and explicitly use that.
```python
def schema_folder(workflow_name: Optional[str] = None) -> str:
    """Return the path to the schema folder.

    :param workflow_name: Optional, name of the workflow. Only to be included if the schema for the workflow is in
        the directory academic_observatory_workflows.workflows.{workflow_name}.schema
    :return: the path.
    """

    if workflow_name:
        module_path = f"academic_observatory_workflows.workflows.{workflow_name}.schema"
        assert os.path.exists(
            module_file_path(module_path)
        ), f"Workflow name {workflow_name} given but schema folder within the workflow path does not exist!"
        return module_file_path(module_path)

    return module_file_path("academic_observatory_workflows.database.schema")
```
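For the second option (check whether the workflow folder exists and use it, otherwise fall back), a rough sketch might look like the following. It is only an illustration: `resolve_folder`, `base_dir` and `kind` are hypothetical names, and it uses plain filesystem paths rather than `module_file_path` so that it runs on its own. The `database/sql` fallback is an assumption that mirrors the `database.schema` location above.

```python
import os
from typing import Optional


def resolve_folder(base_dir: str, kind: str, workflow_name: Optional[str] = None) -> str:
    """Return the workflow-specific folder if it exists, otherwise the shared folder.

    :param base_dir: hypothetical root of the package on disk (stands in for module_file_path).
    :param kind: the folder type, e.g. "schema" or "sql".
    :param workflow_name: optional name of the workflow.
    :return: the path.
    """

    if workflow_name:
        workflow_path = os.path.join(base_dir, "workflows", workflow_name, kind)
        # Only use the workflow folder when it is actually present on disk
        if os.path.isdir(workflow_path):
            return workflow_path

    # Fall back to the shared location (assumed layout: <base_dir>/database/<kind>)
    return os.path.join(base_dir, "database", kind)
```

Unlike the `assert` version, this silently falls back when a workflow has not been migrated yet, which may suit a gradual migration but could hide a misspelled workflow name.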
Thanks @alexmassen-hane, I've made those changes and updated the functions in config.py: https://github.com/The-Academic-Observatory/academic-observatory-workflows/blob/0e73b20780b1587ee62046b386c951b5432191e5/academic_observatory_workflows/config.py
Also added a function in this PR that is required: https://github.com/The-Academic-Observatory/observatory-platform/pull/647
This PR refactors the OA Dashboard workflow to produce the dashboard data with SQL rather than Pandas. Pandas was consuming too much memory, and SQL is an easier and more efficient way to create the datasets.
The workflow now operates like this:
I thought that putting the files associated with each workflow in the same place made it easier to see and find the SQL templates, schemas and tests used by the workflow. I created a module for the workflow, put the code for the workflow in this module and created folders for the SQL templates and BigQuery schemas. See below for an example. What do you think @alexmassen-hane and @keegansmith21?
Some of the helper functions, such as `schema_folder` and `sql_folder`, need updating; your suggestions would be welcome. We would need to update the other workflows as well, although that could be done over time.