ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

bug - fix relative path use in schema lookup #61

Closed mikeAdamss closed 5 months ago

mikeAdamss commented 5 months ago

What is this

When installed as a python package our logic for finding schemas is failing.

I've included an example of the problem.

What to do

If you install dpytools (see the readme) and then run this code:

from pathlib import Path

from dpypelines.pipeline.shared.schemas import get_config_schema_path

config_thats_wrong_but_good_enough_for_this_example = {
    "$id": "https://raw.githubusercontent.com/ONSdigital/dp-data-pipelines/sandbox/schemas/dataset-ingress/config/v1.json"
}

print(f"Current path: {Path(__file__).absolute()}")

config_path = get_config_schema_path(config_thats_wrong_but_good_enough_for_this_example)
print(config_path.absolute())

You'll get an error because we're looking for the schema in the wrong place.

I think (you'll need to investigate) it's just because we're assuming a static, relative schema base path, see here: https://github.com/ONSdigital/dp-data-pipelines/blob/0682f4e8ca72fc5309bbd032c56a85acceeedb92/dpypelines/pipeline/shared/schemas.py#L11
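To illustrate the suspected cause, here is a minimal sketch of how a relative path changes meaning with the working directory. `RELATIVE_BASE`, `resolve_relative` and `demo` are hypothetical names for this example only; the actual constant in schemas.py may look different.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical stand-in for a static, relative schema base path.
RELATIVE_BASE = Path("schemas")


def resolve_relative(base: Path = RELATIVE_BASE) -> Path:
    """Make the path absolute; a relative path is anchored at the current working directory."""
    return base.absolute()


def demo() -> tuple[Path, Path]:
    """Resolve the same relative path from two different working directories."""
    original_cwd = Path.cwd()
    first = resolve_relative()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            second = resolve_relative()
        finally:
            # Always restore the working directory before the temp dir is removed.
            os.chdir(original_cwd)
    return first, second


if __name__ == "__main__":
    first, second = demo()
    print(first)
    print(second)
```

The two printed paths differ even though the code never changed, which is the behaviour you would expect to see once the package is installed and run from somewhere other than the repo root.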

This task is to correct it so we're always looking for the schema base path relative to where dpypelines is installed.

a useful thing

Path(__file__).parent gives you the path to the directory containing whatever Python file the code snippet is written in.

You should be able to use that (plus a bit more path joining) to always point to the schema directory regardless of where on your OS dpypelines is installed. There might be a neater way, so investigate.
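A rough sketch of that idea, not the final fix: `schema_base_path`, `levels_up` and the `"schemas"` directory name are assumptions for illustration, and the real number of levels depends on where the schemas end up inside the installed package.

```python
from pathlib import Path


def schema_base_path(anchor_file: str, levels_up: int = 0, schema_dir: str = "schemas") -> Path:
    """Build the schema directory path relative to anchor_file instead of the working directory.

    anchor_file: normally __file__ of the module doing the lookup.
    levels_up:   how many directories to climb from that module's directory (assumed layout).
    schema_dir:  name of the schema directory at that level (assumed name).
    """
    base = Path(anchor_file).resolve().parent
    for _ in range(levels_up):
        base = base.parent
    return base / schema_dir


# Hypothetical usage inside dpypelines/pipeline/shared/schemas.py,
# assuming the schemas sit two directories above that module:
# SCHEMA_BASE = schema_base_path(__file__, levels_up=2)
```

Because the result is anchored on the module's own location, it stays the same whatever directory the calling script is run from. One possibly neater option to look into is `importlib.resources.files` (Python 3.9+), which locates files shipped inside an installed package, though that only helps if the schemas are actually included as package data.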

Acceptance Criteria

mikeAdamss commented 5 months ago

done