ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

initial sdmx 2.1 pipeline transform acceptance test #128

Open osamede20 opened 5 months ago

osamede20 commented 5 months ago

This is for SDMX 2.1

What is this

We have written an acceptance test setup that allows a feature to be specified which starts from a set of files in a directory,

see this.

We need to create acceptance tests for the inputs we are receiving that expand on this, to check that the data is being generated correctly.

What to do

Now that we have a transform, we need to update that example to something like this:

(note - CPIH is specified here because the final datasets are not yet known; we just need to pick one)

  Scenario: CPIH - Pipeline runs without errors
    Given a temporary source directory of files
        | file                   |  fixture                           |
        | pipeline-config.json   |  pipeline_config_basic_valid.json  |
        | data.xml               |  data_sdmx_valid_1.sdmx            |
    And v1_data_ingress starts using the temporary source directory
    Then the pipeline should generate no errors
    And I read the csv output "data.csv"
    And the csv output should have "100" rows
    And the csv output has the columns
        | Column 1 | Column 2 | Column 3 |
    And I read the metadata output "/outputs/metadata.json"
    And the metadata should match "cpih-metadata-correct.json"
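
As a rough sketch only (assuming behave is the test runner and the step wording above is kept), the new CSV and metadata assertion steps might be implemented along these lines; the step names, context attributes and fixture paths here are illustrative, not existing code:

    import csv
    import json
    from pathlib import Path

    from behave import then


    @then('the csv output should have "{expected_rows}" rows')
    def step_csv_row_count(context, expected_rows):
        # context.csv_path is assumed to be set by the "I read the csv output" step
        with open(context.csv_path, newline="") as f:
            rows = list(csv.reader(f))
        # exclude the header row from the count
        assert len(rows) - 1 == int(expected_rows), (
            f"expected {expected_rows} rows, got {len(rows) - 1}"
        )


    @then('the metadata should match "{fixture_name}"')
    def step_metadata_matches(context, fixture_name):
        # context.metadata and context.fixtures_dir are assumed to be set by earlier steps
        expected = json.loads((Path(context.fixtures_dir) / fixture_name).read_text())
        assert context.metadata == expected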

Note 1 - Start by running the existing holding features and go from there.

Note 2 - the files named in the "fixture" column above are pulled from /features/fixtures/data-fixtures.zip. To add more fixtures, unzip it -> add your file(s) -> re-zip it. This avoids adding a massive number of files to the repo; by the same token, try to use quite small examples if you can.
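
For illustration (the helper name and the way the fixture reaches the temporary source directory are assumptions, not existing code), pulling a single fixture out of the zip could look something like this:

    import zipfile
    from pathlib import Path

    # location taken from the note above
    FIXTURES_ZIP = Path("features/fixtures/data-fixtures.zip")


    def copy_fixture(fixture_name: str, target_dir: Path, target_name: str) -> Path:
        """Extract one fixture from the zip and write it into the temporary source directory."""
        with zipfile.ZipFile(FIXTURES_ZIP) as zf:
            data = zf.read(fixture_name)
        target = target_dir / target_name
        target.write_bytes(data)
        return target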

Note 3 - Currently, calling the "v1_data_ingress starts using the temporary source directory" step causes some files to be output (data.xml, data.csv and metadata.json). These should be output to a temporary directory, which should be deleted once the acceptance tests have finished running.
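
One way to do this (a sketch only, assuming behave hooks in features/environment.py; the attribute name is illustrative) is to create and tear down the output directory per scenario:

    import shutil
    import tempfile
    from pathlib import Path


    def before_scenario(context, scenario):
        # throwaway directory for the data.xml, data.csv and metadata.json outputs
        context.temp_output_dir = Path(tempfile.mkdtemp(prefix="pipeline-outputs-"))


    def after_scenario(context, scenario):
        # remove the outputs once the scenario has finished, pass or fail
        shutil.rmtree(context.temp_output_dir, ignore_errors=True)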

Acceptance Criteria