ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

initial pipeline transform acceptance test #84

Open mikeAdamss opened 5 months ago

mikeAdamss commented 5 months ago

This is a knowledge transfer task and should be done by both a DE and an SE.

What is this

We have written an acceptance test setup that lets you specify a feature that starts with a given set of files in a directory; see this.

We need to create acceptance tests for the inputs we're receiving that build on this setup and check that the output data is generated correctly.

What to do

Now that we have a transform, we need to update that example to something like this:

(Note: I'm using CPIH only because I don't know the datasets; just pick one.)

  Scenario: CPIH - Pipeline runs without errors
    Given a temporary source directory of files
        | file                   |  fixture                           |
        | pipeline-config.json   |  pipeline_config_basic_valid.json  |
        | data.xml               |  data_sdmx_valid_1.sdmx            |
    And v1_data_ingress starts using the temporary source directory
    Then the pipeline should generate no errors
    And I read the csv output "data.csv"
    And the csv output should have "100" rows
    And the csv output has the columns
        | Column 1 | Column 2 | Column 3 |
    And I read the metadata output "/outputs/metadata.json"
    And the metadata should match "cpih-metadata-correct.json"
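Most of the new steps in this scenario reduce to reading a file the pipeline wrote and comparing it with an expectation. Below is a minimal sketch of helpers the step definitions could call, using only the Python standard library. The function names are hypothetical, and it assumes that the "100" rows check counts data rows excluding the header, and that the metadata check should compare parsed JSON (so key order and whitespace do not matter) rather than raw file text.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path


def read_csv_output(output_dir: Path, filename: str) -> list[list[str]]:
    """Read a CSV written by the pipeline into a list of rows (header included)."""
    with open(output_dir / filename, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def assert_row_count(rows: list[list[str]], expected: int) -> None:
    """Check the number of data rows; assumes the header row is not counted."""
    actual = len(rows) - 1
    assert actual == expected, f"expected {expected} data rows, got {actual}"


def assert_columns(rows: list[list[str]], expected: list[str]) -> None:
    """Check that the header row matches the expected column names, in order."""
    assert rows, "csv output is empty"
    assert rows[0] == expected, f"expected columns {expected}, got {rows[0]}"


def assert_metadata_matches(actual_path: Path, expected_path: Path) -> None:
    """Compare two JSON files as parsed objects, ignoring key order and whitespace."""
    with open(actual_path, encoding="utf-8") as f:
        actual = json.load(f)
    with open(expected_path, encoding="utf-8") as f:
        expected = json.load(f)
    assert actual == expected, "metadata output does not match the expected fixture"
```

Keeping the checks in plain functions means they can be unit tested on their own and reused across datasets, with the step definitions only parsing the step text and passing values through (for example, the `| Column 1 | Column 2 | Column 3 |` row as the `expected` list).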

Note 1: start by running the existing holding features and build from there.

Note 2: the files in the "fixture" column above are pulled from /features/fixtures/data-fixtures.zip. To add more fixtures to choose from, unzip it, add your file(s), and rezip it. This avoids adding large numbers of files to the repo; by the same token, try to use fairly small examples where you can.
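Each row of the `| file | fixture |` table amounts to copying one named entry out of the zip into the temporary source directory under a new name. A minimal sketch using the standard `zipfile` module follows; `copy_fixture_from_zip` is a hypothetical helper, not the repo's existing step code, and matching on the base name is an assumption in case the archive stores fixtures inside a subfolder.

```python
import shutil
import zipfile
from pathlib import Path


def copy_fixture_from_zip(zip_path: Path, fixture: str, dest_dir: Path, dest_name: str) -> Path:
    """Copy one fixture out of the zip archive into dest_dir as dest_name.

    `fixture` is the name inside the archive (the "fixture" column);
    `dest_name` is the name the pipeline will see (the "file" column).
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / dest_name
    with zipfile.ZipFile(zip_path) as archive:
        # Match on base name and skip directory entries, which end in "/".
        matches = [
            name for name in archive.namelist()
            if not name.endswith("/") and Path(name).name == fixture
        ]
        if not matches:
            raise FileNotFoundError(f"{fixture!r} not found in {zip_path}")
        # Stream the entry to disk rather than extracting the whole archive.
        with archive.open(matches[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

Extracting only the fixtures a scenario names keeps each source directory small and makes a missing fixture fail loudly instead of silently producing an empty input.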

Note 3: currently, calling the "v1_data_ingress starts using the temporary source directory" step causes some files to be output (data.xml, data.csv and metadata.json). These should instead be written to a temporary directory that is deleted once the acceptance tests have finished running.
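One way to handle the outputs is a per-scenario scratch directory created and removed by hooks. The sketch below assumes the features are run with behave, where `before_scenario(context, scenario)` and `after_scenario(context, scenario)` defined in `features/environment.py` are called around each scenario; it also assumes the "v1_data_ingress starts..." step is changed to write its outputs into `context.output_dir`, which is a hypothetical attribute rather than anything the pipeline reads today.

```python
import shutil
import tempfile
from pathlib import Path


def before_scenario(context, scenario):
    # Give each scenario its own scratch directory for pipeline outputs
    # (data.xml, data.csv, metadata.json). `output_dir` is a hypothetical name.
    context.output_dir = Path(tempfile.mkdtemp(prefix="dp-pipeline-test-"))


def after_scenario(context, scenario):
    # Remove the scratch directory once the scenario has finished.
    output_dir = getattr(context, "output_dir", None)
    if output_dir is not None:
        shutil.rmtree(output_dir, ignore_errors=True)
        context.output_dir = None
```

A directory per scenario stops one scenario's `data.csv` from leaking into the next scenario's assertions; if a single directory for the whole run is preferred, the same logic fits in `before_all`/`after_all` instead.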

Acceptance Criteria