ONSdigital / sml-python-small

Statistical Methods Library for Python Pandas methods used in SPP.
MIT License

SPP 10277 UAT failures item non response #70

Closed gibbardsteve closed 11 months ago

gibbardsteve commented 11 months ago

Synopsis

During UAT of the recent release candidate, totals and components processing failed in scenarios where the component list included some or all empty values.

Description

Selective Editing and Date Adjustment use Pandas NaN to represent an empty cell. Totals and Components and Thousand Pounds processing have been tweaked slightly to work in a similar way.

Specifically:

thousand_pounds.py clean_target_variables() - where a string of "NaN" is received or returned, treat it as float('NaN'). Return a string representation of numbers, but if the value is None or NaN, do not convert it to a string.
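As a rough illustration of the behaviour described for clean_target_variables(), here is a minimal sketch. It is not the library's actual code: the helper names to_nan_if_missing() and to_output() are hypothetical and exist only for this example.

```python
import math


def to_nan_if_missing(value):
    """Map None and the string "NaN" (any letter case) to float('nan')."""
    if value is None:
        return float("nan")
    if isinstance(value, str) and value.strip().lower() == "nan":
        return float("nan")
    return value


def to_output(value):
    """Return str(value) for real values; leave None and NaN unconverted."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return value
    return str(value)
```

Keeping NaN as a float on output, rather than the string "nan", means a downstream pandas isna() check still recognises the cell as empty.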

example.py - the csv example is updated to ensure empty cells are treated as NaN, to bring it in line with pandas processing.
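One way the csv example could treat empty cells as NaN is sketched below, using only the standard library. The column names (identifier, comp_1, comp_2) and the helper read_rows() are assumptions made for illustration and are not taken from example.py.

```python
import csv
import io
import math


def _empty_to_nan(cell):
    """Return float('nan') for a missing or blank cell, else the cell itself."""
    if cell is None or (isinstance(cell, str) and cell.strip() == ""):
        return float("nan")
    return cell


def read_rows(handle):
    """Read CSV rows as dicts, replacing empty cells with NaN."""
    return [
        {column: _empty_to_nan(cell) for column, cell in row.items()}
        for row in csv.DictReader(handle)
    ]


# Hypothetical input: row A has one empty component, row B has none filled in.
sample = io.StringIO("identifier,comp_1,comp_2\nA,10,\nB,,\n")
rows = read_rows(sample)
```

This mirrors what pandas.read_csv does by default, where empty fields are read as NaN.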

totals_and_components.py clean_components_list() - where a string of "NaN" is received or returned, treat it as float('NaN').

pandas_example.py run_all_csvs() - allow the processing of all csvs in a given directory. Any file that contains "output" in the name is ignored when finding a list of input data files. This function allows all UAT files provided by MQD to be run.
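The file-discovery step of run_all_csvs() might look something like the sketch below. The function name find_input_csvs() is hypothetical, and the real function may differ (for example in recursion or case handling); only the rule "skip any file with output in its name" comes from the description above.

```python
from pathlib import Path


def find_input_csvs(directory):
    """Return sorted CSV paths in `directory`, skipping names containing 'output'."""
    return sorted(
        path
        for path in Path(directory).glob("*.csv")
        if "output" not in path.name  # case-sensitive match on the file name
    )
```

Filtering on the file name rather than the full path avoids accidentally skipping every file when a parent directory happens to contain "output".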

pandas_wrapper.py - when a value in the input data is None, treat it as float('NaN').

example_test_data.csv Add thousand pounds examples where empty cells are present for target_variables

example_test_data.csv Add totals and components data to test empty cells in component list

Additional CSV files are added for totals and components and thousand pounds processing - these are the UAT files that are run by MQD

sanjeevz3009 commented 11 months ago

When pandas_example.py runs all the UAT tests for TPC & TCC, it produces around 51 output files, and these aren't being ignored (unless we want these output files in the repo now). As we are now outputting a lot of files, we can remove lines 10 to 17 in the gitignore file and replace them with *output.csv*, so it ignores anything that ends with output.csv =)

If we go with the above solution to ignore the output files, we also need to update the output file name on line 110 of the TPC example.py from output1.csv to output.csv.

gibbardsteve commented 11 months ago

Good point - updated the gitignore and changed the name of the output file.