WikiNewZealand / fundamental-figures


Order parsing/not enough columns gives weird results for property transfers #29

Closed natdudley closed 5 years ago

natdudley commented 5 years ago

The property transfers config is unusually complicated. The spreadsheet includes data on both buyers and sellers, but we're only interested in buyers for this project.

However, the spreadsheet has more columns than the config has slots for: category, measure, and group are already taken up by the category, measure, and property type that I need.

I was hoping to rely on the order of parsing, so that only the first matching value would be selected. This broke for "Corporate only" purchases in Auckland. It also broke on tax status for "Only NZ tax residents" and "Mixture of NZ tax residents and non-residents".


{
            "uri": "https://figure.nz/table/m0pJErmnoSZbfz9k/download",
            "parent": "Households",
            "discriminator": "Territorial authority",
            "measure": {
                "column": "Measure",
                "group": {
                    "column": "Property type",
                    "separator": "—",
                    "include": [ ],
                    "exclude": [ ]
                },
                "include": [ 
                    {
                        "value": "Number of transfers",
                        "label": "Residency/Visa status of property buyers"
                    }
                ],
                "exclude": [ ]
            },
            "category": {
                "column": "Category",
                "include": [ 
                    {
                        "value": "At least one NZ citizen"
                    },
                    {
                        "value": "At least one NZ resident visa (but no citizens)"
                    },
                    {
                        "value": "No NZ citizens or resident visas"
                    },
                    {
                        "value": "Corporate only"
                    },
                    {
                        "value": "Affiliation unknown",
                        "label": "Unknown status"
                    }
                ],
                "exclude": [ ]
            },
            "date": "Year ended June"
        },
        {
            "uri": "https://figure.nz/table/m0pJErmnoSZbfz9k/download",
            "parent": "Households",
            "discriminator": "Territorial authority",
            "measure": {
                "column": "Measure",
                "group": {
                    "column": "Property type",
                    "separator": "—",
                    "include": [ ],
                    "exclude": [ 
                        "All property"
                    ]
                },
                "include": [ 
                    {
                        "value": "Number of transfers",
                        "label": "Tax residency of property buyers"
                    }
                ],
                "exclude": [ ]
            },
            "category": {
                "column": "Category",
                "include": [ 
                    {
                        "value": "Only NZ tax residents"
                    },
                    {
                        "value": "Mixture of NZ tax residents and non-residents"
                    },
                    {
                        "value": "Only NZ tax non-residents"
                    },
                    {
                        "value": "All parties exempt from stating tax residency"
                    },
                    {
                        "value": "No tax statement due to contract date"
                    }
                ],
                "exclude": [ ]
            },
            "date": "Year ended June"
        },
Spksh commented 5 years ago

We've got some nonconsecutive dates in the input .csv, which will affect the row order.

I don't think we want to rely on the order of rows in the input .csv anyway; too much opportunity for variance.
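
As a rough illustration of why first-match selection is fragile, here is a minimal Python sketch. The rows and the "Party" column name are hypothetical (the real table's column names are not shown in this thread), and `first_match` is an illustrative helper, not the project's parser.

```python
# Hypothetical rows: "Party" is an assumed column name, not taken from the real table.
rows = [
    {"Party": "Buyers", "Category": "Corporate only", "Value": 120},
    {"Party": "Sellers", "Category": "Corporate only", "Value": 95},
]


def first_match(rows, column, value):
    """Return the first row whose `column` equals `value`, or None if none match."""
    return next((r for r in rows if r[column] == value), None)


# Same data, two row orders: the selected row differs.
in_order = first_match(rows, "Category", "Corporate only")
reversed_order = first_match(list(reversed(rows)), "Category", "Corporate only")

order_dependent = in_order["Value"] != reversed_order["Value"]
print(order_dependent)
```

Because both rows share the same "Category" value, whichever happens to come first in the file wins, so any reordering of the .csv silently changes the result.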

natdudley commented 5 years ago

To fix related to this:

Spksh commented 5 years ago

So, yeah. Extremely large refactor here. We now support a list of columns for Measure and Category. See readme.
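
For readers without the readme to hand, the general idea of matching on a list of columns can be sketched in Python as below. This is only an assumption-laden illustration: the `select` helper, the criteria format, and the "Party" column are hypothetical and do not reflect the actual config schema or implementation in the repository.

```python
# Hypothetical rows: "Party" is an assumed column name used only for illustration.
rows = [
    {"Party": "Sellers", "Category": "Corporate only", "Value": 95},
    {"Party": "Buyers", "Category": "Corporate only", "Value": 120},
]


def select(rows, criteria):
    """Return every row that matches all (column, value) pairs in `criteria`."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria)
    ]


# Constraining on more than one column removes the ambiguity,
# so the result no longer depends on row order.
criteria = [("Party", "Buyers"), ("Category", "Corporate only")]
buyers = select(rows, criteria)
buyers_reversed = select(list(reversed(rows)), criteria)
print(buyers == buyers_reversed)
```

With every distinguishing column named explicitly, exactly one row matches regardless of how the input file is sorted.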