floledermann / mapmap.js

A data-driven API for interactive thematic maps
GNU Affero General Public License v3.0
113 stars 12 forks source link

rowFileHandler too eager when converting to number #10

Closed wahlatlas closed 9 years ago

wahlatlas commented 9 years ago

It's a nice helper that mapmap.js tries to convert data loaded from .csv to numbers

    if (!isNaN(+row[key])) { // in JavaScript, NaN !== NaN !!!
                        // convert to number if number
                        row[key] = +row[key];

However this leads to unintended results as the primary key in german adminstrative maps contains leading zeros, e.g. Hamburg has the code 02 which rowFileHandler converts to 2 thus preventing the data to match the shapefile (this holds true for most municipalities in West-Germany).

See the Wikipedia article on municipal keys in Germany https://en.wikipedia.org/wiki/Community_Identification_Number#Structure_2

floledermann commented 9 years ago

For now this is fixed through an additional accessor option that can be passed to .data() to specify the row accessor function like in d3.CSV.parse(). If you don't specify this option, the current behaviour (autoconverting numbers) is still applied, but you can disable it by passing null for the accessor or replace it by a custom accessor function, e.g.

map.data('places.csv', { map: 'code', accessor: null}

In the future this should be solved more flexible by providing a way to compose map functions from multiple components, as this functionality overlaps with what is provided by mapReduce and it is not clear the the accessor function is called first.

I created a separate issue for this in the datadata library: https://github.com/floledermann/datadata.js/issues/4

This fix is committed and contained in the built version in the master branch.

wahlatlas commented 9 years ago

You obviously have the greater plan in mind but may I ask why checking for leading zeros would be a bad idea? Except for the primary key I appreciate everything being converted to number the current way. Even the Vatican has leading zeros in its post-code.

floledermann commented 9 years ago

I am not sure if it is a good idea to implement this in the default accessor - this would lead to having mixed value types in the postcode field, with postcodes with leading zero as strings and others converted to numbers. And currently we covert row-by-row, so there is no way of not converting any number if any row has a leading zero etc. You can easily implement this now according to your specific needs, either by following the approach in the d3.CSV documentation (specifying a conversion for each field), or creating your own "smart" converter that ignores a field name like so:

function(d) {
    var keys = Object.keys(d);
    for (var i=0; i<keys.length; i++) {
        var key = keys[i];
        // convert to number if it looks like a number
        if (key != "postcode" && !isNaN(+d[key])) {
            d[key] = +d[key];
        }
    }
    return d;
}

Clearly, in the future some utility functions should be provided in datadata.js to be able to compose converters like this in a comfortable manner, with an API like:

accessor = dd.smartRowConverter({ignore:"postcode"});
wahlatlas commented 9 years ago

Thanks for taking the time to elaborate, that all makes much sense and the comfortable API you suggest looks promising.