frictionlessdata / datapackage-py

A Python library for working with Data Packages.
https://frictionlessdata.io
MIT License

Reading a resource with a number type and leading zero results in None #255

Closed willu47 closed 4 years ago

willu47 commented 4 years ago

Overview

Reading a resource from a tabular data package returns None for a field of type number when the value has a single leading zero, e.g. 0.05. The correct decimal value is returned when the same value is prefixed with another zero or a +.

To replicate this bug, please run the following code:

from datapackage import Package
pack = Package('https://raw.githubusercontent.com/OSeMOSYS/simplicity/f0b50594360a92c4a9c70dd0465a3f8751f090a2/datapackage.json')
pack.get_resource('DiscountRate').read()

[['SIMPLICITY', None]]

Then, inspect https://github.com/OSeMOSYS/simplicity/blob/f0b50594360a92c4a9c70dd0465a3f8751f090a2/data/DiscountRate.csv and observe that the value is written as 0.05.

If you copy this file locally and change the values in data/DiscountRate.csv, you'll see this bug in action!



roll commented 4 years ago

@willu47 Thanks for the report!

roll commented 4 years ago

Sorry for the late reply, but it seems "0.05" was mistakenly added to missingValues in the resource's schema:

https://raw.githubusercontent.com/OSeMOSYS/simplicity/f0b50594360a92c4a9c70dd0465a3f8751f090a2/datapackage.json

{
    "path": "data/DiscountRate.csv",
    "profile": "tabular-data-resource",
    "name": "DiscountRate",
    "format": "csv",
    "mediatype": "text/csv",
    "encoding": "utf-8",
    "schema": {
        "fields": [
            {
                "name": "REGION",
                "type": "string",
                "format": "default"
            },
            {
                "name": "VALUE",
                "type": "number",
                "format": "default"
            }
        ],
        "missingValues": [
            "0.05"
        ],
        "foreignKeys": [
            {
                "fields": "REGION",
                "reference": {
                    "resource": "REGION",
                    "fields": "VALUE"
                }
            }
        ],
        "primaryKey": [
            "REGION"
        ]
    }
},

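That's the reason it's None in the output data: any cell whose text matches an entry in missingValues is read as null (None in Python) instead of being cast to a number.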
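
For anyone hitting the same symptom, here is a rough sketch of why only the exact text 0.05 is affected. It does not use the library's own code: `cast_number` and `numeric_missing_values` are hypothetical helpers written for illustration. They assume that missing values are compared as exact strings before casting and that number fields are cast to `Decimal`. The second helper flags missingValues entries that themselves parse as numbers, which is usually a sign of a mistake like the one in the descriptor above.

```python
from decimal import Decimal, InvalidOperation


def cast_number(cell, missing_values=("",)):
    """Hypothetical stand-in for number casting (not the library's code)."""
    # Assumption: missing values are matched by exact string comparison,
    # so "0.05" matches while "00.05" and "+0.05" do not.
    if cell in missing_values:
        return None
    return Decimal(cell)


def numeric_missing_values(schema):
    """Return missingValues entries that parse as numbers (likely mistakes)."""
    suspicious = []
    # The Table Schema default for missingValues is [""].
    for value in schema.get("missingValues", [""]):
        try:
            Decimal(value)
        except InvalidOperation:
            continue  # not numeric, e.g. "" or "NA"
        suspicious.append(value)
    return suspicious


# Schema fragment copied from the DiscountRate resource above.
schema = {
    "fields": [
        {"name": "REGION", "type": "string", "format": "default"},
        {"name": "VALUE", "type": "number", "format": "default"},
    ],
    "missingValues": ["0.05"],
}

for cell in ["0.05", "00.05", "+0.05"]:
    print(repr(cell), "->", cast_number(cell, schema["missingValues"]))

print("suspicious missingValues:", numeric_missing_values(schema))
```

Removing "0.05" from missingValues in datapackage.json (or dropping the key so the default applies) should make the value read as a number again.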