frictionlessdata / datapackage-py

A Python library for working with Data Packages.
https://frictionlessdata.io
MIT License

package.commit() adds escapeChar to dialect #200

Closed: dahlbaek closed this issue 6 years ago

dahlbaek commented 6 years ago

First of all, thank you for this fantastic set of tools for sharing data!

I think I found a minor bug. The following script

import pandas as pd
from datapackage import DataPackage

data = pd.DataFrame({"a": [1, 2, 3]})
data.to_csv("data.csv", index=False)

package = DataPackage()
package.infer("data.csv")
package.descriptor["resources"][0]["dialect"] = {"delimiter": ";"}
package.commit()
package.save("datapackage.json")

produces the following datapackage.json file:

{
    "resources": [
        {
            "schema": {
                "fields": [
                    {
                        "name": "a",
                        "type": "integer",
                        "format": "default"
                    }
                ],
                "missingValues": [
                    ""
                ]
            },
            "name": "data",
            "dialect": {
                "skipInitialSpace": true,
                "lineTerminator": "\r\n",
                "doubleQuote": true,
                "escapeChar": "\\",
                "delimiter": ";",
                "header": true,
                "caseSensitiveHeader": false,
                "quoteChar": "\""
            },
            "path": "data.csv",
            "encoding": "utf-8",
            "profile": "tabular-data-resource",
            "mediatype": "text/csv",
            "format": "csv"
        }
    ],
    "profile": "tabular-data-package"
}

However, according to the specification, the escapeChar option is not set by default, so it should not appear here unless the user sets it explicitly.
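To illustrate the expected behaviour, here is a minimal sketch of default-filling that leaves escapeChar out. It is not datapackage-py's code. `DIALECT_DEFAULTS` and `expand_dialect` are hypothetical names. Apart from the comma delimiter, the default values are copied from the dialect that `package.commit()` wrote above, with escapeChar dropped.

```python
# Hypothetical defaults table (not the library's). Values other than
# "delimiter" are copied from the dialect written above; escapeChar is
# deliberately omitted because the spec does not set it by default.
DIALECT_DEFAULTS = {
    "delimiter": ",",
    "lineTerminator": "\r\n",
    "quoteChar": '"',
    "doubleQuote": True,
    "skipInitialSpace": True,
    "header": True,
    "caseSensitiveHeader": False,
}


def expand_dialect(dialect):
    """Return a copy of `dialect` with missing keys filled from DIALECT_DEFAULTS.

    Keys that have no default (such as escapeChar) only appear in the
    result if the user set them explicitly.
    """
    expanded = dict(DIALECT_DEFAULTS)  # copy so the defaults table is never mutated
    expanded.update(dialect)           # user-supplied values win
    return expanded


dialect = expand_dialect({"delimiter": ";"})
print("escapeChar" in dialect)  # False
print(dialect["delimiter"])     # ;
```

With this approach, the `{"delimiter": ";"}` dialect from the script above would be expanded without gaining an escapeChar entry.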

As an aside, should package.valid report an error when both escapeChar and quoteChar are set? According to the specification, the two options are mutually exclusive.
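For reference, a check for that rule could look roughly like the sketch below. `dialect_errors` is a hypothetical helper, not part of datapackage-py. Whether the library's own validator enforces this rule is exactly the open question here. The sketch runs the check against the dialect that `package.commit()` wrote in the report above.

```python
def dialect_errors(dialect):
    """Return a list of problems found in a CSV dialect dict (hypothetical check)."""
    errors = []
    # The spec describes escapeChar and quoteChar as mutually exclusive.
    if "escapeChar" in dialect and "quoteChar" in dialect:
        errors.append("escapeChar and quoteChar are mutually exclusive")
    return errors


# The dialect written by package.commit() in the report above
written = {
    "skipInitialSpace": True,
    "lineTerminator": "\r\n",
    "doubleQuote": True,
    "escapeChar": "\\",
    "delimiter": ";",
    "header": True,
    "caseSensitiveHeader": False,
    "quoteChar": '"',
}
print(dialect_errors(written))
# ['escapeChar and quoteChar are mutually exclusive']
print(dialect_errors({"delimiter": ";"}))  # []
```

Under such a rule, the committed descriptor above would be flagged as invalid, while the dialect the user originally supplied would pass.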