frictionlessdata / datapackage-py

A Python library for working with Data Packages.
https://frictionlessdata.io
MIT License

Foreign keys not dereferenced when reading packages produced by DataFlows #275

Closed: as2875 closed this issue 3 years ago

as2875 commented 3 years ago

To reproduce, run the following code:

import dataflows
import datapackage

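def add_foreign_keys(package):
    # Declare cities.index (resource 0) as a foreign key to continents.index
    package.pkg.descriptor["resources"][0]["schema"]["foreignKeys"] = [
        {
            "fields": "index",
            "reference": {
                "resource": "continents",
                "fields": "index"
            }
        }
    ]

    # Emit the modified package first, then pass the row streams through
    yield package.pkg
    yield from package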

cities = [{"city": "Luanda", "index": 0},
          {"city": "Brazzaville", "index": 0},
          {"city": "London", "index": 1},
          {"city": "Paris", "index": 1}]
continents = [{"index": 0, "continent": "Africa"},
              {"index": 1, "continent": "Europe"}]

f = dataflows.Flow(cities,
                   continents,
                   dataflows.update_resource("res_1", name="cities"),
                   dataflows.update_resource("res_2", name="continents"),
                   add_foreign_keys,
                   dataflows.dump_to_zip("package.zip"))
f.process()

package = datapackage.Package("package.zip")
# relations=True asks datapackage to dereference foreign keys while reading
for row in package.get_resource("cities").read(keyed=True, relations=True):
    print(row)

The output is

{'city': 'Luanda', 'index': 0}
{'city': 'Brazzaville', 'index': 0}
{'city': 'London', 'index': {'index': 1, 'continent': 'Europe'}}
{'city': 'Paris', 'index': {'index': 1, 'continent': 'Europe'}}

whereas I would expect

{'city': 'Luanda', 'index': {'index': 0, 'continent': 'Africa'}}
{'city': 'Brazzaville', 'index': {'index': 0, 'continent': 'Africa'}}
{'city': 'London', 'index': {'index': 1, 'continent': 'Europe'}}
{'city': 'Paris', 'index': {'index': 1, 'continent': 'Europe'}}
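The expected output is equivalent to a simple lookup-and-replace join. The sketch below is a hypothetical pure-Python illustration: the `dereference` helper is not part of datapackage-py or tableschema-py. It builds a dictionary keyed on `continents.index` and tests membership rather than truthiness, so the key 0 resolves like any other.

```python
# Hypothetical sketch of single-field foreign-key dereferencing.
# This is NOT the tableschema-py implementation, only an illustration.
from typing import Any, Dict, List

Row = Dict[str, Any]


def dereference(rows: List[Row], field: str,
                ref_rows: List[Row], ref_field: str) -> List[Row]:
    """Replace each row[field] with the matching row from ref_rows."""
    # Index the referenced resource by its key field for O(1) lookups.
    lookup = {ref[ref_field]: ref for ref in ref_rows}
    result = []
    for row in rows:
        key = row[field]
        # Membership test, not truthiness: a key of 0 must still resolve.
        if key not in lookup:
            raise KeyError(f"foreign key {field}={key!r} not found")
        result.append({**row, field: dict(lookup[key])})
    return result


cities = [{"city": "Luanda", "index": 0},
          {"city": "Brazzaville", "index": 0},
          {"city": "London", "index": 1},
          {"city": "Paris", "index": 1}]
continents = [{"index": 0, "continent": "Africa"},
              {"index": 1, "continent": "Europe"}]

for row in dereference(cities, "index", continents, "index"):
    print(row)
```

Raising on a missing key mirrors the idea that a foreign key must point at an existing row; a real library may report this differently.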

(@lwinfree, @sje30)


Please preserve this line to notify @roll (lead of this repository)

lwinfree commented 3 years ago

hi @roll do you have any ideas?

roll commented 3 years ago

Hi @as2875,

So it is dereferenced for index==1 but not for index==0?

as2875 commented 3 years ago

Thanks @roll. Yes, index==0 is not dereferenced but all following foreign keys are.
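A failure that affects only the key 0 is typical of a truthiness check, where `if value:` treats 0 the same as a missing value. The snippet below is a hypothetical illustration of that pitfall; the thread does not say what the tableschema-py commit actually changed, and `resolve_buggy` / `resolve_fixed` are illustrative names only.

```python
# Hypothetical illustration of a falsy-key bug; not the actual
# tableschema-py code, whose fix is not described in this thread.
from typing import Any

continents = {0: {"index": 0, "continent": "Africa"},
              1: {"index": 1, "continent": "Europe"}}


def resolve_buggy(value: Any) -> Any:
    # Bug: `value` is falsy when it is 0, so key 0 is never looked up.
    if value and value in continents:
        return continents[value]
    return value


def resolve_fixed(value: Any) -> Any:
    # Fix: skip only genuinely missing values (None), not falsy ones.
    if value is not None and value in continents:
        return continents[value]
    return value


for key in (0, 1):
    print(key, resolve_buggy(key), resolve_fixed(key))
```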

roll commented 3 years ago

Ok, thanks. Could you please paste the output package? I will test it.

as2875 commented 3 years ago

Yes, please see attached.

package.zip

roll commented 3 years ago

FIXED in https://github.com/frictionlessdata/tableschema-py/commit/3f46c53d816abb9ab3a6ec2188e79471644a25c3

roll commented 3 years ago

Hi @as2875,

Please try updating:

$ pip install --upgrade tableschema==1.19.3
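To confirm which tableschema version is installed after upgrading, a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+) might look like this. The helpers `parse_version` and `has_fix` are hypothetical, and the comparison is deliberately simplified: it ignores pre-release and local version tags.

```python
# Minimal sketch: check that the installed tableschema is at least 1.19.3,
# the version suggested in this thread. Helper names are hypothetical.
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

FIXED_IN = (1, 19, 3)


def parse_version(text: str) -> Tuple[int, ...]:
    # Keep only the leading numeric segments, e.g. "1.19.3" -> (1, 19, 3).
    # Simplification: suffixes such as "rc1" are ignored.
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed: str) -> bool:
    # Tuple comparison orders versions segment by segment.
    return parse_version(installed) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("tableschema")
    except PackageNotFoundError:
        print("tableschema is not installed")
    else:
        status = "includes the fix" if has_fix(installed) else "needs upgrading"
        print(installed, status)
```

For full PEP 440 handling, the third-party `packaging` library is a more robust choice than this hand-rolled parser.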

as2875 commented 3 years ago

Thanks @roll. I think it works now.