datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

0.0.51 breaks passing in named resources to Flow #85

Closed cschloer closed 5 years ago

cschloer commented 5 years ago

I need some time to figure out what's up here and to provide you with better documentation (it's possible this is on my end; if so, I will close this issue), but there seems to be an issue with passing in a name for a resource to load since 0.0.51. Instead of using only the passed-in name, it creates two resources:

  1. One using the passed-in name, containing the correct headers, the correct types, and no rows.
  2. Another using the default empty name (i.e. the file name), containing the correct headers, the correct rows, and no types.

I'll come back to this ASAP to give you more details; for now I'm using v0.0.50.
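
The symptom described above can be checked mechanically. What follows is only a minimal sketch, not part of dataflows: `summarize_resources` is a hypothetical helper that takes a datapackage descriptor (a plain dict) and one list of rows per resource, and reports each resource's name, row count, and whether all of its fields carry a type. The sample descriptor and rows are made up to imitate the report, and the second resource name `beatles` is an assumption standing in for the file-derived default name.

```python
def summarize_resources(descriptor, rows_per_resource):
    """Pair each resource in the descriptor with its row count and type info."""
    summary = []
    resources = descriptor.get("resources", [])
    for resource, rows in zip(resources, rows_per_resource):
        fields = resource.get("schema", {}).get("fields", [])
        summary.append({
            "name": resource.get("name"),
            "rows": len(rows),
            # True only if there is at least one field and every field has a type.
            "typed": bool(fields) and all("type" in f for f in fields),
        })
    return summary


# Made-up data imitating the reported behaviour (not real dataflows output):
# one named resource with types but no rows, one default-named resource
# with rows but no types.
buggy_descriptor = {
    "resources": [
        {"name": "foo", "schema": {"fields": [
            {"name": "name", "type": "string"},
            {"name": "instrument", "type": "string"},
        ]}},
        {"name": "beatles", "schema": {"fields": [
            {"name": "name"},
            {"name": "instrument"},
        ]}},
    ]
}
buggy_rows = [
    [],
    [{"name": "john", "instrument": "guitar"},
     {"name": "paul", "instrument": "bass"}],
]

for entry in summarize_resources(buggy_descriptor, buggy_rows):
    print(entry)
```

On a healthy flow one would expect a single entry whose name is the passed-in name, with a non-zero row count and typed fields.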

akariv commented 5 years ago

Thanks for reporting.

Could you perhaps add a reproducing code snippet?

This is what I tried:

>>> import dataflows
>>> dataflows.Flow(
  dataflows.load('data/beatles.csv', name='foo'), 
  dataflows.printer(),
  dataflows.dump_to_path('bar')
).process()
foo:
  #  name        instrument
     (string)    (string)
---  ----------  ------------
  1  john        guitar
  2  paul        bass
  3  george      guitar
  4  ringo       drums
(<datapackage.package.Package object at 0x106833fd0>, {'count_of_rows': 4, 'bytes': 941, 'hash': 'c9f12ff42d77930242a548c33a5e2e5e', 'dataset_name': None})

Then in bar/:

$ ls -la bar/
total 16
drwxr-xr-x   4 adam  staff   136B May  2 17:28 ./
drwxr-xr-x  41 adam  staff   1.4K May  2 17:28 ../
-rw-------   1 adam  staff    69B May  2 17:28 beatles.csv
-rw-------   1 adam  staff   872B May  2 17:28 datapackage.json
$ cat bar/datapackage.json 
{
  "bytes": 69,
  "count_of_rows": 4,
  "hash": "c9f12ff42d77930242a548c33a5e2e5e",
  "profile": "data-package",
  "resources": [
    {
      "dialect": {
        "caseSensitiveHeader": false,
        "delimiter": ",",
        "doubleQuote": true,
        "header": true,
        "lineTerminator": "\r\n",
        "quoteChar": "\"",
        "skipInitialSpace": false
      },
      "encoding": "utf-8",
      "format": "csv",
      "name": "foo",
      "path": "beatles.csv",
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          {
            "format": "default",
            "name": "name",
            "type": "string"
          },
          {
            "format": "default",
            "name": "instrument",
            "type": "string"
          }
        ],
        "missingValues": [
          ""
        ]
      }
    }
  ]
}
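
The same check can be done without reading the JSON by eye. This is a sketch using only the Python standard library; `resource_names` is a hypothetical helper, and the path `bar/datapackage.json` is the one from the listing above.

```python
import json
from pathlib import Path


def resource_names(descriptor_path):
    """Return (name, path) pairs for every resource in a datapackage.json file."""
    descriptor = json.loads(Path(descriptor_path).read_text(encoding="utf-8"))
    return [(r.get("name"), r.get("path")) for r in descriptor.get("resources", [])]


if __name__ == "__main__":
    target = Path("bar/datapackage.json")
    if target.exists():
        print(resource_names(target))
    else:
        print(f"{target} not found; run the flow first")
```

For the descriptor shown above, this should list a single resource named foo with path beatles.csv, i.e. no second, default-named resource.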

cschloer commented 5 years ago

I'm out for the weekend unfortunately but I will get you a code snippet asap!

akariv commented 5 years ago

Thanks!

cschloer commented 5 years ago

Thanks for the fix! Sorry I never got back around to reproducing it.

akariv commented 5 years ago

I hope this actually does the trick - let me know if it doesn't work :)

cschloer commented 5 years ago

Seems to work on my end!