datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

Using inline data as source raises AttributeError #82

Closed anuveyatsu closed 5 years ago

anuveyatsu commented 5 years ago

Trying to load inline data as described in the tabulator docs - https://github.com/frictionlessdata/tabulator-py#inline-read-only

But it raises the following error. It looks like it expects a string, not a list object, as the source. I thought there was a way to indicate that the source is inline data and tried specifying format='inline', but it didn't help:

Traceback (most recent call last):
  File "date.py", line 32, in <module>
    Calendar_Date_Dimension()
  File "date.py", line 28, in Calendar_Date_Dimension
    flow.process()
  File "/Users/anuarustayev/Desktop/repos/sandbox-cubes/cubes/lib/python3.6/site-packages/dataflows/base/flow.py", line 15, in process
    return self._chain().process()
  File "/Users/anuarustayev/Desktop/repos/sandbox-cubes/cubes/lib/python3.6/site-packages/dataflows/base/datastream_processor.py", line 83, in process
    ds = self._process()
  File "/Users/anuarustayev/Desktop/repos/sandbox-cubes/cubes/lib/python3.6/site-packages/dataflows/base/datastream_processor.py", line 72, in _process
    datastream = self.source._process()
  File "/Users/anuarustayev/Desktop/repos/sandbox-cubes/cubes/lib/python3.6/site-packages/dataflows/base/datastream_processor.py", line 72, in _process
    datastream = self.source._process()
  File "/Users/anuarustayev/Desktop/repos/sandbox-cubes/cubes/lib/python3.6/site-packages/dataflows/base/datastream_processor.py", line 75, in _process
    self.datapackage = self.process_datapackage(self.datapackage)
  File "/Users/anuarustayev/Desktop/repos/sandbox-cubes/cubes/lib/python3.6/site-packages/dataflows/processors/load.py", line 88, in process_datapackage
    if self.load_source.startswith('env://'):
AttributeError: 'list' object has no attribute 'startswith'
anuveyatsu commented 5 years ago

Looking at code here https://github.com/datahq/dataflows/blob/master/dataflows/processors/load.py#L80-L87

It expects the source object to be either a tuple or a string, so maybe I need to convert my list into a tuple?

akariv commented 5 years ago

To load data from a list of dictionaries (or any iterator, really), just add it to the flow - there is no need to use load:

e.g.

from dataflows import Flow, printer

# Inline data: a list of row dicts added directly to the flow
data = [
  dict(a=1, b=2),
  dict(a=2, b=3),
]

Flow(
   data, printer()
).process()
anuveyatsu commented 5 years ago

Thanks @akariv, closing this issue.

rufuspollock commented 5 years ago

@akariv should that example be in the docs?