frictionlessdata / datapackage-pipelines

Framework for processing data packages in pipelines of modular components.
https://frictionlessdata.io/
MIT License

AssertionError when not all resources included in concatenate process #156

Open jbothma opened 5 years ago

jbothma commented 5 years ago

datapackage-pipelines==2.0.0

I have a pipeline that loads multiple CSVs using load. When one of the loaded resources is not listed under concatenate, we get the following error:

|   File "/home/jdb/proj/code4sa/treasury-portal/treasury-pipelines/env/lib/python3.7/site-packages/datapackage_pipelines/specs/../lib/concatenate.py", line 25, in <module>
|     spew_flow(flow(ctx.parameters), ctx)
|   File "/home/jdb/proj/code4sa/treasury-portal/treasury-pipelines/env/lib/python3.7/site-packages/datapackage_pipelines/utilities/flow_utils.py", line 46, in spew_flow
|     datastream = flow.datastream()
|   File "/home/jdb/proj/code4sa/treasury-portal/treasury-pipelines/env/lib/python3.7/site-packages/dataflows/base/flow.py", line 18, in datastream
|     return self._chain(ds)._process()
|   File "/home/jdb/proj/code4sa/treasury-portal/treasury-pipelines/env/lib/python3.7/site-packages/dataflows/base/datastream_processor.py", line 42, in _process
|     datastream = self.source._process()
|   File "/home/jdb/proj/code4sa/treasury-portal/treasury-pipelines/env/lib/python3.7/site-packages/dataflows/base/datastream_processor.py", line 46, in _process
|     self.datapackage = self.process_datapackage(self.datapackage)
|   File "/home/jdb/proj/code4sa/treasury-portal/treasury-pipelines/env/lib/python3.7/site-packages/dataflows/helpers/datapackage_processor.py", line 15, in process_datapackage
|     ret = next(self.dp_processor)
|   File "/home/jdb/proj/code4sa/treasury-portal/treasury-pipelines/env/lib/python3.7/site-packages/dataflows/processors/concatenate.py", line 89, in func
|     assert not match
| AssertionError

The intent is that all resources should be normalised and concatenated. It wasn't at all clear why this was failing, or whether listing every resource under concatenate is even a requirement. If it is a requirement, a more meaningful error would be really helpful.
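As an illustration of the kind of error that would have helped here, below is a minimal sketch of a pre-check that names the offending resources. It assumes the failure really is caused by a loaded resource missing from sources, as described above. `check_all_resources_listed` is a hypothetical helper, not part of dataflows or datapackage-pipelines.

```python
def check_all_resources_listed(resource_names, sources):
    """Hypothetical pre-check: fail with a descriptive message if any
    loaded resource is absent from the concatenate `sources` list."""
    missing = [name for name in resource_names if name not in sources]
    if missing:
        raise ValueError(
            "concatenate: resources not listed under 'sources': "
            + ", ".join(missing)
        )


# Resource names and sources taken from the pipeline in this report.
try:
    check_all_resources_listed(
        ["ene-2018-19", "ene-2017-18", "ene-2016-17"],
        ["ene-2018-19", "ene-2016-17"],
    )
except ValueError as exc:
    message = str(exc)
    print(message)
```

A message along these lines would point straight at the resource that was left out, rather than a bare AssertionError.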

jbothma commented 5 years ago

To reproduce:

pipeline-spec.yaml

expenditure-time-series:
  pipeline:

    - run: load
      parameters:
        from: 'file1.csv'
        name: 'ene-2018-19'
        format: 'csv'

    - run: load
      parameters:
        from: 'file2.csv'
        name: 'ene-2017-18'
        format: 'csv'

    - run: load
      parameters:
        from: 'file3.csv'
        name: 'ene-2016-17'
        format: 'csv'

    - run: concatenate
      parameters:
        sources:
          - ene-2018-19
          - ene-2016-17
        target:
          name: expenditure-time-series
        fields:
          bob: []
          dave: []
      cache: false

file1.csv

bob,dave
123,456

file2.csv

bob,dave
1234,456

file3.csv

bob,dave
12345,456
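For reference, the mismatch in the spec above can be spotted with a few lines of Python. This is only a sketch: the steps are transcribed by hand as Python data rather than parsed from pipeline-spec.yaml, and the variable names are illustrative.

```python
# Steps from pipeline-spec.yaml, transcribed by hand (not parsed from the file).
steps = [
    {"run": "load",
     "parameters": {"from": "file1.csv", "name": "ene-2018-19", "format": "csv"}},
    {"run": "load",
     "parameters": {"from": "file2.csv", "name": "ene-2017-18", "format": "csv"}},
    {"run": "load",
     "parameters": {"from": "file3.csv", "name": "ene-2016-17", "format": "csv"}},
    {"run": "concatenate",
     "parameters": {
         "sources": ["ene-2018-19", "ene-2016-17"],
         "target": {"name": "expenditure-time-series"},
         "fields": {"bob": [], "dave": []},
     }},
]

# Names given to resources by the load steps, in pipeline order.
loaded = [s["parameters"]["name"] for s in steps if s["run"] == "load"]

# Names listed under sources in the concatenate step(s).
sources = [
    src
    for s in steps
    if s["run"] == "concatenate"
    for src in s["parameters"]["sources"]
]

# Loaded resources that concatenate does not mention.
unlisted = [name for name in loaded if name not in sources]
print(unlisted)
```

Here the only resource left out of sources is the one loaded from file2.csv.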