datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
MIT License

How does concatenate work? #161

Closed: ColinMaudry closed this issue 3 years ago

ColinMaudry commented 3 years ago


I'm trying to use concatenate, following the documentation and the tutorial (the example is a bit cryptic :)), but I can't get it to work.

Here is the script:

The resources decp and previous-decp have the same columns.

The command

python3 scripts/

yields the following error:

Téléchargement des données tabulaires précédentes...
Traceback (most recent call last):
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 79, in _process
    self.datapackage = self.process_datapackage(self.datapackage)
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/helpers/", line 15, in process_datapackage
    ret = next(self.dp_processor)
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/processors/", line 93, in func
    assert not match

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "scripts/", line 85, in <module>
  File "scripts/", line 44, in decp_processing
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 15, in process
    return self._chain().process()
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 118, in process
    ds, _ = self.safe_process()
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 114, in safe_process
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 97, in raise_exception
    raise cause
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 102, in safe_process
    ds = self._process()
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 75, in _process
    datastream = self.source._process()
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 86, in _process
  File "/home/colin/.local/lib/python3.8/site-packages/dataflows/base/", line 96, in raise_exception
    raise error from cause
dataflows.base.exceptions.ProcessorError: Errored in processor datapackage_processor in position #21: 

I have looked at the source code, but couldn't figure it out, especially the role of the suffix and prefix variables (I'm a beginner in Python).

akariv commented 3 years ago

Hey there @ColinMaudry

With concatenate, it's important that all source resources are consecutive in the datapackage. From reviewing your script, it seems that sans-titulaires sits between decp and prev-decp, so the resources you want to merge are not adjacent.

The error message is obviously very cryptic; I'll fix that in the next version.
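
To make the "consecutive" requirement concrete, here is a minimal, standard-library-only sketch of such a check. It is not the dataflows source: the function name `find_concatenation_slot` and the three-phase bookkeeping are assumptions made for illustration. My reading is that the prefix and suffix variables you mention play a similar role (tracking whether we are before or after the run of matching resources), and that the `assert not match` in the traceback fires when a matching resource turns up after that run has already ended.

```python
from typing import Iterable, List


def find_concatenation_slot(resource_names: List[str], sources: Iterable[str]) -> int:
    """Return the index of the first matching resource.

    Hypothetical helper, not part of dataflows. Raises ValueError when the
    resources to merge are not next to each other in the package.
    """
    wanted = set(sources)
    phase = "before"  # "before" -> "inside" -> "after" the matching run
    slot = -1
    for index, name in enumerate(resource_names):
        match = name in wanted
        if phase == "before":
            if match:
                phase = "inside"
                slot = index
        elif phase == "inside":
            if not match:
                phase = "after"
        elif match:
            # A source resource appears after the run already ended.
            raise ValueError(
                f"resource {name!r} is separated from the other sources; "
                "sources must be consecutive"
            )
    if slot < 0:
        raise ValueError("no resource matched the requested sources")
    return slot


if __name__ == "__main__":
    sources = ["decp", "previous-decp"]

    # Order described in the reply: another resource sits between the two.
    try:
        find_concatenation_slot(["decp", "sans-titulaires", "previous-decp"], sources)
    except ValueError as error:
        print("rejected:", error)

    # Reordered so that the two sources are adjacent.
    print("slot:", find_concatenation_slot(["decp", "previous-decp", "sans-titulaires"], sources))
```

In practice this suggests reordering the steps in the flow so that the resources to merge are loaded one right after the other, with any unrelated resource loaded before or after the pair.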

ColinMaudry commented 3 years ago

Thanks @akariv, I have worked around it, but I'll keep this in mind for next time!