frictionlessdata / datapackage-pipelines

Framework for processing data packages in pipelines of modular components.
https://frictionlessdata.io/
MIT License
118 stars 32 forks

Allow plugins to provide custom spec parsers #38

Open · brew opened 7 years ago

brew commented 7 years ago

Currently, a plugin can optionally provide one Generator class, and each generator gets exactly one my_plugin.source-spec.yaml filetype. This means each *.source-spec.yaml filetype requires its own plugin, and generators in separate plugins can't share common processors.

I propose letting plugins provide their own custom spec parsers that extend parsers.base_parser.BaseParser. This would let a plugin resolve source specs and generators in its own way, so a single plugin could provide more than one generator type and, in turn, more than one *.source-spec.yaml filetype.

For example, the datapackage_pipelines_measure plugin could have a social-media generator, a website-analytics generator, a code-packaging generator, etc. And each project directory could contain the corresponding social-media.measure.source-spec.yaml, website-analytics.measure.source-spec.yaml, and code-packaging.measure.source-spec.yaml files.
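A plugin-supplied parser would first need to tell these files apart by name. The sketch below assumes a hypothetical <generator>.<plugin>.source-spec.yaml naming convention; the helper resolve_spec_filename is illustrative only and is not part of datapackage-pipelines.

```python
import re
from typing import Optional, Tuple

# Hypothetical convention: <generator>.<plugin>.source-spec.yaml
SPEC_FILENAME = re.compile(
    r"^(?P<generator>[\w-]+)\.(?P<plugin>[\w-]+)\.source-spec\.yaml$"
)


def resolve_spec_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Return (plugin, generator) for a per-generator spec file, else None."""
    match = SPEC_FILENAME.match(filename)
    if match is None:
        return None
    return match.group("plugin"), match.group("generator")


print(resolve_spec_filename("social-media.measure.source-spec.yaml"))  # ('measure', 'social-media')
print(resolve_spec_filename("measure.source-spec.yaml"))  # None
```

A plain measure.source-spec.yaml does not match, so under this convention it would fall through to the existing one-generator-per-plugin handling.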

A proposed parser discovery solution:

    • [ ] specs.find_specs() looks for additional parsers (subclasses of BaseParser) in the plugin's parsers directory
    • [ ] instances of discovered plugin-supplied parsers are prepended to specs.SPEC_PARSERS (so they take precedence over native parsers).
    • [ ] specs.find_specs() carries on as normal
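The three discovery steps can be sketched roughly as follows. BaseParser, NativeParser, SPEC_PARSERS, MeasureParser and the accepts() hook are stand-ins defined here for illustration; the real parsers.base_parser.BaseParser interface is not reproduced, and a real version would import the plugin's parsers module (for example with importlib) instead of building one in memory.

```python
import inspect
import types
from typing import List, Optional


class BaseParser:
    """Stand-in for parsers.base_parser.BaseParser; accepts() is a hypothetical hook."""

    def accepts(self, filename: str) -> bool:
        raise NotImplementedError


class NativeParser(BaseParser):
    """Stand-in for a built-in parser."""

    def accepts(self, filename: str) -> bool:
        return filename == "pipeline-spec.yaml"


# Stand-in for specs.SPEC_PARSERS, initially holding only native parsers.
SPEC_PARSERS: List[BaseParser] = [NativeParser()]


def discover_parsers(parsers_module: types.ModuleType) -> List[BaseParser]:
    """Step 1: instantiate every BaseParser subclass found in a plugin module."""
    found = []
    for _, cls in inspect.getmembers(parsers_module, inspect.isclass):
        # A real version should also skip parsers the module merely imports.
        if issubclass(cls, BaseParser) and cls is not BaseParser:
            found.append(cls())
    return found


def register_plugin_parsers(parsers_module: types.ModuleType,
                            spec_parsers: List[BaseParser]) -> None:
    """Step 2: prepend plugin parsers so they take precedence over native ones."""
    spec_parsers[:0] = discover_parsers(parsers_module)


def find_parser(filename: str, spec_parsers: List[BaseParser]) -> Optional[BaseParser]:
    """Step 3: the usual first-match lookup, unchanged."""
    for parser in spec_parsers:
        if parser.accepts(filename):
            return parser
    return None


# Simulated plugin parsers module (hypothetical module name and parser).
class MeasureParser(BaseParser):
    def accepts(self, filename: str) -> bool:
        return filename.endswith(".measure.source-spec.yaml")


plugin_parsers = types.ModuleType("my_plugin.parsers")
plugin_parsers.MeasureParser = MeasureParser
register_plugin_parsers(plugin_parsers, SPEC_PARSERS)

print(type(find_parser("social-media.measure.source-spec.yaml", SPEC_PARSERS)).__name__)
```

Because the plugin parser sits at the front of the list, it wins for its own filetypes while native filenames still reach the native parser.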

What do you think, @akariv?

akariv commented 7 years ago

Although I'm not opposed to this idea at all, I think that for this use case it might not be needed.

I'm thinking of a single measure.source-spec.yaml file, with sections, such as:

project-name: my project
configuration:
  social-media:
    facebook: <token>
    twitter: <token>
  analytics:
    ga: <token>
  code-packaging:
    ...

Then, the generator would generate a few pipelines, named:

<project-name>-social-media
<project-name>-web-analytics
<project-name>-code-packaging
...

Each one would have the correct processors, based on the provided configuration. I think this is a little better, because this way one file holds all the settings for a single project.
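A rough sketch of such a generator, assuming the YAML above has already been loaded into a dict. The section-to-processor table and the processor names are hypothetical; each yielded entry is meant to mirror the pipeline/run/parameters layout of a pipeline-spec.yaml entry.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical table: config section -> (pipeline suffix, processor to run).
SECTIONS = {
    "social-media": ("social-media", "measure.social_media"),
    "analytics": ("web-analytics", "measure.web_analytics"),
    "code-packaging": ("code-packaging", "measure.code_packaging"),
}


def generate_pipelines(source: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield one (pipeline_id, details) pair per recognised config section."""
    project = source["project-name"]
    for section, settings in source.get("configuration", {}).items():
        if section not in SECTIONS:
            continue  # unrecognised sections are ignored in this sketch
        suffix, processor = SECTIONS[section]
        steps = [{"run": processor, "parameters": settings or {}}]
        yield f"{project}-{suffix}", {"pipeline": steps}


source = {
    "project-name": "my project",
    "configuration": {
        "social-media": {"facebook": "<token>", "twitter": "<token>"},
        "analytics": {"ga": "<token>"},
    },
}
for pipeline_id, details in generate_pipelines(source):
    print(pipeline_id, details["pipeline"][0]["run"])
```

A single source spec therefore fans out into one pipeline per configured section, and sections left out of the file simply produce no pipeline.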

wdyt?

brew commented 7 years ago

Ah, okay. So I yield more than one pipeline spec from the generator? That makes sense. One downside is the Generator class might become a bit monolithic.

akariv commented 7 years ago

Yes, but we can try to make it modular within the plugin itself (e.g. use other classes to do the actual work and keep the generator as a thin wrapper).
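One way to keep the generator thin, sketched under assumptions: each configuration section gets a hypothetical SectionBuilder subclass that produces its steps, and MeasureGenerator only dispatches. The class names, processor names and the generate() hook are illustrative; the framework's own Generator base class and its exact method signature are not shown here.

```python
from typing import Any, Dict, Iterator, List, Tuple, Type


class SectionBuilder:
    """Builds the steps for one configuration section (hypothetical helper)."""

    section = ""  # key under "configuration" in the source spec
    suffix = ""   # appended to the project name to form the pipeline id

    def steps(self, settings: Dict[str, Any]) -> List[Dict[str, Any]]:
        raise NotImplementedError


class SocialMediaBuilder(SectionBuilder):
    section = "social-media"
    suffix = "social-media"

    def steps(self, settings: Dict[str, Any]) -> List[Dict[str, Any]]:
        # One hypothetical processor per configured network.
        return [{"run": f"measure.{network}", "parameters": {"token": token}}
                for network, token in settings.items()]


class AnalyticsBuilder(SectionBuilder):
    section = "analytics"
    suffix = "web-analytics"

    def steps(self, settings: Dict[str, Any]) -> List[Dict[str, Any]]:
        return [{"run": "measure.web_analytics", "parameters": dict(settings)}]


class MeasureGenerator:
    """Thin wrapper: picks a builder for each configured section and yields its pipeline."""

    builders: Tuple[Type[SectionBuilder], ...] = (SocialMediaBuilder, AnalyticsBuilder)

    @classmethod
    def generate(cls, source: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
        project = source["project-name"]
        config = source.get("configuration", {})
        for builder_cls in cls.builders:
            if builder_cls.section in config:
                builder = builder_cls()
                pipeline_id = f"{project}-{builder.suffix}"
                yield pipeline_id, {"pipeline": builder.steps(config[builder.section])}


source = {
    "project-name": "demo",
    "configuration": {
        "social-media": {"facebook": "t1", "twitter": "t2"},
        "analytics": {"ga": "t3"},
    },
}
pipelines = dict(MeasureGenerator.generate(source))
print(sorted(pipelines))  # ['demo-social-media', 'demo-web-analytics']
```

With this split, supporting a new source type means writing one builder class and adding it to builders, without touching the dispatch loop.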