add split processor to standard library

frictionlessdata / datapackage-pipelines

Framework for processing data packages in pipelines of modular components.

MIT License


At least one use case could be covered by adding a parameter to the filter processor that makes it create a new resource for the filtered rows instead of modifying the source resource. A second could be covered by a 'group-by' style processor, which takes a sorted stream and splits it into multiple streams based on a "key" (composed of the values in specific columns). The main problem with the latter is that you need to know the list of distinct key values in the data in advance (so that you can modify the resource list in the datapackage), which significantly complicates the implementation.
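To make the 'group-by' idea concrete, here is a minimal sketch in plain Python. It does not use the datapackage-pipelines API; `split_by_key`, the sample rows and the column names are all hypothetical, and rows are assumed to be dicts already sorted on the key columns.

```python
from itertools import groupby


def split_by_key(rows, key_columns):
    """Yield (key, rows) pairs from a stream already sorted on key_columns.

    Hypothetical helper for illustration only, not part of any library.
    """
    def get_key(row):
        # The key is a tuple of the values in the chosen columns.
        return tuple(row[column] for column in key_columns)

    for key, group in groupby(rows, key=get_key):
        # Materialise the group: groupby invalidates it on the next step.
        yield key, list(group)


# Made-up sample data, sorted on "country".
rows = [
    {"country": "FR", "city": "Lyon"},
    {"country": "FR", "city": "Paris"},
    {"country": "IL", "city": "Haifa"},
]

# Each distinct key would become one output resource.
resources = dict(split_by_key(rows, ["country"]))
```

The distinct keys in `resources` are only known once the whole stream has been consumed, which is the difficulty described above: the resource list in the datapackage would have to be written before any rows are read.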

add split processor to standard library #109