Clever / optimus

Concurrently extract, transform, and load tables of data in Go
Apache License 2.0
34 stars 7 forks source link

combiner: combine two tables via join or extend #1

Closed nathanleiby closed 10 years ago

nathanleiby commented 10 years ago

This implements

Would love your feedback on

  1. If join / extend are in scope for optimus
  2. What interfaces are most appropriate for these methods
  3. Suggested optimizations. Currently, this is a naive implementation, so I would like to optimize it substantially and better utilize golang

Thanks!

image

azylman commented 10 years ago

Generally, I think it'd be best if Join and Extend were functions that returned TransformFuncs, just like everything else. Then they're more general and you could do something like optimus.Transform(table1, transforms.Join(table2, leftHeader, rightHeader)), as well as using them in a Transformer.

I haven't started looking at the code yet, but I think it's very reasonable for them to be "in scope" for optimus.

azylman commented 10 years ago

Also, what do you think about calling Extend Concat or something instead? Might be my own biases, but I heard Extend and thought it was operating on rows (a la underscore.extend), not combining two tables, until I read the docs.

azylman commented 10 years ago

I don't think the performance is probably worth talking about until these are TransformFuncs, because it will change drastically after that (some concurrency built in, as well as streaming Rows).

nathanleiby commented 10 years ago

:+1: Thanks for the suggestions! I'll work on implementing these updates first.

nathanleiby commented 10 years ago

@azylman Join has been refactored to a TransformFunc. There is one TODO in the code for the failing error handling test and another TODO for possibly using a go routine. Also, did you have a suggestion for a better way to do the JoinType enum?

nathanleiby commented 10 years ago

Thanks for the in-person feedback. I've fixed the broken test and added the go routine.

nathanleiby commented 10 years ago

@azylman updated to use struct for join type.