Clever / optimus

Concurrently extract, transform, and load tables of data in Go
Apache License 2.0
34 stars 7 forks source link

transforms: fieldmap shouldnt map nil values when passed nonexisting headers #6

Closed nathanleiby closed 10 years ago

nathanleiby commented 10 years ago

see transforms_test Fieldmap-IgnoreInvalidMap for an example of the behavior after adding the modification

@azylman your thoughts on this change? (and additional tests to clarify the behaviors of fieldmap)

azylman commented 10 years ago

I'm not sure what expected behavior would be here - personally, in my code, I would try to avoid any situation where I was trying to field map a field that didn't exist, because I think no matter what optimus does it's going to be confusing for consumers.

I think reasonable behavior in this case would be to either 1) do what you did here, where the field doesn't exist in the output document or 2) to set it to nil in the output document.

The first could be confusing to consumers if they expect this field to exist, the second could be confusing to consumers if they expect it to be a type that isn't nullable (e.g. string, number, struct).

nathanleiby commented 10 years ago

@azylman agreed - this is a little bit "magic" with the current change, to ignore an invalid mapping.

Can you think of of a scenario where (2) is useful?

(1) is definitely useful to me ... I can provide many mappings and those which are valid get applied. Otherwise I need to:

  1. inspect first row's headers
  2. do an intersection of those with the field mappings I want to apply
  3. apply the reduced list of field mappings
nathanleiby commented 10 years ago

Basically what (2) is doing is adding a now "column" (a new "header" : nil within all rows). Might that be a different method altogether from fieldmap?

azylman commented 10 years ago

I think of field mapping as saying "make a document with all of these headers, using the values from an input row". In that sense, it would be very confusing for a document to not have those headers.

But I guess this is close to the logic that we're used to from other libraries.