ceumicrodata / mETL

mito ETL tool
161 stars 41 forks source link

Boolean types from .csv files #15

Closed e3krisztian closed 11 years ago

e3krisztian commented 11 years ago

There is a postgres export in .csv. Postgres exports boolean values as t and f for true and false respectively.

This file is to be reimported into postgres, with the fields defined as boolean in the target table.

How can I define a field that converts the string values f and t in source to Boolean values true and false in target?

I have tried these variants:

-
    name: sent_for_geocoding
    # finalType: Boolean
    transforms:
        - transform: Map
          values:
                t: true
                f: false

It actually works, however this is not clean: the field type is defined as String, the values are python booleans (True and False), so there is a type mismatch, it works by chance: all parties are dynamic enough to get over the type mismatch. Note, that finalType is commented out - the python bools are still erroneously stored in a String field internally!

So let's try to do it properly - it turns out that we can not:

-
    name: boolean_variable
    finalType: Boolean
    transforms:
        - transform: Map
          values:
                t: 'true'
                f: 'false'
        - transform: ConvertType
          fieldType: Boolean

It does not work, ConvertType converts 'false' to True as well - in fact it I do not know if is possible at all to get a False out - the bool value of every non-empty string is True, the empty string however gets converted to NULL!

Finally the simplest:

- name: boolean_variable

Works! Types are consistent within mETL config (String), but it is the database that does the work: the postgres database implicitly converts t and f strings to booleans - we are dependending on postgresql, probably fails with any other db!

bfaludi commented 11 years ago

You have right I fixed this. The boolean conversion is worked for Integers. So if you want to work the correct solution was the following:

name: boolean_variable
finalType: Boolean
transforms:
    - transform: Map
      values:
            t: 1
            f: 0
    - transform: ConvertType
      fieldType: Integer
    - transform: ConvertType
      fieldType: Boolean

It was not good enough so added more rule for strings at version 0.1.6.12:

e3krisztian commented 11 years ago

:+1: It was really fast!

I confirm the following is working now:

    name: boolean_variable
    type: Boolean