MeltanoLabs / target-athena

Singer.io Target for AWS Athena.
Other
5 stars 16 forks source link

camelCase Records are null #29

Closed pnadolny13 closed 2 years ago

pnadolny13 commented 2 years ago

I'm using tap-google-analytics which returns record keys using camelCase. During table creation the fields are automatically converted to lowercase by Athena so then when I run a query its looking for keys using the lower case variant and ultimately returns null for the entire column.

I was able to get it to work by rerunning the create external table statement with case.insensitive=true in the DDL I pulled from the logs which included:

WITH SERDEPROPERTIES (
  'ignore.malformed.json'='true',
  'case.insensitive'='false'
  ) 

We set case.insensitive=false in the target when using jsonl records.

Proposed Solution:

  1. set case.insensitive=true by default for jsonl. I don't see any harm in doing this but there might be an edge case I'm missing. If there are records with the same name that are distinct by casing theyll get clobbered either during table creation or on read but either way there will be a problem.
  2. As part of writing to jsonl files we could lowercase all field names in the record and potentially do any additional transformations needed for Athena to accept it. Other transformations could include replacing special characters or aliasing reserved column names.

I think option 1 is best to solve the short term problem and although I'm sensitive to transforming records in the target I think option 2 might be the long term solve in addition to capture other edge cases or at least send them through a filter and send a warning log or something.