GoogleCloudPlatform / market-data-transcoder

ffmpeg for market data
Apache License 2.0

Refrain from transcoding SBE field names in snake_case #79

Open salsferrazza opened 1 year ago

salsferrazza commented 1 year ago

I believe there is an option in the SBE decoding library that snake-cases all of the decoded field names. This should be suppressed, and the default should be verbatim transcoding of each field name as specified in the schema.

mservidio commented 1 year ago

@salsferrazza Yes, field names are converted using this:

import re

def convert_to_underscore(name):
    # Strip '@' and '#' from both ends, then insert underscores at camelCase boundaries.
    name = name.strip('@').strip('#')
    sub_str = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', sub_str).lower()
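
To make the effect of this conversion concrete, here is a small self-contained run of the same function. The sample field names are illustrative FIX-style names chosen for this example; they are not taken from this thread or from any particular schema.

```python
import re

def convert_to_underscore(name):
    # Same logic as the function quoted above.
    name = name.strip('@').strip('#')
    sub_str = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', sub_str).lower()

# Illustrative inputs (assumed, not from a real schema).
for field in ('MDEntryPx', 'SecurityID', 'securityId', 'TransactTime'):
    print(field, '->', convert_to_underscore(field))
```

With these inputs, 'SecurityID' and 'securityId' both come out as 'security_id', so the conversion is lossy: the original casing in the schema cannot be recovered from the transcoded name.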

However, naming requirements differ per output type. For example, BigQuery won't accept a dash ('-') in a column name. So if we did something like this, we would still need some way to sanitize field names based on each output type's requirements.

See: https://cloud.google.com/bigquery/docs/schemas
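
One possible shape for this, as a rough sketch only: keep the schema's field name verbatim by default and apply a sanitizer only where the output type needs one. The names `sanitize_for_bigquery`, `passthrough`, `SANITIZERS`, and `output_field_name` are hypothetical and not part of the transcoder. The BigQuery rule assumed here (letters, digits, and underscores only, starting with a letter or underscore) should be checked against the schema documentation linked above.

```python
import re

def sanitize_for_bigquery(name):
    # Assumed rule: only letters, digits, and underscores are allowed.
    cleaned = re.sub(r'[^A-Za-z0-9_]', '_', name)
    # Assumed rule: the name must not start with a digit.
    if re.match(r'[0-9]', cleaned):
        cleaned = '_' + cleaned
    return cleaned

def passthrough(name):
    # Default: keep the field name exactly as the schema spells it.
    return name

# Hypothetical registry mapping an output type to its sanitizer.
SANITIZERS = {'bigquery': sanitize_for_bigquery}

def output_field_name(name, output_type):
    # Fall back to verbatim names for output types without a registered rule.
    return SANITIZERS.get(output_type, passthrough)(name)

print(output_field_name('MDEntryPx', 'bigquery'))
print(output_field_name('bid-price', 'bigquery'))
print(output_field_name('bid-price', 'json'))
```

Under these assumptions, names that are already valid pass through with their original casing, and only the characters an output type rejects are rewritten.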