TresAmigosSD / SMV

Spark Modularized View
Apache License 2.0

I1497 csv io #1508

Closed ninjapapa closed 5 years ago

ninjapapa commented 5 years ago

Fixed #1497

This uses the new iomod package to implement CSV input and output. Some code was copied from smvinput.py; the original version will be removed in issue #1503.

laneb commented 5 years ago

Do we need a separate connection configuration for the schema? In practice won't the details always be the same as for the CSV file itself?

ninjapapa commented 5 years ago

The schema file may be in a different place than the CSV files. Please see the docstring of the _get_schema_connection method. Basically, we will allow 4 different ways for users to specify the schema of a CSV file:

This flexibility is needed because some users may want the schema to be kept close to the data, while others may not want that or may not be allowed to do it.
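To illustrate the fallback idea only, here is a minimal sketch; it is not the code in this PR. The names `Connection`, `resolve_schema_connection`, and `schema_file_path` are hypothetical, and the default of swapping the `.csv` extension for `.schema` is an assumption made for the example. It covers just two cases: a separately configured schema connection, or falling back to the CSV's own connection.

```python
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    """Hypothetical stand-in for a connection: just a base directory."""
    path: str


def resolve_schema_connection(data_conn: Connection,
                              schema_conn: Optional[Connection] = None) -> Connection:
    # Use the dedicated schema connection when one is configured,
    # otherwise fall back to the connection used for the CSV data.
    return schema_conn if schema_conn is not None else data_conn


def schema_file_path(csv_name: str,
                     data_conn: Connection,
                     schema_conn: Optional[Connection] = None,
                     schema_name: Optional[str] = None) -> str:
    conn = resolve_schema_connection(data_conn, schema_conn)
    # Assumed default: same base name as the CSV, with a ".schema" extension.
    if schema_name is None:
        schema_name = csv_name.rsplit(".", 1)[0] + ".schema"
    return conn.path.rstrip("/") + "/" + schema_name
```

With this sketch, a user who keeps the schema next to the data passes only the data connection, while a user who cannot write to the data location passes a second connection for the schema.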

laneb commented 5 years ago

It's clear from the code and docstring that we're trying to support having the schema file located in a different place than the CSV, just not why. That second bullet point is what I was missing. We should add it to the docstring.