gluent / goe

GOE: a simple and flexible way to copy data from an Oracle Database to Google BigQuery.
Apache License 2.0

Add PostgreSQL as a backend target (MVP) #137

Open nj1973 opened 3 months ago

nj1973 commented 3 months ago

This issue is to implement PostgreSQL as a backend target without spending too much effort tuning the PostgreSQL load from GCS. We suspect we will eventually need a GCS to PostgreSQL FDW extension, hence the MVP in the title.

For MVP transport my initial thought was to use a single COPY FROM STDIN command, but COPY does not accept Avro or Parquet (only its text, CSV and binary formats). Is there an alternative or do we need to stage to CSV? Would CSV limit supported data types?
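To make the CSV option concrete, here is a minimal Python sketch of streaming rows through a single COPY FROM STDIN command. It is not GOE code: the helper names (`csv_field`, `rows_to_csv`, `build_copy_sql`, `copy_rows`) are hypothetical, and `conn` is assumed to be a psycopg2 connection, whose cursors provide `copy_expert()`. NULL is written as an unquoted empty field and every other value is quoted, which matches COPY's default CSV handling. Binary values are written in PostgreSQL's `\x` hex form on the assumption that the target column is `bytea`.

```python
import io


def csv_field(value):
    """Encode one value as a field for COPY ... WITH (FORMAT csv)."""
    if value is None:
        # Unquoted empty field: COPY's default NULL marker in CSV mode.
        return ""
    if isinstance(value, (bytes, bytearray)):
        # Assumption: the target column is bytea, which accepts \x hex input.
        text = "\\x" + bytes(value).hex()
    else:
        text = str(value)
    # Quote every non-NULL field so an empty string stays distinct from NULL.
    return '"' + text.replace('"', '""') + '"'


def rows_to_csv(rows):
    """Return an in-memory text buffer holding the rows as CSV lines."""
    buf = io.StringIO()
    for row in rows:
        buf.write(",".join(csv_field(v) for v in row) + "\n")
    buf.seek(0)
    return buf


def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_sql(table, columns):
    """Build the COPY statement for the given table and column list."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"COPY {quote_ident(table)} ({cols}) FROM STDIN WITH (FORMAT csv)"


def copy_rows(conn, table, columns, rows):
    """Load rows with one COPY command (conn: assumed psycopg2 connection)."""
    with conn.cursor() as cur:
        cur.copy_expert(build_copy_sql(table, columns), rows_to_csv(rows))
    conn.commit()
```

Because every value travels as text, each source type needs an agreed text rendering that the PostgreSQL column's input function accepts. That is where any data type limitations from CSV would show up, for example in timestamp precision or time zone formatting.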

Perhaps instead of MVP this should just be "phase 1" which is to implement everything except data copy?

Tasks:

I've probably missed some tasks so don't rely solely on the list above.

nj1973 commented 3 months ago

A possible alternative flow is for Spark to stage to an unlogged table in PostgreSQL instead of Cloud Storage, and then INSERT/SELECT from there into the final table.
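A rough sketch of the SQL such a flow might issue, written as a Python helper that only builds the statements. The function name `staging_statements` and the table names are hypothetical. It assumes the staging table is created up front with `CREATE UNLOGGED TABLE ... (LIKE ...)` so that Spark can append to it over JDBC, and that staging and target columns line up one to one.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def staging_statements(target, staging):
    """Return the SQL for a stage-then-insert flow, in execution order.

    Step 1 runs before Spark writes to the staging table; steps 2 and 3
    run after the Spark write has finished.
    """
    tgt = quote_ident(target)
    stg = quote_ident(staging)
    return [
        # Unlogged tables skip WAL, so staged data is cheap to write
        # but does not survive a crash.
        f"CREATE UNLOGGED TABLE {stg} (LIKE {tgt} INCLUDING DEFAULTS)",
        # Assumption: identical column order in staging and target.
        f"INSERT INTO {tgt} SELECT * FROM {stg}",
        f"DROP TABLE {stg}",
    ]


if __name__ == "__main__":
    for stmt in staging_statements("sales", "sales_goe_stage"):
        print(stmt + ";")
```

The staged rows are disposable, so losing them in a crash only means rerunning the Spark step, while the final INSERT into the regular table is still fully logged.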