DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License
1.84k stars 123 forks

pandas gbq #375

Open skrawcz opened 1 year ago

skrawcz commented 1 year ago

Implement the pandas Google BigQuery I/O (https://pandas.pydata.org/docs/reference/io.html#google-bigquery) as a data loader & saver. The writer side is https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html#pandas.DataFrame.to_gbq.

  1. It should go into pandas_extensions.py: one class for the writer, one class for the reader, much like how the pandas pickle reader and writer are structured.
  2. There should be requisite tests to exercise the functionality. These will need to be mocked, since we don't want to connect to GCP to run unit tests.
  3. We will want a separate example for this one, since it requires having a GCP account.
  4. If there's an issue with type hints, let me know and we can chat through it.
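
To make points 1 and 2 concrete, here is a rough sketch of the shape this could take. It is not the actual implementation: the class names `GbqReaderSketch` and `GbqWriterSketch`, their fields, and the returned metadata dicts are hypothetical. The real classes would follow the existing pickle reader and writer and plug into Hamilton's adapter interfaces, which this sketch does not import. The only pandas calls assumed are `pandas.read_gbq` and `DataFrame.to_gbq`, and both are replaced by mocks so nothing touches GCP.

```python
import dataclasses
import sys
from typing import Any, Dict, Optional, Tuple
from unittest import mock


@dataclasses.dataclass
class GbqReaderSketch:
    """Hypothetical reader: thin wrapper around pandas.read_gbq."""

    query: str
    project_id: Optional[str] = None

    def load_data(self) -> Tuple[Any, Dict[str, Any]]:
        # Imported lazily so a test can swap in a stub "pandas" module.
        import pandas as pd

        df = pd.read_gbq(self.query, project_id=self.project_id)
        # Illustrative metadata only; the real shape is up to the adapter.
        return df, {"source": self.query}


@dataclasses.dataclass
class GbqWriterSketch:
    """Hypothetical writer: thin wrapper around DataFrame.to_gbq."""

    destination_table: str
    project_id: Optional[str] = None
    if_exists: str = "fail"

    def save_data(self, data: Any) -> Dict[str, Any]:
        data.to_gbq(
            self.destination_table,
            project_id=self.project_id,
            if_exists=self.if_exists,
        )
        return {"destination_table": self.destination_table}


def demo_with_mocks():
    """Exercise both classes without pandas-gbq or a GCP connection."""
    fake_pd = mock.MagicMock(name="pandas")
    fake_df = mock.MagicMock(name="DataFrame")
    fake_pd.read_gbq.return_value = fake_df

    # Stub the pandas module only for the duration of the read.
    with mock.patch.dict(sys.modules, {"pandas": fake_pd}):
        reader = GbqReaderSketch("SELECT 1 AS x", project_id="my-project")
        df, load_meta = reader.load_data()

    writer = GbqWriterSketch("my_dataset.my_table", project_id="my-project")
    save_meta = writer.save_data(df)
    return fake_pd, fake_df, load_meta, save_meta


if __name__ == "__main__":
    fake_pd, fake_df, _, _ = demo_with_mocks()
    fake_pd.read_gbq.assert_called_once_with(
        "SELECT 1 AS x", project_id="my-project"
    )
    fake_df.to_gbq.assert_called_once_with(
        "my_dataset.my_table", project_id="my-project", if_exists="fail"
    )
```

In a real test suite it would probably be more natural to patch `pandas.read_gbq` directly with `mock.patch`. Stubbing the whole module here just keeps the sketch runnable without pandas installed.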

bryangalindo commented 1 year ago

working on this

bryangalindo commented 1 year ago

@skrawcz for the third requirement, assuming we're talking about the materialization example, where do you want to store the example? (e.g., examples/pandas/materialization/gbq, examples/pandas/gbq/materialization, examples/pandas/materialization/notebook-gbq.ipynb)

skrawcz commented 1 year ago

Yeah, let's just put it under examples/pandas/materialization/gbq and have a similar script & notebook? That way it can have its own README too.

bryangalindo commented 1 year ago

I am working on this 👍