RunLLM / aqueduct

Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.
https://aqueducthq.com
Apache License 2.0

Implement reading and writing Table Artifacts in Parquet format #1305

Closed Fanjia-Yan closed 1 year ago

Fanjia-Yan commented 1 year ago

Describe your changes and why you are making these changes

WIP:

  1. Handle outdated JSON-serialized artifacts. The UI and SDK are done; the server still needs some work (Wednesday).
  2. Test against S3 metadata storage, and possibly a compute engine, old table artifacts, and an integration test (Thursday).
  3. Benchmark the performance improvement of Parquet over JSON (Friday).
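
For item 1, one possible way to tell an old JSON-serialized table from a new Parquet one is to check the Parquet magic bytes: a Parquet file starts and ends with the 4 bytes `PAR1`. This is only a sketch, not the code in this PR. The function name `detect_table_format` is hypothetical, and it assumes legacy artifacts are UTF-8 JSON text.

```python
import json

# Parquet files begin and end with this 4-byte magic number.
PARQUET_MAGIC = b"PAR1"

# Smallest plausible Parquet file: magic + 4-byte footer length + magic.
MIN_PARQUET_SIZE = 12


def detect_table_format(raw: bytes) -> str:
    """Return "parquet" or "json" for a serialized table artifact.

    Hypothetical helper: assumes legacy table artifacts were stored
    as UTF-8 encoded JSON and new ones as Parquet bytes.
    """
    if (
        len(raw) >= MIN_PARQUET_SIZE
        and raw[:4] == PARQUET_MAGIC
        and raw[-4:] == PARQUET_MAGIC
    ):
        return "parquet"

    # Fall back to the legacy path; reject anything that is not valid JSON.
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("unrecognized table artifact encoding") from exc
    return "json"
```

A reader could branch on the returned string and dispatch to the Parquet or the legacy JSON deserializer, so existing artifacts stay readable without a migration.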

Changes in

  1. SDK: (1) Change the serialization methods in serialization.py, specifically those that read and write tables. (2) The SDK uses the get_artifact_result_content handler to get artifact info from a flow run, so it needs to adapt to the handler's new return value.

  2. Executor: The executor uses the serialization code from sdk/serialization.py, so nothing needs to change here.

  3. Server: (1) We use https://github.com/xitongsys/parquet-go to read the Parquet byte string through a Parquet reader and output JSON data. (2) We also construct the schema from the Parquet file metadata for UI display.

  4. UI: We receive a schema and data from the get_artifact_result_content handler. The data format is consistent with what the UI currently expects, but the schema needs some processing.
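
To illustrate what "adapting to the new return value" could look like on the SDK side, here is a rough sketch of parsing a response that carries both a schema and data. The payload shape (`schema` as a list of `name`/`type` entries, `data` as a list of row objects) and the function name `parse_table_content` are assumptions for illustration, not the actual format returned by get_artifact_result_content.

```python
import json
from typing import Any, Dict, List, Tuple

Schema = List[Tuple[str, str]]
Rows = List[Dict[str, Any]]


def parse_table_content(body: str) -> Tuple[Schema, Rows]:
    """Split a table response into (schema, rows).

    Assumed (hypothetical) payload shape:
        {"schema": [{"name": "id", "type": "INT64"}, ...],
         "data":   [{"id": 1, ...}, ...]}
    """
    payload = json.loads(body)

    # Keep column order from the schema; each entry becomes (name, type).
    schema = [(field["name"], field["type"]) for field in payload["schema"]]
    rows = payload["data"]

    # Sanity check: every row may only use columns declared in the schema.
    known = {name for name, _ in schema}
    for index, row in enumerate(rows):
        unknown = set(row) - known
        if unknown:
            raise ValueError(f"row {index} has undeclared columns: {sorted(unknown)}")

    return schema, rows
```

Keeping the schema as an ordered list rather than a mapping preserves column order, which matters when the UI renders table headers.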

Related issue number (if any)

Loom demo (if any)

Checklist before requesting a review

saurav-c commented 1 year ago

Closing this PR for now until this work is ready to be merged per the new plan for this project.