Describe your changes and why you are making these changes
WIP:
Handle outdated JSON-serialized artifacts. The UI and SDK are done; the server still needs some work (Wednesday)
Test against S3 metadata storage, and possibly the compute engine and old table artifacts, plus a potential integration test (Thursday)
Benchmark the performance improvement of Parquet over JSON (Friday)
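One possible way to tell outdated JSON-serialized table artifacts apart from new Parquet ones is to check for the Parquet magic bytes (`PAR1` at both the start and the end of the file). This is only a minimal sketch and not necessarily what this PR does; the function name `detect_table_format` is hypothetical.

```python
import json

# Parquet files begin and end with this 4-byte magic number.
PARQUET_MAGIC = b"PAR1"


def detect_table_format(content: bytes) -> str:
    """Return "parquet" or "json" for serialized table bytes.

    Hypothetical helper: assumes a table artifact is either a Parquet
    file or a legacy UTF-8 JSON document.
    """
    # A valid Parquet file holds at least: magic + footer length + magic.
    if (
        len(content) >= 12
        and content[:4] == PARQUET_MAGIC
        and content[-4:] == PARQUET_MAGIC
    ):
        return "parquet"

    # Fall back to the legacy format; json.loads raises ValueError
    # (JSONDecodeError) if the bytes are not valid JSON either.
    json.loads(content.decode("utf-8"))
    return "json"
```

A reader could then dispatch on the result, using the Parquet reader for `"parquet"` and the old JSON path for `"json"`.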
Changes in
SDK: (1) Change the serialization method in serialization.py, specifically for reading and writing tables. (2) The SDK uses the get_artifact_result_content handler to get artifact info from a flow run, so we need to adapt to the new return value.
Executor: The executor uses serialization from sdk/serialization.py, so we don't have to change anything here.
Server: (1) We use https://github.com/xitongsys/parquet-go to read the Parquet byte string with a Parquet reader and output JSON data. (2) We also construct the schema from the Parquet file metadata for UI display.
UI: We receive a schema and data from the get_artifact_result_content handler. The data format is consistent with our current UI, but the schema needs some processing.
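To illustrate what adapting to the new return value might look like on the client side, here is a rough sketch. The payload shape is an assumption made for illustration only (a JSON object with a `schema` list of `{"name", "type"}` entries and a `data` list of row objects); the actual response of get_artifact_result_content may differ, and `parse_table_result` is a hypothetical name.

```python
import json
from typing import Any, List, Tuple


def parse_table_result(payload: str) -> Tuple[List[str], List[List[Any]]]:
    """Split a schema-plus-data payload into column names and rows.

    Assumed (hypothetical) payload shape:
      {"schema": [{"name": ..., "type": ...}, ...],
       "data":   [{"<column>": <value>, ...}, ...]}
    """
    body = json.loads(payload)

    # Column order comes from the schema, not from each row's key order.
    columns = [field["name"] for field in body["schema"]]

    # Align every row to the schema order; missing values become None.
    rows = [[record.get(col) for col in columns] for record in body["data"]]
    return columns, rows
```

Taking column order from the schema keeps the display stable even if individual row objects list their keys in a different order.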
Related issue number (if any)
Loom demo (if any)
Checklist before requesting a review
[ ] I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
[ ] I have performed a self-review of my code.
[ ] I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
[ ] If this is a new feature, I have added unit tests and integration tests.
[ ] I have run the integration tests locally and they are passing.
[ ] I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
[ ] All features on the UI continue to work correctly.
[ ] Added one of the following CI labels:
run_integration_test: Runs integration tests
skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)