-
As a library intended for production use, it would benefit from logging, metrics, and tracing support. We need to investigate how to fit into the Rust ecosystem as much as possible.
-
As of July 2023, we have paused active development on TorchData and have paused new releases. We have learnt a lot from building it and hearing from users, but also believe we need to re-evaluate the …
-
There are use cases where a pipeline may be dependent on another pipeline within the same stage. For such use cases it would be easier to set a dependent pipeline within the same stage. Another use c…
-
**Describe the problem you faced**
For context, we have tables that are snapshotted daily/weekly (e.g. RDS export) that we then have Spark jobs convert into Hudi tables (i.e. we overwrite the full table…
-
Subscribe to this issue and stay notified about new [daily trending repos in Java](https://github.com/trending/java?since=daily)!
-
The Presto coordinator UI is a great tool for debugging a Presto cluster's health, running queries, worker status, and so on. Presto now supports a native worker, which acts as a drop-in replacement for the Java wor…
-
**What**
`EXPLAIN` queries over `pg_lakehouse` foreign tables are not working correctly for a few reasons:
1. Running `EXPLAIN` on a query that's fully pushed down to DuckDB returns the query p…
-
As a user
I want to be able to uninstall a stack or demo by itself, after I'm done looking at it
to have a clean cluster again and not have to remove things manually, which is error-prone.
- Wha…
-
**Describe the problem you faced**
**Upserts in Hudi take a long time to execute. Updates take 4 or 5 times longer than inserts, and 90% of the data needs to be updated.**
Code below takes aroun…
-
I'm in the system design phase of a new project and, due to the scale of our data, we plan to run multi-node, multi-GPU training with PyTorch on Databricks. I'm particularly interested in using this li…