CaptainDaVinci opened this issue 3 years ago.
Yes, this is a limitation. I don't remember the exact details because this was implemented a while ago, but it was an issue with how Apache Spark query plans support and handle subqueries. Limitations in Apache Spark prevented us from implementing it efficiently. This may have changed in later versions of Spark, in which case it can be fixed.
If others reading this are also facing this issue, please vote on it to raise its importance. Contributions are absolutely welcome.
@tdas how does it work on Databricks, though? We're using a cluster with DBR 7.3, and DELETE with a subquery predicate works fine.
Edit: Alternatively, are there any Docker images of a Databricks cluster that could be used instead? My basic requirement is to run integration tests that involve Delta operations in a pipeline.
Facing the same issue: it works in Databricks but not locally.
Same issue.
Same issue locally.
Thanks for the comments/feedback everyone. For now, our H2 roadmap is quite full, so this is something that we can consider next year. Please keep the comments and feedback coming so we know how to prioritize items in our next roadmap!
Any plans to consider this for this year's roadmap? Looking forward to having this feature. In the meantime, is there any workaround?
+1 for fixing/providing this in open-source Delta Lake
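One possible workaround, not confirmed by the maintainers in this thread: open-source Delta Lake does accept `MERGE INTO ... WHEN MATCHED THEN DELETE`, and the merge source can itself be a subquery, so a `DELETE FROM target WHERE key IN (SELECT ...)` can be rewritten as a merge. The helper below is a hypothetical sketch (the function name and its parameters are illustrative, not part of any Delta API); it only builds the SQL string, and it interpolates table and column names unescaped, so it assumes trusted identifiers.

```python
def delete_in_subquery_as_merge(
    target_table: str,
    target_key: str,
    source_query: str,
    source_key: str,
) -> str:
    """Build a MERGE INTO statement intended to have the same effect as

        DELETE FROM <target_table>
        WHERE <target_key> IN (SELECT <source_key> FROM (<source_query>))

    Hypothetical helper: identifiers are interpolated unescaped, so only
    pass trusted table/column names and queries.
    """
    # Deduplicate the keys so each target row matches at most one source row.
    source = (
        f"SELECT DISTINCT {source_key} AS k "
        f"FROM ({source_query}) AS q"
    )
    return (
        f"MERGE INTO {target_table} AS t "
        f"USING ({source}) AS s "
        f"ON t.{target_key} = s.k "
        # Delete every target row whose key appears in the source.
        "WHEN MATCHED THEN DELETE"
    )


if __name__ == "__main__":
    sql = delete_in_subquery_as_merge(
        target_table="events",
        target_key="user_id",
        source_query="SELECT id FROM blocked_users",
        source_key="id",
    )
    print(sql)
```

With a SparkSession configured for Delta Lake (for example with `spark.sql.extensions` set to `io.delta.sql.DeltaSparkSessionExtension`), the generated statement would be passed to `spark.sql(...)`. The `DISTINCT` keeps the merge clear of Delta's restriction on multiple source rows matching the same target row. The Python `DeltaTable` API expresses the same pattern through `merge(...).whenMatchedDelete()`.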
I'm also interested in this functionality, any chance we could get this prioritized on the roadmap?
@dennyglee When will this feature be released to OSS?
+1, requesting that this functionality be considered for this year's roadmap.
+1 on this as well. The Databricks Runtime recently released deletion vectors, which optimize how Spark handles deletes: instead of rewriting the data minus the deleted rows, the deletes are recorded in a separate file and excluded from queries, with the actual rewrite done later when it makes sense to do so. Perhaps this approach could also overcome the original difficulties in implementing subquery deletion support. Selfishly, I need it to start moving processes over to Fabric, but I am sure it is a feature that many people coming from a SQL world will love. Keep up the good work; it is greatly appreciated.
+1, requesting this functionality. Delta Lake doesn't generate delete files when rows are deleted in a MERGE INTO query, only when they are explicitly deleted with a DELETE FROM query, so I am sure this would help a lot of people who want to create a merge-on-read (MOR) table in Delta Lake.
Raising this as a request for functionality.
Facing an error when using subqueries in the WHERE predicate of a DELETE. This code works fine on Databricks, but when run on a local machine it raises an error.
Sample code
Error log: