Open simonaubertbd opened 5 months ago
Thanks, definitely on my radar! I'm particularly looking at https://github.com/ibis-project/ibis to enable this.
@tgourdel From what I know there are two major philosophies: -distinguished tools between the "default" in-memory engine and the "in-database" engine (Alteryx Designer on Premise... but all in-in engine have the same tools, with a few exceptions to write code in pyspark, this kind of stuff) -same tools and a menu to choose the execution engine, either on each tool (Dataiku), either globally (Alteryx Cloud)
If the second one seems promising ("hey, it's transparent for user, he just have to switch the engine" while must of the functions just exist with the default engine), I prefer the first solution because it's more understandable for the user, easier for maintenance, allow mix of in-memory and in-database tools with "interface" between.
For reference, the dedicated in-database tools existing on Alteryx, the data stream in and out allows the bridge between the engines part.
Best regards,
Simon
@tgourdel Also : this tools are just here to build a query that is passed on the write-data in-db, the browse and the data stream out so it's not that hard to develop
E.g : this workflow
will create a table with this query
SELECT "Couleur", COUNT(*) AS "Count" FROM
(SELECT "Couleur", "Nom_Produit" FROM
(SELECT * FROM
(select "bd_market"."articles".* from "bd_market"."articles")
AS "a" WHERE "Classe" IS NOT NULL)
AS "a")
AS "a" GROUP BY "Couleur"
As you can see, it's nested for each tool, so pretty easy to do. It may not be optimized but as a first approach, it works well.
Hello,
A dreamt feature would be the ability to choose the execution engine. Some dataprep/etl tools have it (like Alteryx or Dataiku). That means two possibilities :
This is hard to develop of course but I think this has definitely this place on the roadmap.
Best regards,
Simon