dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

R as an available Execution Environment #2624

Status: Open. JohnMav opened this issue 4 years ago.

JohnMav commented 4 years ago

Since 0.8.0, the separation of the Dagster host process from the user process has made it possible to create workspace environments that use different versions of Python. For teams whose users work primarily in R, it would be amazing to be able to designate a workspace environment as a specific R version and run Dagster pipelines within that environment.

Issue #1585 raises the idea of supporting polyglot notebooks, but I would love for this to go a step further and support tools like RMarkdown and Shiny apps in a similar fashion.

nlarusstone commented 2 years ago

I want to add a big +1 here, along with some context on why this would be extremely useful. Many organizations (mine included) use a combination of R and Python scripts in their pipelines. In particular, R is really powerful for statistical modeling and has a huge number of packages that don't exist in Python. It's not a great language for writing production code, but it's hard to migrate away from it entirely in favor of Python.

I love a lot of the motivations behind Dagster (typing, testing, etc.) and would like to bring them to as many of our pipelines as possible. I understand that's probably hard to do, so even a simple R operator that passes dataframes between R and Python would be incredibly powerful. The current workaround seems to be running a shell script or Docker container that passes data via files, which works but gives up much of what makes Dagster powerful.
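For reference, the file-based workaround described above can be sketched in plain Python. This is a minimal sketch, not a Dagster API: `run_r_step` is a hypothetical helper, and it assumes `Rscript` is on the `PATH` and that the R script takes an input CSV path and an output CSV path as its two arguments (for example, `args <- commandArgs(trailingOnly = TRUE)`, then `read.csv(args[1])` and `write.csv(result, args[2], row.names = FALSE)`). The helper could be called from the body of a Dagster op or asset.

```python
import csv
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence

Row = Dict[str, str]


def run_r_step(
    rows: List[Row],
    script: str,
    interpreter: Sequence[str] = ("Rscript",),
) -> List[Row]:
    """Hypothetical helper (not a Dagster API): hand a table to an external
    script via CSV files and read the script's output table back.

    Assumes the script accepts two positional arguments: an input CSV path
    and an output CSV path.
    """
    if not rows:
        raise ValueError("rows must be non-empty so the CSV header can be written")
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "input.csv"
        out_path = Path(tmp) / "output.csv"
        # Write the input table; the first row's keys become the CSV header.
        with in_path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        # Run e.g. `Rscript script.R input.csv output.csv`;
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(
            [*interpreter, script, str(in_path), str(out_path)], check=True
        )
        # Read the result back; every value comes back as a string.
        with out_path.open(newline="") as f:
            return list(csv.DictReader(f))
```

A call such as `run_r_step(rows, "models/fit_model.R")` (the script path is hypothetical) would run the R step and return its output rows. Because the CSV round-trip drops column types and the R side is opaque to Dagster's typing and testing, this illustrates why a native R integration that passes dataframes directly would be more useful than the workaround.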

I've only just started using Dagster, so I'm not familiar enough with it to know how easy or hard this would be, or where I would even begin to contribute a feature like this, but I wanted to make sure this issue doesn't die!

I've collected a few Slack threads where other people have said this is something they would like to do:

- https://dagster.slack.com/archives/CCCR6P2UR/p1592431695124700 (the original post that spawned this issue)
- https://dagster.slack.com/archives/CCCR6P2UR/p1614100503079800
- https://dagster.slack.com/archives/C01U954MEER/p1638575143351700?thread_ts=1638550558.333200&cid=C01U954MEER
- https://dagster.slack.com/archives/C014N0PK37E/p1641375596083800