Is your feature request related to a problem? Please describe.
Following on from issue #57, we also need to edit the existing guidance page about personal clusters, which currently lives here: https://dfe-analytical-services.github.io/analysts-guide/ADA/databricks_rstudio_personal_cluster.html
The initial part of issue #57 will be resolved by PR #78, which deals with SQL warehouses. Most analysts with existing pipelines will probably use SQL warehouses, because they allow an almost like-for-like replacement of the code currently used to connect to tables in SQL Server.
Describe the solution you'd like
The content of the page needs to be reframed so that it's not just about setup. It should also cover:
- what personal clusters are
- how you'd set one up
- how you'd know when to use one in conjunction with RStudio instead of a SQL warehouse (basically, if you need access to Volumes in Databricks rather than tables)
- the risks (perhaps an overly harsh word) of using personal clusters, including the fact that they time out after 60 minutes of inactivity, after which all your code has to be re-run to get everything back into your environment (basically the same as a "Restart R" in RStudio)
We also need to consider whether we want to discuss the use of sparklyr with a personal cluster. It's not required, and it means that the user needs to have Python installed locally to get it working. In the spirit of being the Stats Development Team, I think this is something we could/should cover, with caveats where required. It's not currently clear to me how many people would end up using this method regularly.
Additional context
Worth reading through PR #78 to see the extensive discussion in the comments!