DataThirstLtd / databricks.vsts.tools

VSTS Deployment Tasks for Databricks Objects
MIT License

Swapping line delimiters from UNIX to Windows (probably only when running on a Windows agent) #41

Open dbeavon opened 3 years ago

dbeavon commented 3 years ago

I'm using this in Azure DevOps.

https://marketplace.visualstudio.com/items?itemName=DataThirstLtd.databricksDeployScriptsTasks&ssr=false#qna

The task is: DataThirstLtd.databricksDeployScriptsTasks.databricksDeployDBFSFilesTask

When running on a Windows agent (we have a large Windows-based agent pool), the task swaps line delimiters from UNIX to Windows style.

Is this intentional? It is breaking the shell scripts we deploy to Azure Databricks.

The workaround for now is to use Az.Databricks in PowerShell to deploy the bash shell scripts, and the DataThirst tasks for everything else.
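If pre-processing files on the agent is an option, another possible workaround is to normalize line endings back to LF on disk before the deploy step runs. A minimal sketch in Python (the helper name is an assumption for illustration, not part of the task or module):

```python
from pathlib import Path

def normalize_to_lf(path: str) -> bool:
    """Rewrite a text file with UNIX (LF) line endings.

    Works on raw bytes so the file is otherwise untouched.
    Returns True if the file was modified, False if it was already LF-only.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        p.write_bytes(fixed)
        return True
    return False
```

A script like this could run as an extra pipeline step over the staged shell scripts before the DBFS upload task, so CRLF introduced anywhere upstream is undone.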

simondmorias commented 3 years ago

I took a look into this. The problem seems to stem from this function: https://github.com/DataThirstLtd/azure.databricks.cicd.tools/blob/master/Private/Get-Notebooks.ps1

It rejoins the file using the OS line ending, so as you suggested this is a Windows-only issue. When I run that function in a container it behaves correctly. The problem is that the Azure DevOps tasks will only run on Windows PowerShell; I'm not sure why, because our module is fully Core compliant.
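For anyone curious about the mechanism, the effect of a split-then-rejoin over the host OS's newline can be illustrated in Python (this mirrors what the PowerShell function appears to do; it is not the actual code):

```python
def rejoin(text: str, newline: str) -> str:
    """Split text into lines, then rejoin with the given newline string.

    When the join string is the OS line ending, an LF-only file round-trips
    unchanged on Linux ("\n") but comes back with CRLF endings on Windows
    ("\r\n"). Note the split also drops any trailing newline.
    """
    return newline.join(text.splitlines())

script = "#!/bin/bash\necho hello\n"
# Joining with "\r\n" (the Windows line ending) rewrites the script body:
rejoin(script, "\r\n")  # → "#!/bin/bash\r\necho hello"
# Joining with "\n" (the UNIX line ending) leaves the content intact,
# apart from the trailing newline:
rejoin(script, "\n")    # → "#!/bin/bash\necho hello"
```

This is why the uploaded bash scripts only break when the pipeline happens to run on a Windows agent.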

For now, the only real workaround I can suggest is to use the native PowerShell module and run it on a Linux agent. See the Export/Import commands here: https://github.com/DataThirstLtd/azure.databricks.cicd.tools/wiki

dbeavon commented 3 years ago

Thanks for the quick reply.

Just to be clear, you are saying that the "native PowerShell module" can run your custom export/import commands? I wasn't aware that I had access to those outside of a DataThirst task. I'm fairly new to DevOps (and to Azure in general).

One challenge for me is that our agent pool is Windows-only. It is a private agent pool using VM scale sets and runs within our VNET. I don't currently have access to a Linux agent, and I think it would be frowned upon if I used a standard DevOps-hosted agent that runs outside our VNET.

Thanks for clearing this up. I'm still confused why this is treated as a notebook when I'm using a generic DBFS file operation. Perhaps the task is too clever: it may be trying to distinguish our ASCII files from other file types and then treating the ASCII ones as notebooks...