RickXMoore commented 4 years ago

Let me start off by saying that I was asked by my Microsoft rep to write this up and post it here in the hope that it might benefit others using ADF.

My biggest complaint is Microsoft’s stance on how deployments are handled. There are a lot of organizations, including mine, that are in no position to have every object migrated to each data factory for a small number of changes. I have a DevOps repository attached to development. I used the publish process to migrate from Git to the data factory, and then exported and imported the ARM template to migrate to our newly configured test environment. The process ignored 10-15 objects out of approximately 300 without any indication of errors. If it weren’t for the fact that I have everything broken down into specific folders, I would have had a difficult time determining which objects failed to migrate. I located the missing objects and deployed them via PowerShell.

The statement below, under unsupported features, leads you to believe that cherry-picking is a difficult process that can’t be managed, which is entirely false.

• “By design, ADF does not allow cherry-picking commits or selective publishing of resources. Publishes will include all changes made in the data factory.
  o Data factory entities depend on each other, for instance, triggers depend on pipelines, pipelines depend on datasets and other pipelines, etc. Selective publishing of a subset of resources may lead to unexpected behaviors and errors.”

I manage every deployment I do from test to stage to production using PowerShell scripts to create datasets, pipelines and triggers, all associated together. This process works great, except that the PowerShell commands and the JSON files generated by Git are in some cases not what ADF expects to be passed. For example, there are cases when the JSON file from Git lists the property under AzureSqlTable as schema instead of structure. ADF imports via PowerShell fail because the schema and table are separated into different fields, or, when a query is used to generate the source data, because the tableName property is absent. Once you know how to get around these and other issues, it’s quite simple to manage deploying directly from PowerShell. Even their own automated CI/CD process has manual steps that must be executed via PowerShell, which shows the process is incomplete and inadequate.

I’ll leave it with the following issues that I believe must be corrected / enhanced for Azure Data Factory to rise to the level expected by their customers.

The stance on deployment requirements needs to change, either by modifying the existing CI/CD process to be more functional or by providing another avenue for deploying objects.
The PowerShell and Data Factory product teams need to work more closely together to ensure 100% compatibility.
Simplify the use or requirements around ARM templates.
Better communication around updates to Data Factory. The release of Data Flow in mid-June broke several PowerShell cmdlets.

I hope this helps some out there using the product now or looking at the product for the future. My hope is that together with my rep we can help MS understand how the product is being used and what needs to be corrected to make Data Factory an outstanding Azure feature.
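
One way to catch objects that an ARM template import silently skipped is to compare the object names in the Git repository with the resources present in the exported template. The following is a minimal Python sketch of that comparison, not a Microsoft-provided tool. It assumes the usual Git-mode folder layout (`pipeline/`, `dataset/`, `linkedService/`, `trigger/`, `dataflow/`, one JSON file per object with a top-level `name`) and template resource names of the form `[concat(parameters('factoryName'), '/ObjectName')]`. The function names are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumed mapping from Git-mode folder names to ARM resource type suffixes.
FOLDER_TO_ARM_TYPE = {
    "pipeline": "pipelines",
    "dataset": "datasets",
    "linkedService": "linkedServices",
    "trigger": "triggers",
    "dataflow": "dataflows",
}

# Captures ObjectName from "[concat(parameters('factoryName'), '/ObjectName')]".
_NAME_IN_CONCAT = re.compile(r"'/([^']+)'\)\]$")


def names_in_repo(repo_root):
    """Return {(arm_type, name)} for every JSON file in the assumed folder layout."""
    found = set()
    for folder, arm_type in FOLDER_TO_ARM_TYPE.items():
        directory = Path(repo_root, folder)
        if not directory.is_dir():
            continue
        for path in directory.glob("*.json"):
            definition = json.loads(path.read_text(encoding="utf-8"))
            found.add((arm_type, definition["name"]))
    return found


def names_in_template(template):
    """Return {(arm_type, name)} for every data factory child resource in a template dict."""
    found = set()
    for resource in template.get("resources", []):
        if not resource["type"].startswith("Microsoft.DataFactory/factories/"):
            continue
        match = _NAME_IN_CONCAT.search(resource["name"])
        if match:
            found.add((resource["type"].rsplit("/", 1)[-1], match.group(1)))
    return found


def missing_from_template(repo_objects, template):
    """Objects present in the repository but absent from the template, sorted."""
    return sorted(repo_objects - names_in_template(template))
```

In practice, `names_in_repo("path/to/repo")` would supply the first argument and `json.load` on the exported template file the second. Anything the function returns is a candidate for the kind of manual PowerShell deployment described in this post.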

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: 1e22e4d6-366f-fbf8-e141-8905ca7ab7ae
Version Independent ID: 5be2acf2-66e7-cadf-6666-5a85a65b9771
Content: Continuous integration and delivery in Azure Data Factory - Azure Data Factory
Content Source: articles/data-factory/continuous-integration-deployment.md
Service: data-factory
GitHub Login: @djpmsft
Microsoft Alias: daperlov

himanshusinha-msft commented 4 years ago

Thanks for the feedback and for bringing this to our notice. At this time we are reviewing the feedback and will update the document as appropriate.

himanshusinha-msft commented 4 years ago

We have assigned the issue to the content author. They will evaluate and update the document as appropriate.

Thomas-Bailey commented 4 years ago

Going to second this. The current promotion using ARM templates from dev to test to live is painful, especially when integrated with Git and deploying through DevOps.

The PowerShell documentation is out of date for v4 when deploying using DevOps, including the tidy script provided in the documentation. It causes errors if run step by step.
The requirement for PowerShell to carry the deployment over the line, due to the lack of cross-functionality between ADF and DevOps, is just painful:

Requirement to deactivate and reactivate triggers.
Requirement to tidy up removed linked services yourself - why can't deployments replicate the state of Git and allow for DropIfNotInSource?
Why do I need PS to tell my triggers to use test parameters instead of dev?
Why can't I just deploy from branches like I can using DACPACs to a SQL database?
Why must my master branch not be a reflection of my true 'master' that's in Live and be ahead of it?
Can't I just deploy from test and dev integration branches like I can to a database?
Where is the DACPAC-style task for deploying to ADF anyway?
Automate the generation of ARM templates from a dev branch and allow it to deploy to dev.
Allow the option to ignore certain types of parameters in linked services and triggers during deployment, and allow those parameters to be changed outside of Git mode if you can't simplify the move from dev to test in ARM template deployments.
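
The "DropIfNotInSource" behaviour asked for here boils down to a set difference between what is deployed and what is in source, applied in an order that removes dependents first. The Python sketch below only computes that plan; it does not call Azure. The deletion order and the choice to stop every deployed trigger before a deployment are assumptions made for illustration, and `cleanup_plan` is a hypothetical name.

```python
# Assumed order: delete dependents before the objects they depend on.
DELETE_ORDER = ["triggers", "pipelines", "dataflows", "datasets", "linkedServices"]
_RANK = {kind: position for position, kind in enumerate(DELETE_ORDER)}


def cleanup_plan(deployed, in_source):
    """Return (triggers_to_stop, deletions).

    deployed / in_source: sets of (resource_type, name) tuples describing the
    target factory and the source branch or template respectively.
    """
    # Assumption: every deployed trigger is stopped before the deployment runs
    # and restarted afterwards if it still exists in source.
    triggers_to_stop = sorted(name for kind, name in deployed if kind == "triggers")

    # Anything deployed that source no longer contains is stale.
    stale = deployed - in_source
    deletions = sorted(
        stale,
        key=lambda item: (_RANK.get(item[0], len(DELETE_ORDER)), item[1]),
    )
    return triggers_to_stop, deletions
```

The returned lists could then drive whatever actually performs the stop and delete calls, for example the PowerShell cmdlets mentioned in this thread.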

Thomas-Bailey commented 4 years ago

Long and short of this is I'm removing Git integration from my Test and Live factories until you can figure this out properly.

NowinskiK commented 4 years ago

Hi Thomas. Did I understand you correctly? Do you have (or did you have) Git integration enabled for environments other than dev, like Test and Live? If so, that's not the way you should use Git with ADF. As for all the other things about the analogy to DACPAC and how that works - I agree and I'm all with you. That's why I started working on an open-source PowerShell module to do these things. Take a look: https://github.com/SQLPlayer/azure.datafactory.tools

mtvessel commented 4 years ago

I couldn't agree more. We have a team of developers all working on multiple feature branches in parallel. We need to be able to publish any or all of those features to our collaboration branch, but selectively move features to higher environments as they are tested/approved. Right now our only option is to work solely in feature branches, and test using debug triggers. Debug triggers are highly unstable, buggy, and monitor logs don't get saved. Woe unto me if I'm using a debug trigger to test a 10 hour procedure and my network drops during that time. I will never be able to restore the monitor output for that run. It's just gone, as is my time and testing results.

I don't understand why I can't discriminate between what gets published to the live factory, and what I can stage for release to higher environments.

RickXMoore commented 4 years ago

I currently manage all of this via PowerShell 7, but my problem is that it’s still a manual process. I’m a SQL DBA with 30+ years of experience, but I’m not a developer, so my coding options are rather limited. I can publish a single object or an entire feature through each environment and never have to deploy the entire data factory each time. Even the tool linked in the initial reply still deploys the entire data factory.

We need a CI / CD process that can manage individual objects or sets of objects that can be quickly deployed to any number of environments after those changes have been successfully published to the initial ADF environment whatever that might be in your company.
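
Deploying an individual object safely means deploying whatever it references first. The following Python sketch shows one way to work that out from the JSON definitions in Git: it collects every `referenceName` inside an object and orders the result so dependencies come first. It is an illustration under assumptions, not how ADF or any existing module does it. In particular it assumes object names are unique across types, and `deployment_order` is a hypothetical helper. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def referenced_names(node):
    """Collect every string 'referenceName' found anywhere inside a JSON definition."""
    found = set()
    if isinstance(node, dict):
        if isinstance(node.get("referenceName"), str):
            found.add(node["referenceName"])
        for value in node.values():
            found |= referenced_names(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_names(item)
    return found


def deployment_order(definitions, selected):
    """Return the selected objects plus everything they reference, dependencies first.

    definitions: {object name: parsed JSON definition}
    selected: iterable of object names to deploy
    """
    graph, pending = {}, list(selected)
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        # References to objects we have no definition for are ignored here.
        deps = {
            dep for dep in referenced_names(definitions[name])
            if dep in definitions and dep != name
        }
        graph[name] = deps
        pending.extend(deps)
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())
```

Selecting a single trigger would therefore yield its linked services, datasets and pipelines ahead of the trigger itself, which is the order a per-object deployment script would need.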

Thomas-Bailey commented 4 years ago

Hi,

No, I only use Git on my dev factory. I'm familiar with the need to promote to test and prod using ARM template deployment. However, that doesn't stop it being clunky as hell and fiddly. Disabling triggers, checking against the ARM template for deprecated pipelines etc is just not a good experience. I've even taken to bastardising the Key Vault for environment variables. This product does some great things but in the 'little' things like environments it has so, so far yet to go.

Thomas-Bailey commented 4 years ago

IMO the quickest win on this is to permit publish to ADF from more than just the compare branch. So you could have a 'test' branch that can publish to your ADF Test (and you should be able to designate the 'publish' branch per ADF). That with environment variables would solve a lot of grief.

Also, does anyone want to explain to me what the hell the point is in adf_publish if it's not supposed to go out of sync with master anyway?

mtvessel commented 4 years ago

Being able to publish from multiple branches might help, but I'd still have multiple developers who need to publish their feature branches to the live factory, without fear that every feature is now going to be promoted to the next environment.

djpmsft commented 4 years ago

Hi all,

This is Dan from the product group. Just wanted to say I am monitoring this discussion and really appreciate all of your points. Please continue making suggestions; we are brainstorming ways to make things better!

Thanks, Dan

djpmsft commented 4 years ago

Purely brainstorming, what would people's thoughts be on something like the ability to exclude resources from getting published (pardon the mspaint)? This is just for selectively deciding what to promote from the collaboration branch into the live factory.

[image: image] https://user-images.githubusercontent.com/31044028/80160265-0b4c1680-8582-11ea-9c70-ca859592ca0e.png

mtvessel commented 4 years ago

Hi Dan,

Thanks for the suggestion, but an option to exclude resources from publishing to the live factory kind of misses the point. We already do that by requiring pull requests into master. If I want to prevent something from being published, I just don't approve the PR. I need to be able to publish whatever I want to the live factory, but selectively choose what goes into the ARM template that CI/CD will use to promote to the next environment. In other words, I need to be able to selectively craft the ARM template for each environment.

Thanks, Mike
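
Crafting a per-environment ARM template, as requested here, can be approximated by trimming the published template down to the wanted resources plus everything reachable through their `dependsOn` entries. The Python sketch below illustrates the idea. It assumes resource names shaped like `[concat(parameters('factoryName'), '/Name')]` and `dependsOn` entries shaped like `[concat(variables('factoryId'), '/datasets/Name')]`; `resource_key` and `trim_template` are hypothetical names, and the `parameters` section is left untouched.

```python
import copy
import re

# Captures the text after the last "'/" in an ARM concat() expression.
_TAIL = re.compile(r"'/([^']+)'\)\]$")


def resource_key(resource):
    """Build a 'pipelines/CopySales' style key from an ARM resource's type and name."""
    kind = resource["type"].rsplit("/", 1)[-1]
    return f"{kind}/{_TAIL.search(resource['name']).group(1)}"


def trim_template(template, wanted):
    """Return a copy of the template holding only `wanted` keys and their dependsOn closure."""
    by_key = {resource_key(resource): resource for resource in template["resources"]}
    keep, pending = set(), list(wanted)
    while pending:
        key = pending.pop()
        if key in keep or key not in by_key:
            continue
        keep.add(key)
        # Follow each dependency so the trimmed template stays deployable.
        for dependency in by_key[key].get("dependsOn", []):
            match = _TAIL.search(dependency)
            if match:
                pending.append(match.group(1))
    trimmed = copy.deepcopy(template)
    trimmed["resources"] = [r for r in trimmed["resources"] if resource_key(r) in keep]
    return trimmed
```

Calling `trim_template(template, ["pipelines/CopySales"])` keeps that pipeline along with the datasets and linked services it depends on, and leaves every other feature out of the promoted template.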

NowinskiK commented 4 years ago

Hey Dan. I appreciate that you are following this thread. I agree with @mtvessel. An "Exclude" property shouldn't be present in ADF itself. It should be an option for deployment. Please take a look at how it works for database projects (DACPAC). When you compile a database project with SSDT in VS, there is a DACPAC file as an output. This is your BUILD image. Then, with the sqlpackage.exe application, you can deploy to a selected target (SQL) server by comparing the DACPAC (image) to the target server and generating a differential script. sqlpackage accepts plenty of parameters, including options for what should be excluded or ignored when generating the script. The result of this is a T-SQL script to be executed on the target server. By analogy, here it would be an ARM template. And this is actually what I'm planning to implement in azure.datafactory.tools.
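
The compare-and-generate step described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the design of azure.datafactory.tools or sqlpackage: it compares source definitions (the "image") with what the target holds and classifies each object, honouring exclusion patterns. The `"type/name"` key format, the glob-style patterns, and the function names are assumptions.

```python
import hashlib
import json
from fnmatch import fnmatchcase


def fingerprint(definition):
    """Stable hash of a JSON definition, independent of key order."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deployment_plan(source, target, exclude=()):
    """Classify each source object as create, update, unchanged, or excluded.

    source / target: {"type/name": parsed JSON definition}
    exclude: glob patterns such as "triggers/*"
    """
    plan = {"create": [], "update": [], "unchanged": [], "excluded": []}
    for key in sorted(source):
        if any(fnmatchcase(key, pattern) for pattern in exclude):
            plan["excluded"].append(key)
        elif key not in target:
            plan["create"].append(key)
        elif fingerprint(source[key]) != fingerprint(target[key]):
            plan["update"].append(key)
        else:
            plan["unchanged"].append(key)
    return plan
```

Only the `create` and `update` lists would need to go into the generated deployment, which is what makes this a differential rather than a full publish.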

djpmsft commented 4 years ago

Thanks for responding so quickly! The need is clear, and hopefully we can think of some ways for this to be well integrated within the tool itself.

djpmsft commented 4 years ago

Regarding the initial post:

The stance on deployment requirements needs to change, either by modifying the existing CI/CD process to be more functional or by providing another avenue for deploying objects. [Dan]: We are thinking of ways to improve.
The PowerShell and Data Factory product teams need to work more closely together to ensure 100% compatibility. [Dan]: We consistently update the pre/post deployment PowerShell scripts, and the instructions have been updated to reflect the correct version.
Simplify the use or requirements around ARM templates. [Dan]: While data factories are built as an ARM template, we are trying our best to abstract this concept from the user. While we understand we aren't completely there, continue giving us feedback so we can improve.
Better communication around updates to Data Factory. The release of Data Flow in mid-June broke several PowerShell cmdlets. [Dan]: This was unacceptable and not to the standard of the data factory team. We will try to ensure that things like this will not happen again.

Please keep the discussion going here! As the specific doc issues regarding outdated PowerShell instructions have been fixed, would it be alright if I close this issue?

Thanks, Daniel

RickXMoore commented 4 years ago

Hi Dan,

Let me start off by saying that I’m glad someone from Microsoft directly responded to the post. I’d like to ask for clarification around some of your responses, but in general I’m OK with the issue itself as being closed.

• I’m certainly glad to hear that you are “thinking” of ways to improve, but it’s obvious to me that your customers vary in how they approach this and in the complexities that this brings. I think a more open dialog needs to occur between MS and their customers on the needs behind CI / CD deployments, rather than MS just designing another solution that doesn’t solve the problems that we face.
• I’d like to understand what cmdlets you’re referring to around the pre / post deployment. I’m specifically talking about the AzureDataFactory module and the cmdlets included i.e. xxx-AzDataFactoryV2Pipeline, Dataset, trigger.
• I’m sure your customers will continue to do so.
• Determine a way that changes to ADF are more broadly distributed, especially around major feature releases like Data Flow, so that your customers can be proactive in validating existing processes before they stop working in production environments.

As I said at the beginning of the original post, I was asked to post this by my MS account rep and continue to work directly with them on a resolution to this issue.

Thanks again for your continued responses,

Rick.

djpmsft commented 4 years ago

Hey Rick,

First off, sorry for the delay in responding initially; it shouldn't have taken this long to get this strong dialog going between everyone on this thread.

Regarding 'I’d like to understand what cmdlets you’re referring to around the pre / post deployment. I’m specifically talking about the AzureDataFactory module and the cmdlets included i.e. xxx-AzDataFactoryV2Pipeline, Dataset, trigger.' I was referring to the script shared on our CI/CD doc page that we recommend users run before and after deployments. Our PowerShell module should be completely up-to-date. Data Flows were not added until they were generally available, in November I believe.

In terms of updates, we actively post new releases and upcoming features on the following forums:

ADF Microsoft tech blog
ADF twitter
Release notes in the ADF UX

As I said earlier, keep the discussion going! I will go ahead and close the issue as there are no outstanding doc items.

please-close

djpmsft commented 4 years ago

Another thing on upcoming features, as ADF (and now data flows as well) are GA, you can expect there will be no breaking changes to existing pipelines without months and months of warning

RickXMoore commented 4 years ago

Hi Dan,

That's not what I was referring to when discussing the issues with PowerShell between the product teams, but that's OK. I'm sure that needed the update as well.

dgdrake commented 3 years ago

We have experienced the same problem.

What we have done to fix this issue is to stop merging new feature work into the master branch. All new features are merged into a "Release" topic branch, which has the same branch policies as master.

For CI/CD, the Azure Release Pipeline in DevOps points to the release branch that we created to deploy.

Once the release has gone into production, then the release branch will need to be merged into the master branch.

Thomas-Bailey commented 3 years ago

OK, but that would suggest you have an ADF that is Git-enabled but never gets a publish, right? Are you ARM template deploying over it, or does it never match a branch? How are you handling the param files that are generated with the publish? Are you manually crafting these?
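
On the parameter files: rather than hand-crafting one per environment, one option is to derive each from the published file by applying a small set of overrides. The Python sketch below assumes the standard ARM parameter file shape (`{"parameters": {name: {"value": ...}}}`), as in the `ARMTemplateParametersForFactory.json` generated on publish. The function name and the override format are hypothetical, and parameters that use Key Vault references instead of plain values are not handled.

```python
import copy


def parameters_for_environment(published, overrides):
    """Return a copy of an ARM parameter file with selected values replaced.

    published: parsed parameter file, e.g. the one generated on publish
    overrides: {parameter name: new value} for the target environment
    """
    unknown = sorted(set(overrides) - set(published["parameters"]))
    if unknown:
        # Fail loudly instead of adding parameters the published file never declared.
        raise KeyError(f"not in published parameter file: {unknown}")
    result = copy.deepcopy(published)
    for name, value in overrides.items():
        result["parameters"][name] = {"value": value}
    return result
```

The overrides for each environment (test, stage, production) can then live in source control as small dictionaries or JSON files, while the published parameter file stays generated.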

MicrosoftDocs / azure-docs

ADF CI/CD and PowerShell #43390
