bcgov / cloud-pathfinder

This is the technology and UX backend repo for the cloud pathfinder ZenHub task board
https://app.zenhub.com/workspaces/cloud-pathfinder-5e4dbb426c3c6af8dcbf06a7/board?repos=241742911
Creative Commons Zero v1.0 Universal

Azure data warehouse support - Ticket resolution w Microsoft (ref. EDW - ADF usage) #1929

Closed jon-mc-git closed 1 year ago

jon-mc-git commented 2 years ago

Describe the issue: Open an Azure Data Factory service support ticket with Microsoft so that they either fix their service or provide a workaround internal to Azure.

TLDR: Azure Data Factory is an ETL/Job Runner, a data transfer and transformation service to ship data from one place to another.

Additional context: The Azure Data Factory service has a software module that helps clients reach their internal data/DB sources over standard networking back to their on-prem data center resources. In many cases the purpose is to create periodic 'jobs' that capture real-time/live data sources, transform the data according to job rules through the IR-coordinated process, and output the results to a repo or location where further analysis, possibly BI-type processing, is done (in Azure Databricks, for example) and/or where the results are stored or streamed for final reporting (frontend web app, Power BI, etc.).

In our case, our Gov Azure infrastructure (Test/Dev/PoC/Non-prod) restricts direct Internet use from inside Azure by default (internal networking and inter-service private links only), equivalent in principle to on-prem zone B. All traffic to Internet and IP-based resources with destinations outside our Gov Azure managed infrastructure is routed automatically down to on-premise, where internal traffic is re-routed to its local destination (if firewall rules there allow it) and, unless otherwise allowed by proxy, external Internet calls are ignored.
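As a rough illustration of the forced routing described above, here is a minimal sketch of longest-prefix-match route selection, which is how Azure picks a route when several prefixes contain the destination. The route table, prefixes and next-hop labels are hypothetical and are not taken from our actual UDR configuration.

```python
import ipaddress

# Hypothetical route table for illustration only; these prefixes and
# next-hop labels are assumptions, not our real UDR configuration.
ROUTES = [
    ("10.20.0.0/16", "local vNet"),
    ("10.0.0.0/8", "on-prem via gateway"),
    ("0.0.0.0/0", "on-prem via gateway"),  # forced default route
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None
    # Longest prefix wins: the route with the largest prefix length.
    return max(matches, key=lambda match: match[0])[1]
```

With this table, an address inside `10.20.0.0/16` stays on the local vNet, while any public address only matches the `0.0.0.0/0` default route and is sent down to on-prem.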

To let these normally ignored Azure VM Internet calls through in the future, we need either direct-to-Internet routing set up within Azure (whereby the calls can be captured by MS and re-routed to the service internally) or to force the traffic outbound to the Internet from on-prem. There are several difficulties with using on-prem as the outbound point and with using a proxy there, so a workaround with Microsoft is seen as the optimal option for standardized Azure service calls back to their internal service endpoint.

In this context, we either find a re-route solution with Microsoft, which is the simplest method, or we will have to consider creating an on-prem proxy solution for these standard cloud service outbound calls. We could also consider using the on-prem F5 Forward Proxy for this Azure traffic, but that service is whitelist based and would require the client to know all possible Azure service outbound calls beforehand, with constant inquiries/tickets to CPF. This would demand extensive testing and knowledge from the Ministry client and end in long, inexact iStore orders every time, which CPF would like to avoid for its clients if at all possible.
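To show why a whitelist-based proxy needs every outbound destination known in advance, here is a minimal sketch of an allowlist check. It is not how the F5 Forward Proxy is configured; the patterns and hostnames are hypothetical examples.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist entries for illustration only; the real list
# would have to be discovered and ordered for each Azure service.
ALLOWLIST = [
    "*.servicebus.windows.net",
    "login.microsoftonline.com",
]


def is_allowed(host, allowlist=ALLOWLIST):
    """Return True if host matches any allowlist pattern (case-insensitive)."""
    normalized = host.lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern.lower()) for pattern in allowlist)
```

Any destination that is not listed is dropped, so each newly discovered Azure endpoint means another change request.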

The service issue here is with the Azure Data Factory Self-Hosted Integration Runtime (SHIR): although it is configured to communicate internally, it is still making internal IR function calls to external Service Bus endpoints.

Definition of done

jon-mc-git commented 2 years ago

On hold. Waiting for MS to confirm which UDR forced routing rules are being triggered and/or whether there is a more optimal configuration. Once we know this, the question is why there is no response to the SH-IR 'Interactive Authoring' public URL calls from the Azure VM (left on) or why it is not responding. Will put this in Blocked until I get word back from them.

jon-mc-git commented 2 years ago

Unblocked now, as MS has a specialist who will do packet tracing to analyze the routing traffic, where the Data Factory Service Bus calls (URL based) are going, and why there is no response from the service.
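A simple TCP probe from the Azure VM can complement the packet trace by showing whether an outbound connection completes at all. This is a minimal sketch, not the method MS is using; the host and port passed in are whatever endpoint is being investigated, and none are assumed here.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

A `False` result only says the handshake did not complete; it does not say which hop dropped the traffic, which is what the packet trace is for.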

jon-mc-git commented 1 year ago

Met with MS again a couple of times and currently doing some back-and-forth troubleshooting. Waiting on MS for the latest config changes to implement for testing.

ActionAnalytics commented 1 year ago

Jon has reengaged with the team to try to prove out the network connectivity.

jon-mc-git commented 1 year ago

The solution proposed by MS is a temporary one: leave independent vNet IP subnet ranges (ones created but not attached to a UDR route policy that forces their traffic to the Hub/Gateway) open for direct access to the Internet, instead of first passing through the Hub firewall. This will be considered if absolutely necessary but is not an optimal solution, as our normal preference is to route all traffic to the Hub firewall first. We may have to discuss a solution on SecOps' Hub firewall to allow certain traffic from specific IPs out to the Internet from there, but from what I understand from SecOps this would mean refactoring some of what they have in place now. Going to see if we can get the UDR working properly as the first, optimal solution and will go from there.

ActionAnalytics commented 1 year ago

Jon says Tran turned away from this work for now. Please put on pause until we get caught up on other tickets and then we can refine as another ticket in future refinement/roadmapping.

There is a ticket on the Microsoft side and Jon says he'll shut this down. We agree to "close" this by putting it back in New Items as a reminder to review it later.

jon-mc-git commented 1 year ago

A tie-off meeting was completed with Microsoft support and TRAN present. The specific internal Data Factory service that is still trying to connect to the Internet could not be identified (in order to create a specific UDR [forced route rule] for it). We also could not implement the solution MS suggested, due to restrictions that do not allow internal servers to make general calls to the Internet. It was also found that the additional suggestion of setting the rule to apply solely to the TRAN subnet was not possible. So only a proxy solution would be a workable alternative, which we do not currently have available. This was clarified with TRAN, along with the fact that they can still get on-prem data extracted with the IR tool as it currently functions, just not with full functionality, in the meantime.

wrnu commented 1 year ago

old ticket, closing