TabularEditor / TabularEditor3

Bug reports, feature requests, discussion and documentation for Tabular Editor 3 (commercial version).

Dataflow download/upload support. #349

Closed gavingc closed 1 year ago

gavingc commented 2 years ago

Discussion for the possibility of enhancement, for downloading/uploading dataflows from/to Power BI Service.

Benefits: Tabular Editor (TE) and Power Query Online (Dataflows in Power BI Service) are a natural and powerful fit. TE enables from-scratch creation of tabular models and enterprise-grade functionality for development and source control. Dataflows provide cloud-based ETL. The combination allows simple M to be used for all data sources in TE, for example:

let
  Source = PowerBI.Dataflows(),
  Workspace = Source{[workspaceId = DATAFLOW_WORKSPACE]}[Data],
  Dataflow = Workspace{[dataflowId = DATAFLOW_ID]}[Data],
  Entity = Dataflow{[entity = "Date"]}[Data]
in
  Entity


Current Solutions: Currently the above can be achieved with additional manual steps (like remembering to download .json files before a git commit), but it's easy to forget the step. Download/upload is available in the Power BI Service web interface, the REST API, and the MicrosoftPowerBIMgmt PowerShell module. PowerShell examples are in the git repo. I'm not sure whether C# or other modules/APIs are available.
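As an illustration of the REST route mentioned above, here is a minimal Python sketch (an editor's illustration, not part of the request itself) that builds the documented Power BI REST API URLs for listing dataflows in a workspace and exporting one as model.json; the GUIDs and the token handling are placeholders:

```python
# Build the documented Power BI REST API URLs for the "Dataflows"
# operation group. An actual download would add an OAuth2 bearer token.

BASE = "https://api.powerbi.com/v1.0/myorg"

def list_dataflows_url(group_id: str) -> str:
    """URL of the 'Get Dataflows' endpoint for a workspace (group)."""
    return f"{BASE}/groups/{group_id}/dataflows"

def export_dataflow_url(group_id: str, dataflow_id: str) -> str:
    """URL of the 'Get Dataflow' endpoint, which returns model.json."""
    return f"{BASE}/groups/{group_id}/dataflows/{dataflow_id}"

# A real call would look like (requests is third-party, shown only
# as a comment):
#   requests.get(export_dataflow_url(workspace_guid, dataflow_guid),
#                headers={"Authorization": f"Bearer {token}"})
```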

otykier commented 2 years ago

@gavingc I don't understand the upload feature. Could you elaborate a bit more on how you envision this feature would work, and what its purpose would be?

Regarding downloading of the dataflow JSON file, I didn't realise until now that this file contains all of the metadata necessary to import a table from a dataflow, without having to rely on Analysis Services for updating table schema. The way I see it, this should then be integrated in TE3's Import Table Wizard, such that a user can connect to a workspace, view a list of dataflows and entities within those dataflows, picking those that they'd like to import as tables (I've dismissed this type of request in the past, since I didn't think there was a way for us to get a hold on the entity metadata, but luckily I was wrong).

gavingc commented 2 years ago

Opening the import wizard is actually not strictly required in the workflow: simply create a valid M table with a partition containing the above M code, then right-click -> Update schema. TE connects to the service (development workspace) and receives all the required metadata that way.

I guess using a .json file of a dataflow with the import wizard may be another way (when a development workspace is not available), or, as you have described, the convenience of being able to pick tables from existing dataflows.

The download is the 'lower-hanging fruit', with good value in the dev/version control workflow (since dataflow development is performed with Power Query Online). Just ensure the downloaded files are encoded as ASCII or UTF-8 :+1: I had a (fixable/patchable) problem with the older PowerShell v5.1, which defaults to UTF-16 (seen as binary by Git, i.e. no diffs).
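The UTF-16 issue above can be worked around generically; the sketch below (an editor's illustration in Python, not part of the proposed feature) re-encodes a downloaded definition to UTF-8 so Git diffs it as text:

```python
# Re-encode a dataflow JSON payload written as UTF-16 (older
# PowerShell's default) into UTF-8 so Git can diff it as text.
def to_utf8(raw: bytes) -> bytes:
    # UTF-16 files start with a byte-order mark: FF FE or FE FF.
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return raw.decode("utf-16").encode("utf-8")
    return raw  # already ASCII/UTF-8; leave untouched

# For a file on disk, apply it in place, e.g.:
#   p = pathlib.Path("model.json")
#   p.write_bytes(to_utf8(p.read_bytes()))
```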

The upload is not as simple a feature as the download. It must get a list of the dataflow IDs in the service (matched by name, with the aim to create or overwrite), then replace IDs to maintain dependencies/linked entities. The PowerShell import-workspace example works very well.
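The ID-replacement step described here could be sketched as follows (Python, an editor's illustration; the id_map is hypothetical and would be built by listing dataflows by name in the target workspace):

```python
# Rewrite source workspace/dataflow GUIDs in a model.json payload so
# that linked-entity references resolve in the target workspace.
def rewrite_ids(model_json: str, id_map: dict) -> str:
    """id_map maps each old GUID to the corresponding target GUID."""
    for old_guid, new_guid in id_map.items():
        model_json = model_json.replace(old_guid, new_guid)
    return model_json
```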

otykier commented 2 years ago

I fully agree that we should support updating table schema from Dataflows, and as such we might as well support it in the Import Table Wizard also, since we would need to download the json file either way.

However I still don't understand the upload requirement. To me, it seems as if it is not related to data model development, which is the primary focus of Tabular Editor. Maybe it's just that I don't have a lot of experience with Dataflows, but could you be so kind as to outline exactly the steps that you have in mind, where you see the benefit of having this feature in TE3? Thanks!

gavingc commented 2 years ago

Of course.

How are dataflows related to data model development?

At a high level, incorporating some support for managing (not editing) dataflows may depend on whether, and how far, Tabular Editor leans towards becoming an integrated development environment (IDE). I don't need Git support in TE (which is on the roadmap); any Git client, from the command line to Visual Studio Code with extensions, can perform that role well - the only caveat being that it's not an integrated experience.

It's similar with dataflows, except that no tools perform the role of managing dataflows well at this stage.

Dataflows provide connectors for many data sources and the ETL required to develop tabular models. Hence bringing the technologies of Power Query and Tabular together in Power BI Desktop has been such a success.

For larger data model development, TE + Dataflows means that a data modeller does not need Power BI Desktop; all the data operations can be completed in the cloud, and it reduces dependency on a data engineer with a separate toolset. This makes development of larger data models significantly more agile.

Declan1984 commented 2 years ago

I agree with @gavingc; having the ability to edit dataflows from TE3 would be a huge improvement. Dataflows are a great step towards data consistency and reusability across the organization, particularly until composite data models are released. They also help to isolate functionality: handle the connections to sources in the dataflow; handle the modelling in the dataset. From my perspective as well, git integration is not necessary and I would probably avoid using it in TE3 - the workspace functionality already works great, and I would rather execute git commands from the terminal than figure out how to perform git-like functions in another piece of software.

gavingc commented 2 years ago

Hi @Declan1984,

Yes, we too are finding all of the benefits of dataflows as you have described.

For clarification: this enhancement request is for managing dataflows, i.e. download/upload. Only a couple of small manipulations are required, like setting allowNativeQueries to false during the upload.
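As a sketch of that manipulation (an editor's illustration in Python; the exact location of the flag inside model.json is an assumption, so this walks the whole document rather than relying on a fixed path):

```python
import json

def disable_native_queries(node):
    """Recursively force any allowNativeQueries flag to false."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "allowNativeQueries":
                node[key] = False
            else:
                disable_native_queries(value)
    elif isinstance(node, list):
        for item in node:
            disable_native_queries(item)

# Hypothetical minimal payload; a real model.json is much larger.
model = json.loads('{"pbi:mashup": {"allowNativeQueries": true}}')
disable_native_queries(model)  # flag is now false, ready for upload
```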

My understanding is that support for Power Query itself is not currently possible, due to M language support. Indeed the benefits of editing dataflows in the cloud need to remain in the cloud, that is Power Query Online. Power BI Desktop is the tool for editing Power Query on desktop, but it does not provide dataflow management functions.

FYI - composite models are GA since Nov 2018.

Further info - dataflows downloaded as .json files contain header information, a section for the Power Query mashup, and sections that describe the entities.
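For readers unfamiliar with the format, a heavily abbreviated, illustrative fragment might look like the following (field names and values are indicative only, reflecting the three sections just described, not a verbatim schema):

```json
{
  "name": "Sales Dataflow",
  "version": "1.0",
  "pbi:mashup": {
    "document": "section Section1; shared Date = let Source = ... in Source;"
  },
  "entities": [
    {
      "name": "Date",
      "attributes": [
        { "name": "DateKey", "dataType": "int64" }
      ]
    }
  ]
}
```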

Declan1984 commented 2 years ago

@gavingc I'm not quite sure I follow what you mean when you say download/upload. My thought was to be able to use TE3 to modify the data flow definition (JSON) and upload that into the Power BI cloud service to modify the data flow running there, just like TE3 can modify datasets in Power BI service. Maybe we're talking about the same thing but can you clarify?

Composite models may be supported in Power BI Desktop, but using datasets hosted in the Power BI cloud service as the source for a composite model is a preview feature.

https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-directquery-datasets-azure-analysis-services

I found when testing that once a Power BI composite model is created in desktop using 2 or more datasets, it is not possible to modify the tables within the composite model from the source in TE3 (feature request below).

https://github.com/TabularEditor/TabularEditor3/issues/363

samaguire commented 2 years ago

If you're after an Upload/Download feature for Dataflows to make the experience 'better' you should check out this external tool for Power BI Desktop: https://github.com/MarcusWegener/Export2Dataflow

I keep PBIX files in SharePoint for version control and use the external tool to upload directly to the service from Desktop. (Full OAuth2 as well.)

Just my thoughts, but I don't agree that TE3 needs to add this as a functionality: TE3 is about Tabular Models, which Dataflows are not.

gavingc commented 2 years ago

Hi @samaguire,

Thanks for the input and link. It appears that Export2Dataflow very much revolves around editing Power Query in Power BI Desktop, with several manual steps to boot; this is exactly the kind of workflow that this request is trying to get away from.

I feel that the main point still revolves around https://github.com/TabularEditor/TabularEditor3/issues/349#issuecomment-1023761358

samaguire commented 2 years ago

> Hi @samaguire,
>
> Thanks for the input and link. It appears that Export2Dataflow very much revolves around editing Power Query in Power BI Desktop with several manual steps to boot, this is exactly the kind of workflow that this request is trying to get away from.
>
> I feel that the main point still revolves around #349 (comment)

I'm not sure you're using it correctly then. I just click the export button, enter my credentials, pick my workspace, set the name, and bam, I have a new/updated dataflow in the service (unless I've manually written something that is unsupported in Power Query Online).

The only way I can think to make this more cloud-centric is to actually use Power Query Online to edit the dataflow, which it sounds like you're doing. 🤔

So presumably what you're after is an 'easy' method to download the dataflow json files so you can do version control, and you're asking the TE3 dev team to drop a long-awaited Git integration to do this instead? 🤔

On the point about agile dev and large data models, I've had no issue using incremental refresh to create large models without using dataflows. It sounds like you're using dataflows to perform M transforms instead of doing them within the model, and want the ability to upload the dataflows at the same time as deploying a model. 🤔

In terms of synchronous deployment of dataflows with models, you should get Premium (which I highly recommend anyway if dataflows are going to be used as a pseudo data warehouse) and use pipelines. DevOps with the Power BI extensions can be used to do an automated and orchestrated deployment through the stages Dev > Uat > Prd of both the dataflow(s) and model(s). Pipelines also have simple change control on the roadmap.

Also, if you're serious about the ELT process and reusable data: there is no substitute for a well-configured data warehouse, be it Snowflake, Synapse, BigQuery or another. You can do things in the data warehouse that you just can't do with Power Query.

gavingc commented 2 years ago

> So presumably what you're after is an 'easy' method to download the dataflow json files so you can do version control

Correct, there is value in that.

> , and your asking TE3 dev team to drop a long awaited Git integration to do this instead? 🤔

No, I was addressing the 'how is this related to data model development' question. My argument was that the same logic applies in both cases, i.e. how far Tabular Editor may go towards providing an integrated development experience. In addition, I shared our experience of the value proposition of each; this discussion has now seen views on both sides of that.

> It sounds like your using dataflows to perform M transforms instead of within the model and want the ability to upload the dataflows at the same time as deploying a model.

Yes and no, M is an ETL technology and Tabular is not. Editing of dataflows is performed with Power Query Online, the quick win is downloading and saving the dataflows with the model for version control. Uploading is the natural next step, to deploy the whole project to a workspace.

> Pipelines also has in the roadmap simple change control.

Simple being the operative word here; there's an even chance that it's SharePoint-backed, which is what we are already using for .pbix files.

Even using Deployment Pipelines and Azure DevOps, you still need developer tools.

> Also, if your serious about the ELT process and reusable data. There is no substitute for a well configured data warehouse, be it Snowflake, Synapse, BigQuery or other. You can do things in the data warehouse that you just can't do with Power Query.

Microsoft seems to think there is a market for dataflows, the organisation that I work with agrees.

samaguire commented 2 years ago

> No, I was addressing the 'how is this related to data model development' question.

You should really check the MS documentation on where they see this fitting. (https://docs.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-introduction-self-service)

[image: diagram from the linked Microsoft documentation showing where dataflows fit]

> Using Deployment Pipelines and Azure DevOps, still need developer tools.

PBI desktop is a developer tool. TE3 is a developer tool (in fact it is an advanced tabular model development environment).

> Microsoft seems to think there is a market for dataflows, the organisation that I work with agrees.

I don't disagree. I remember playing with it when it first arrived on the scene, hailed as the answer to every enterprise's data warehousing needs. Self-service to boot. I was excited. The common data model concept was 'cool'. The reality is that Dataflows are just a simplified way for power user analysts to recycle common logic and as such prevent unnecessary load on the ERP system when 101 different datasets all grab the same (or similar) data.

but, back to the point:

> Yes and no, M is an ETL technology and Tabular is not. Editing of dataflows is performed with Power Query Online, the quick win is downloading and saving the dataflows with the model for version control.

As you've implied yourself, dataflows are not Tabular-related, they are ETL-related (they are pseudo data warehouse databases). And you're asking the Tabular modelling tool to handle them? 🤔

> To me, it seems as if it is not related to data model development, which is the primary focus of Tabular Editor

I agree with Daniel. The fact that TE can edit M script within a tabular model doesn't mean we should make TE work with other tools that use the Power Query engine. If this logic is extrapolated, we should also be able to download the DDL for a source view in a SQL DB just because TE is also able to edit SQL statements. Or to a greater extreme, should TE be pulling the M script from an ADF transform because the transform generates the source table used in a tabular model?

I think you might better spend your time writing some PowerShell scripts to execute your sync with the Service (both model and dataflow) than labouring the point here. That way you can get the version control you're after (which would be beyond the target audience for Dataflows) in a way that works with your workflow.

otykier commented 2 years ago

Thanks to everyone for chiming in on this discussion. For now, our position is that management of PBI Dataflows is outside the scope of Tabular Editor, with the exception of importing tables to a model from an existing Dataflow, which will come in the next round of updates we're working on for the Import Table Wizard.

gavingc commented 2 years ago

No worries, thanks for the update.