codecutout / terraform-provider-powerbi

Terraform Power BI Provider
MIT License
14 stars 16 forks source link

Add rebind report to dataset functionality #48

Closed alxy closed 3 years ago

alxy commented 3 years ago

Introduction

We are very often in a situation where we want to to separate PowerBI datasets and reports into two different files, as the datasets are generic and are often reused by multiple reports. Currently, if we upload a report using this provider it is linked to the dataset it was last linked to for development purposes. We are then using some custom Powershell script to do the rebinding - however, that fails very often and terraform cannot manage the state of this rebinding operation. Thus, this PR will add proper rebinding functionality to the provider itself.

Example code

resource "powerbi_workspace" "example" {
  name = "Example Workspace"
}

# Deploy the dataset first ...
resource "powerbi_pbix" "example_dataset" {
  workspace_id = powerbi_workspace.example.id
  name         = "My dataset"
  source       = "data/Datasets/Dataset.pbix"
  source_hash  = filemd5("data/Datasets/Dataset.pbix")
  skip_report  = true # Only deploy the dataset
}

# ... and then connect it to a report
resource "powerbi_pbix" "example_report" {
  workspace_id = powerbi_workspace.example.id
  name         = "My report"
  source       = "data/Reports/Report.pbix"
  source_hash  = filemd5("data/Reports/Report.pbix")
  dataset_id   = powerbi_pbix.example_dataset.dataset_id # Bind the report to the dataset
  skip_dataset = true                                    # Only manage the report (this is required for the delete operation)
}

# add more reports here and bind them to the same dataset

Result in the portal is the following: image

Caveats

I needed to introduce an additional argument skip_dataset specifically for the delete operation. The reason is that if the dataset is managed by another resource (so a separate powerbi_pbix as in the example above), the delete operation should not delete the dataset connected to the report, as its state is managed by the other resource. It would be nice if there was a way to tell if a Computed attribute has indeed been computed or has been set by the configuration. Not sure if this is possible, as this is my first Go contribution.

Testing

As this is my very first experiment with Go, I was not exactly able to make an acceptance test work, but I'm happy to help doing that, however, I will probably need to support to get started.

My manual tests went pretty good. I have tested the following:

alex-davies commented 3 years ago

@alxy this is looking really good! I didnt know they had a rebind API method now! This was something I had actually been looking for to help out with push datasets

Im not sure I like having the skip_dataset, mostly because it doesn't skip the dataset at all, at least not in the same way as skip_report does. Although I can see why you needed it. Perhaps it might be better to have a new property rebind_dataset_id. dataset_id will always be the id of what was deployed in the import and not settable. But the rebind_dataset_id is run after import, in a similar way to how the parameters and datasources are run

We need to get some tests in this. There are already some tests that do something similar (https://github.com/codecutout/terraform-provider-powerbi/blob/03419ba599c43c7c27f32d4ba0ec50e0b355baf3/internal/powerbi/resource_pbix_test.go#L104) these tests do it in a more hacky way by rewriting the pbix file itself, which was always fragile but until now didn't know an alternative. YOu can either rewrite those tests to use the rebind, or copy those tests and do something similar

I don't have easy access to my dev machine till later in the week, If you were unable to get the tests going by then I can take a stab at them

alxy commented 3 years ago

I'll have another look on the tests later probably or on weekend.

I think if we add another property like rebind_dataset_id, state handling becomes a bit more difficult, as changing this will also update the dataset_id, but maybe this is even a good and predictable behavior. I'll try that as well. I have asked in the hashicorps forum as well if it is possible to determine if a computed property has been set from code or from config.

alex-davies commented 3 years ago

I think if we add another property like rebind_dataset_id, state handling becomes a bit more difficult, as changing this will also update the dataset_id

Im not entirely sure it will. The dataset_id is fetched from the import, not from the report. Id have to double check but I wouldnt think the import's dataset_id would change if we update the report's dataset_id

alex-davies commented 3 years ago

Just tested what happens to the import

Assume we have two pbix/imports

after rebinding (r2) to (d1), the imports do indeed get modified, but it is the report that gets changed not the dataset, so we end up with

This seems like an odd behaviour, and not entirely sure where that leaves us

alxy commented 3 years ago

Are you sure that this is the case?

In your example d2 would be somewhat orphaned it looks like, but as you cant upload a report without a dataset in PBI portal, it looks like there is no way around that. For us its not that much of a big deal, since we have a separate workspace for development (where all users manually using the UI) and all reports are linked to datasets in that workspace when initially deployed by Terraform (to a dedicated workspace completely managed by TF). We then deploy the dataset to that workspace as well, and do the rebinding. This leaves us with a clean workspace with datasets and reports correctly linked and managed by TF.

I think there are two cases which can be considered here:

  1. Uploading a report where the dataset doesn't exist. In this case, one is automatically created with the same name as the report.
  2. Uploading a report linked to an existing dataset. In that case the report is already bound to that dataset.

In my experiments with setting the dataset_id only the report was updated to use the new dataset, and not the dataset was changed to include the new report - which is what your example suggests. So I quite like the idea of using rebind_dataset_id now, as it also allows us the do the rebinding (after the rebinding dataset_id = rebind_dataset_id) and reliably tell if we should delete the dataset on report deletion (if rebind_dataset_id has not been set) or not. The only issue I see is that we will probably end up with an orphaned dataset belonging to the report (with the same name as the report), id the report has not been bound to another, already existing dataset (see our use case above).

alex-davies commented 3 years ago

Are you sure that this is the case? In your example d2 would be somewhat orphaned it looks like, but as you cant upload a report without a dataset in PBI portal, it looks like there is no way around that

Im referring to PowerBI imports (https://docs.microsoft.com/en-us/rest/api/power-bi/imports), Currently a pbix resource is more analogous to an import than it is to a report (the pbix resource id is the import id not the report id). I had assumed imports don't change much after being uploaded. However it seems a rebind changes the import as well as the report, so that assumption turned out to be incorrect. Additionally it changes the import by moving the report to a different import rather than moving the dataset which is what one would intuitively expected

The dataset_id and report_id were intended to be the original uploaded items (which is why the docs description is "The ID for the dataset that was deployed as part of the PBIX."), that was because they were sourced from the import, which I had assumed didnt change

alex-davies commented 3 years ago

I have pushed a change to add rebind_dataset_id and a test case around it. Change also includes some retry mechanisms so the tests can pass slightly more reliably.

I had to get around the changing of the import by capturing dataset_id and report_id on creation then never updating from the import again. This allows us to capture the original report and dataset separately from what it was rebound to. I'm still a bit uneasy about never refreshing this data, and feel it may behave weirdly in some scenarios, but the existing tests are all passing

Take a look at my changes and feel free to add more test cases if you think there are cases you don't think are covered

alxy commented 3 years ago

Looks good to me, @alex-davies :) Will try your implementation also with my reports on weekend.