JulianHayward / Azure-MG-Sub-Governance-Reporting

Azure Governance Visualizer, aka AzGovViz, is a PowerShell script that captures Azure governance-related information such as Azure Policy and RBAC (and a lot more) by polling the Azure ARM, Storage and Microsoft Graph APIs.
MIT License

Push AzGovViz output to repository fails when PSRule.csv exceeds 100 MB #121

Closed: extmaper closed this issue 1 year ago

extmaper commented 2 years ago

When -DoPSRule is used and the PSRule.csv file size exceeds 100 MB, the workflow fails.

remote: error: File wiki/AzGovViz_***_PSRule.csv is 100.65 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com./

JulianHayward commented 2 years ago

So, in case the CSV exceeds this limit, we could remove the description column from the output, as this field is quite large.

Thoughts?
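Here is a minimal Python sketch of the proposed fallback. AzGovViz itself is PowerShell, so this only illustrates the logic. It assumes the output is a CSV with a header row and a column named `Description`. The real column name in the PSRule output, and the change that actually shipped, may differ. The file is rewritten only when it is over the limit, and it is streamed row by row, so a 100 MB file does not have to fit in memory.

```python
import csv
import os

# GitHub documents its hard per-file limit as 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def drop_column_if_too_large(path, column="Description", limit=GITHUB_FILE_LIMIT_BYTES):
    """Rewrite the CSV at `path` without `column` if the file exceeds `limit` bytes.

    Returns True if the file was rewritten, False otherwise.
    The default column name "Description" is an assumption for illustration.
    """
    if os.path.getsize(path) <= limit:
        return False  # small enough, leave untouched

    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        original_fields = reader.fieldnames or []
        kept_fields = [f for f in original_fields if f != column]
        if len(kept_fields) == len(original_fields):
            return False  # column not present, nothing to drop

        tmp_path = path + ".tmp"
        with open(tmp_path, "w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" silently skips the dropped column's values
            writer = csv.DictWriter(dst, fieldnames=kept_fields, extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                writer.writerow(row)

    os.replace(tmp_path, path)  # swap in the slimmer file
    return True
```

Dropping a column only buys headroom. A tenant that keeps growing can still cross the limit later.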

CSV columns:

extmaper commented 2 years ago

I think it would be ok to remove the description column as proposed.

JulianHayward commented 2 years ago

Please try v6_major_20220717_1.

JulianHayward commented 2 years ago

@extmaper works for you?

JulianHayward commented 2 years ago

Closing (no response).

extmaper commented 2 years ago

@JulianHayward Sorry for the late reply, I have been on vacation. The implemented change for the 100 MB limit works as expected. Thanks for the fast update.

cajohanikea commented 1 year ago

@JulianHayward we get this error again:
remote: error: File wiki/AzGovViz_***.csv is 103.71 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com./

What other optimizations can be done to make it work? We are running version 6.3.0.

JulianHayward commented 1 year ago

@cajohanikea which file is it this time?

cajohanikea commented 1 year ago

@JulianHayward The log says "File wiki/AzGovViz_***.csv". It is the file generated by the scan of the environment, which will be uploaded to the App Service.

The parameters we are using are: -LargeTenant -NoPIMEligibility -GitHubActionsOIDC

cajohanikea commented 1 year ago

The file is called AzGovViz_managementgroupid.csv

What do you think about using Git Large File Storage? - https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage

JulianHayward commented 1 year ago

Git Large File Storage: please give it a try and share if/how it worked. I guess there are more files close to the 100 MB limit? Which are the next largest files?

some ideas..

cajohanikea commented 1 year ago

The files after that are:

- AzGovViz_managementgroupid_PSRule.csv - 77 MB
- AzGovViz_managementgroupid_RoleAssignments.csv - approx. 40 MB
- AzGovViz_managementgroupid_DefinitionInsights.html - approx. 30 MB
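A list like this can be produced for any run with a small Python sketch that walks the output folder and reports the largest files. The folder name `wiki` matches the paths in the error messages above, and `largest_files` is a hypothetical helper, not part of AzGovViz. Sizes use 1 MB = 1024 * 1024 bytes.

```python
import os


def largest_files(root, top=5):
    """Return the `top` largest files under `root` as (size_in_mb, relative_path),
    sorted from largest to smallest."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes.append((os.path.getsize(full), os.path.relpath(full, root)))
    sizes.sort(reverse=True)  # biggest first
    return [(round(size / (1024 * 1024), 2), rel) for size, rel in sizes[:top]]


if __name__ == "__main__":
    # Assumed output folder; adjust to the pipeline's OutputPath.
    for size_mb, rel in largest_files("wiki"):
        print(f"{size_mb:>10.2f} MB  {rel}")
```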

I don't think the PSRule file is used anymore, since we don't specify the -DoPSRule parameter on script execution?

By compression, do you mean using a tool like gzip? And what do you mean by "loss of readable git history"? How will the compression impact the data available in AzGovViz?
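Assuming "compression" means something like gzip, this Python sketch uses the standard `gzip` module to compress a CSV and report the before and after sizes. `gzip_file` is a hypothetical helper. Compression is lossless, so decompressing yields the identical CSV and no data is lost. The likely cost is that git treats the `.gz` as an opaque binary: `git diff` and the history would no longer show line-level changes. That is presumably what "loss of readable git history" refers to. How well AzGovViz CSVs actually compress is not known here.

```python
import gzip
import os
import shutil


def gzip_file(path, remove_original=True):
    """Compress `path` to `path + '.gz'` and return (original_bytes, compressed_bytes).

    gzip is lossless: decompressing the .gz yields the identical file.
    """
    gz_path = path + ".gz"
    original = os.path.getsize(path)
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so large files need not fit in memory
    compressed = os.path.getsize(gz_path)
    if remove_original:
        os.remove(path)  # keep only the compressed copy for the commit
    return original, compressed
```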

JulianHayward commented 1 year ago

@cajohanikea please check the branch issue122next, and check the pipeline updates (the cleanup step is uncommented).

cajohanikea commented 1 year ago

@JulianHayward The way I understand the solution, the CSV files are used for change tracking within GH, and the HTML files are sent to the App Service and used in the solution.

Since the CSV files do not affect the solution's functionality in the App Service, but the HTML files would, should the cleanup focus on CSV files only?

In addition, what if the change-tracking files, instead of being checked in, were sent to, for example, a storage account? Then there would be no file size limitations to consider, and the user would have two options for change tracking in larger environments: delete the files, or keep them on an external system.

Maybe I'm misunderstanding; please let me know.

JulianHayward commented 1 year ago

@cajohanikea please ping me on LinkedIn and let's discuss scenarios/dependencies.

cajohanikea commented 1 year ago

@JulianHayward The feature in branch issue122next solves the problem by removing the AzGovViz***.csv file. For us, the other files are not close to the limit: AzGovViz****_RoleAssignments.csv is at 48 MB (we are not using the -DoPSRule parameter, and we are using -LargeTenant).

Run Write-Host "Checking files in $($env:OutputPath) for GitHub 100MB file size limit"
Checking files in wiki for GitHub 100MB file size limit
Ref: https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#file-size-limits
Found total of 23170 files
Found 1 files hitting the GitHub file size limit
File 'AzGovViz_***.csv' size 104.378873825073MB exceeds the GitHub 100MB file size limit - removing file /home/runner/work/******/******/wiki/AzGovViz_***.csv
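The log above comes from a PowerShell (`Write-Host`) step in the workflow. As a rough illustration only, the same check-and-remove logic could be written in Python like this. `remove_oversized_files` is a hypothetical name, and this is not the code merged from issue122next. Like the log, it measures sizes in units of 1024 * 1024 bytes, and it only deletes files, so the removed data is simply not committed for that run.

```python
import os

GITHUB_FILE_LIMIT_MB = 100  # GitHub's hard per-file push limit


def remove_oversized_files(output_path, limit_mb=GITHUB_FILE_LIMIT_MB):
    """Delete files under `output_path` larger than `limit_mb` (1 MB = 1024 * 1024 bytes).

    Returns the list of removed file paths. Messages loosely mirror the workflow log.
    """
    print(f"Checking files in {output_path} for GitHub {limit_mb}MB file size limit")
    all_files = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(output_path)
        for name in names
    ]
    print(f"Found total of {len(all_files)} files")

    too_large = [p for p in all_files if os.path.getsize(p) / (1024 * 1024) > limit_mb]
    print(f"Found {len(too_large)} files hitting the GitHub file size limit")

    for path in too_large:
        size_mb = os.path.getsize(path) / (1024 * 1024)
        print(f"File '{os.path.basename(path)}' size {size_mb}MB exceeds the GitHub "
              f"{limit_mb}MB file size limit - removing file {path}")
        os.remove(path)  # drop it so the subsequent git push is not rejected
    return too_large
```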

JulianHayward commented 1 year ago

@cajohanikea merged - thanks!