12Knocksinna / Office365itpros

Office 365 for IT Pros PowerShell examples
MIT License
1.3k stars, 570 forks

GetGraphUserStatisticsReport.PS1 Line 84 - Item has already been added. Key in dictionary #8

Closed bbrownchg closed 4 years ago

bbrownchg commented 4 years ago

When gathering sign-in data the script is choking on gathering the extra data:

Exception calling "Add" with "2" argument(s): "Item has already been added. Key in dictionary: 'user@example.com' Key being added: 'user@example.com'" At Office365UsageReport.ps1:84 char:11

jvondermans commented 4 years ago

I have the same issue as @brandonbrownchenega. It looks like the script wants to add the first 999 users again.

bbrownchg commented 4 years ago

Looks like the script has been updated with a fix, all working for me now!

jvondermans commented 4 years ago

I still have the same issue, the update didn't work for me, unfortunately.

I think I know what the problem is, but I don't know how to solve it.

The user sign-in data is fetched with the top 999, and this works well. But when fetching the others (if there are any), it starts with $NextLink = $SignInData.'@Odata.NextLink'. The @odata.nextLink URL also includes the top 999 parameter, so it starts adding the same users again.

Hopefully this can be altered, because otherwise this is a great script for getting insight into your tenant.
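For readers following along, here is a minimal sketch of a paging loop that tolerates a repeated user instead of throwing "Item has already been added". It is not the script's actual code: the pages are simulated in memory, and Get-Page, the lastSignIn property, and the sample addresses are hypothetical stand-ins for the real Graph call and its response.

```powershell
# Simulated Graph responses keyed by URL (assumption: a real script would call the Graph API here).
$Pages = @{
    'page1' = [PSCustomObject]@{
        value = @(
            [PSCustomObject]@{ userPrincipalName = 'a@example.com'; lastSignIn = '2020-09-01' }
            [PSCustomObject]@{ userPrincipalName = 'b@example.com'; lastSignIn = '2020-09-02' }
        )
        '@odata.nextLink' = 'page2'
    }
    'page2' = [PSCustomObject]@{
        value = @(
            # The same user returned a second time, as in the reported error
            [PSCustomObject]@{ userPrincipalName = 'a@example.com'; lastSignIn = '2020-09-01' }
            [PSCustomObject]@{ userPrincipalName = 'c@example.com'; lastSignIn = '2020-09-03' }
        )
        # No '@odata.nextLink' property on the last page
    }
}

# Hypothetical helper standing in for the web request
function Get-Page { param([string]$Uri) return $Pages[$Uri] }

$SignIns = @{}
$Uri = 'page1'
while ($Uri) {
    $Data = Get-Page -Uri $Uri
    foreach ($User in $Data.value) {
        # Indexed assignment overwrites an existing key; $SignIns.Add() would throw on a duplicate
        $SignIns[$User.userPrincipalName] = $User.lastSignIn
    }
    # Move to the next page; the property is missing ($null) on the last page, which ends the loop
    $Uri = $Data.'@odata.nextLink'
}
```

The key point is that the next-page URL is read from the response that was just processed, and that the hash table is filled by assignment rather than with Add, so a repeated record cannot stop the run.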

bbrownchg commented 4 years ago

Hmm, that's really odd because it's been working great for me. Are you getting the same error as V1 of the script, or a different one?

jvondermans commented 4 years ago

I'm getting the same error as with V1. It doesn't matter whether I run it in ISE, PowerShell, or VSCode. Running as admin doesn't make a difference either; I keep getting the error. If I comment that part out, I get a nice result, except that the login info is only available for 999 people and the rest get N/B.

bbrownchg commented 4 years ago

I'm running into an issue now where the data is taking forever to process after being pulled down from Graph. Previously the script was taking just shy of an hour to run for ~3400 accounts, and now it looks like it's going to take 2-3 days. I can't tell if that's me getting throttled on the Graph side or a local processing issue.

SteveBurkettNZ commented 4 years ago

I'm seeing the "Item has already been added. Key in dictionary:" error when running the script on a machine that hasn't completed the first run of Internet Explorer (e.g. on a brand new virtual machine).

The Invoke-WebRequest command on Line 80 throws up an error immediately beforehand: Invoke-WebRequest : The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again.

A fix is to add the -UseBasicParsing parameter to the end of the Invoke-WebRequest command on Line 80, similar to what's on the Invoke-WebRequest command on Line 24.

I've added a Pull Request with that update.
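As an illustration of the change described above, the sketch below shows a request with -UseBasicParsing added. The URI, the token placeholder, and the Get-GraphPage helper are assumptions made for the example and are not copied from the script. In Windows PowerShell 5.1 the switch avoids the Internet Explorer engine; in PowerShell 6 and later basic parsing is already the default and the switch is accepted for compatibility.

```powershell
# Hypothetical values; the real script builds its own URI and authorization header.
$Token = '<access token>'
$RequestParams = @{
    Uri             = 'https://graph.microsoft.com/v1.0/users'
    Headers         = @{ Authorization = "Bearer $Token" }
    Method          = 'Get'
    UseBasicParsing = $true   # Skip the Internet Explorer DOM parser
}

# Hypothetical helper: sends the request and converts the JSON body to objects.
function Get-GraphPage {
    param([hashtable]$Params)
    $Response = Invoke-WebRequest @Params   # Splatting passes UseBasicParsing as a switch
    return ($Response.Content | ConvertFrom-Json)
}

# Usage (needs a valid token and network access):
# $Data = Get-GraphPage -Params $RequestParams
```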

jvondermans commented 4 years ago

Hi @SteveBurkettNZ ,

Thanks for your comment, this works for me!!

SteveBurkettNZ commented 4 years ago

bbrownchg wrote: "I'm running into an issue now where the data is taking forever to process after being pulled down from Graph. Previously the script was taking just shy of an hour to run for ~3400 accounts, and now it looks like it's going to take 2-3 days. I can't tell if that's me getting throttled on the Graph side or a local processing issue."

It's a local processing issue, where it's running the ForEach ($U in $Users) loop, repeatedly searching through the $Report array for relevant entries for that user. It's slow, taking just over 37 minutes per 1000 users for me on a 2 vCPU VM (Azure VM Standard D2s v3).

I've modified it to use a DataTable, which is much more efficient for searching larger datasets, and managed to get it down to 2 minutes 11 seconds per 1000 users. I've just found, though, that my DataView filter chokes if the user has an apostrophe in their name (darn you, Maricel D'Souza!), so I need to fix that first. :(
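One common way to handle the apostrophe problem, sketched here under assumptions, is to double any single quote before placing the value in the filter expression, which is how the DataTable expression syntax escapes quotes inside string literals. The table layout, column names, and address below are made up for the example and do not reflect the modified script.

```powershell
# Build a small DataTable with hypothetical columns
$Table = New-Object System.Data.DataTable
[void]$Table.Columns.Add('UPN', [string])
[void]$Table.Columns.Add('DisplayName', [string])
[void]$Table.Rows.Add('maricel.dsouza@example.com', "Maricel D'Souza")
[void]$Table.Rows.Add('kim.akers@example.com', 'Kim Akers')

$Name = "Maricel D'Souza"

# Double each single quote so the filter string stays well formed
$Escaped = $Name.Replace("'", "''")

# Without the escaping, this filter would fail to parse because of the unbalanced quote
$Rows = $Table.Select("DisplayName = '$Escaped'")
```

The same escaped string can be assigned to a DataView RowFilter, since both use the same expression syntax.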

12Knocksinna commented 4 years ago

I've committed V1.2.

I played around with replacing the List object used for $Report with a DataTable object, but the performance fix is really much simpler.

$Report contains all the records found from the Graph. The performance hit comes from scanning what can be a very large list to find records for a user, which happens six times (once per workload) in V1.1. I inserted a line to extract the data for the user being processed (usually six records):

$UserData = $Report | ? {$_.UPN -eq $U} # Extract records from list for the user

And now the six checks are performed against $UserData instead of the complete $Report list. This speeds things up quite a lot (it would be interesting if someone with a really big domain could test).
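A small self-contained sketch of that idea follows. The record layout (UPN, Workload, LastActive) and the sample data are assumptions for illustration; the real $Report has its own properties and six workloads rather than two.

```powershell
# Hypothetical sample of the combined report data
$Report = @(
    [PSCustomObject]@{ UPN = 'a@example.com'; Workload = 'Exchange'; LastActive = '2020-09-01' }
    [PSCustomObject]@{ UPN = 'a@example.com'; Workload = 'Teams';    LastActive = '2020-09-03' }
    [PSCustomObject]@{ UPN = 'b@example.com'; Workload = 'Exchange'; LastActive = '2020-08-20' }
)
$Users = 'a@example.com', 'b@example.com'

$Summary = foreach ($U in $Users) {
    # One pass over the full list per user...
    $UserData = @($Report | Where-Object { $_.UPN -eq $U })

    # ...then each workload check scans only that user's few records
    $Exo   = $UserData | Where-Object { $_.Workload -eq 'Exchange' }
    $Teams = $UserData | Where-Object { $_.Workload -eq 'Teams' }

    [PSCustomObject]@{
        UPN                = $U
        Records            = $UserData.Count
        ExchangeLastActive = $Exo.LastActive
        TeamsLastActive    = $Teams.LastActive   # $null when the user has no Teams record
    }
}
```

This still scans the whole list once per user, so the total work grows with users times records, but it cuts the number of full scans from six per user to one.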

12Knocksinna commented 4 years ago

BTW, I decided to go with hash tables to speed up performance. You can test things out with the V2 version of the script.

Turbocharging the Analysis of Office 365 Data with PowerShell Hash Tables

PowerShell hash tables are very efficient at retrieving data, which is just what’s needed when thousands of Office 365 accounts need processing. Our script to analyze usage data extracted from the Microsoft Graph was turbo-charged when we replaced list objects with hash tables, all of which makes it much easier to identify underused Office 365 accounts and save some money on licensing spend.

https://office365itpros.com/2020/09/14/speed-powershell-hash-tables-office365-data/
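To show why a hash table helps here, this is a minimal sketch, not the V2 script itself, that groups records by user once and then retrieves each user's records with a keyed lookup instead of scanning a list. The property names and sample records are assumptions.

```powershell
# Hypothetical sample of the combined report data
$Report = @(
    [PSCustomObject]@{ UPN = 'a@example.com'; Workload = 'Exchange'; LastActive = '2020-09-01' }
    [PSCustomObject]@{ UPN = 'a@example.com'; Workload = 'Teams';    LastActive = '2020-09-03' }
    [PSCustomObject]@{ UPN = 'b@example.com'; Workload = 'Exchange'; LastActive = '2020-08-20' }
)

# Build the index in a single pass: UPN -> list of that user's records
$ReportByUser = @{}
foreach ($Record in $Report) {
    if (-not $ReportByUser.ContainsKey($Record.UPN)) {
        $ReportByUser[$Record.UPN] = [System.Collections.Generic.List[object]]::new()
    }
    $ReportByUser[$Record.UPN].Add($Record)
}

# Each lookup is now a direct key access rather than a scan of the whole report
$UserData = $ReportByUser['a@example.com']
```

With this layout the full data set is read once to build the index, and the per-user cost no longer depends on the total number of records.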

bbrownchg commented 3 years ago

Thanks for this! All my problems with slow information processing have been solved now.

SteveBurkettNZ commented 3 years ago

Yup, v2 working a treat:

Statistics for Graph Report Script V2.0

Time to fetch data from Microsoft Graph: 3:5
Time to prepare date for processing: 0:7
Time to create report from data: 5:12
Total time for script: 8:25
Total accounts processed: 26808
Accounts processsed per minute: 3184.71