CrowdStrike / psfalcon

PowerShell for CrowdStrike's OAuth2 APIs
The Unlicense

[ BUG ] `Get-FalconVulnerability` stuck in endless loop #331

Closed by guy-user 1 year ago

guy-user commented 1 year ago

Describe the bug

When calling Get-FalconVulnerability, the cmdlet sometimes goes into an infinite loop. I first saw this issue on 13 July and have had intermittent issues ever since. I had been running v2.2.5 for at least 2 months prior without issue, and there have been no significant changes on my side to account for this.

This is a major problem as I cannot present a proper vulnerability dataset to the business outside of the Falcon console.

I have also been having issues recently with incorrect vulnerability results when querying across the full vulnerability dataset, but I do not know if it is related to this problem or not.

To Reproduce

I cannot reproduce the issue on demand as it occurs intermittently, but I do have a live debugger session with the issue currently active. I will leave this session active so you can collect whatever you need (if required), or we can do a Zoom session so you can see it for yourself (if that helps).

From the debugger I can see that the script is looping in the Invoke-Loop function in private.ps1. This is an extract of lines 553-594 from private.ps1:

    function Invoke-Loop ([hashtable]$Splat,[object]$Object,[int]$Int) {
        do {
            # Determine next offset value
            [string[]]$Next = if ($Object.after) {
                @('after',$Object.after)
            } elseif ($Object.next_token) {
                @('next_token',$Object.next_token)
            } elseif ($null -ne $Object.offset) {
                $Value = if ($Object.offset -match '^\d{1,}$') { $Int } else { $Object.offset }
                @('offset',$Value)
            }
            if ($Next) {
                # Clone parameters and make request
                $Clone = Set-LoopParam $Splat $Next
                if ($Script:Falcon.Expiration -le (Get-Date).AddSeconds(60)) {
                    if ($PSCmdlet.ShouldProcess('Request-FalconToken','Get-ApiCredential')) {
                        # Refresh authorization token when required
                        Request-FalconToken
                    }
                }
                [string]$Target = New-ShouldMessage $Clone.Endpoint
                if ($PSCmdlet.ShouldProcess($Target,$Operation)) {
                    $Script:Falcon.Api.Invoke($Clone.Endpoint) | ForEach-Object {
                        if ($_.Result.Content) {
                            # Output result, update pagination and received count
                            $Object = (ConvertFrom-Json (
                                $_.Result.Content).ReadAsStringAsync().Result).meta.pagination
                            Write-Request $Clone $_ -OutVariable Output
                            [int]$Int += ($Output | Measure-Object).Count
                            if ($null -ne $Object.total) {
                                Write-Log $Command "Retrieved $Int of $($Object.total)"
                            }
                        } elseif ($null -ne $Object.total) {
                            [string]$Message = "[$Command] Total results limited by API '$(
                                ($Clone.Endpoint.Path).Split('?')[0] -replace $Script:Falcon.Hostname,
                                $null)' ($Int of $($Object.total))."
                            $PSCmdlet.WriteError($Message)
                        }
                    }
                }
            }
        } while ($null -ne $Object.total -and $Int -lt $Object.total)

The value of $Object while in that loop is as follows (after is blank):

    limit total after
    ----- ----- -----
      400  1450

And this is the value of $Int:

    $Int = 1445

As you can see there is no way out of the do loop.
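The debugger values can be plugged into the same logic outside of PSFalcon to see why nothing changes between iterations. This is only a sketch: a `[pscustomobject]` stands in for the real pagination object, `$RequestMade` and `$KeepLooping` are names made up for illustration, and it assumes strict mode is off (as the extract's own property checks do). Nothing here calls the API.

```powershell
# Stand-in for the pagination object seen in the debugger (values from this report)
$Object = [pscustomobject]@{ limit = 400; total = 1450; after = '' }
[int]$Int = 1445

# Same branch logic as the top of the do block: an empty 'after' is falsy and
# there is no 'next_token' or 'offset' property, so no branch produces a value
[string[]]$Next = if ($Object.after) {
    @('after', $Object.after)
} elseif ($Object.next_token) {
    @('next_token', $Object.next_token)
} elseif ($null -ne $Object.offset) {
    $Value = if ($Object.offset -match '^\d{1,}$') { $Int } else { $Object.offset }
    @('offset', $Value)
}

# 'if ($Next)' is false, so no request is made and $Int/$Object are never updated
$RequestMade = [bool]$Next

# The exit condition from the extract still evaluates to $true (1445 -lt 1450)
$KeepLooping = ($null -ne $Object.total -and $Int -lt $Object.total)

"Request made: $RequestMade; keep looping: $KeepLooping"
```

With no request made, neither `$Int` nor `$Object.total` can change, so the `while` condition is re-evaluated with the same inputs each time.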

Here is the VERBOSE output leading up to the loop:

    VERBOSE: 23:19:57 [Get-FalconVulnerability] /spotlight/combined/vulnerabilities/v1:get
    VERBOSE: 23:19:57 [ApiClient.Invoke] GET https://api.crowdstrike.com/spotlight/combined/vulnerabilities/v1?filter=updated_timestamp:>'2023-06-07T16:23:22Z'%2Bupdated_timestamp:<'2023-06-07T17:23:23Z'%2Bstatus:['open','reopen']&limit=400&facet=remediation&facet=cve&facet=evaluation_logic&facet=host_info
    VERBOSE: 23:19:57 [ApiClient.Invoke] Accept=application/json
    VERBOSE: 23:19:59 [ApiClient.Invoke] 200: OK
    VERBOSE: 23:19:59 [ApiClient.Invoke] Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=15724800; includeSubDomains, max-age=31536000; includeSubDomains, X-Cs-Region=us-1, X-Cs-Traceid=98030355-89f2-4cd2-aedb-28f8b5b4dd76, X-Ratelimit-Limit=6000, X-Ratelimit-Remaining=5906, Date=Tue, 18 Jul 2023 13:19:58 GMT, Server=nginx
    VERBOSE: 23:20:00 [Write-Result] query_time=0.712029221, pagination.limit=400, pagination.total=1450, pagination.after=WzE2ODYxNTU1OTcwMDAsIjk4YTAwZmQ2YzRmNDQwNmRhY2IyN2MxMTY1NTAzNTM2Xzg0YWJlODIyZmRlOTM3ZWVhMjBlNWM5Y2E0ZjY2ZDg5Il0=, powered_by=spapi, trace_id=98030355-89f2-4cd2-aedb-28f8b5b4dd76
    VERBOSE: 23:20:00 [Get-FalconVulnerability] Retrieved 400 of 1450
    VERBOSE: 23:20:00 [ApiClient.Invoke] GET https://api.crowdstrike.com/spotlight/combined/vulnerabilities/v1?filter=updated_timestamp:>'2023-06-07T16:23:22Z'%2Bupdated_timestamp:<'2023-06-07T17:23:23Z'%2Bstatus:['open','reopen']&limit=400&facet=remediation&facet=cve&facet=evaluation_logic&facet=host_info&after=WzE2ODYxNTU1OTcwMDAsIjk4YTAwZmQ2YzRmNDQwNmRhY2IyN2MxMTY1NTAzNTM2Xzg0YWJlODIyZmRlOTM3ZWVhMjBlNWM5Y2E0ZjY2ZDg5Il0=
    VERBOSE: 23:20:00 [ApiClient.Invoke] Accept=application/json
    VERBOSE: 23:20:02 [ApiClient.Invoke] 200: OK
    VERBOSE: 23:20:02 [ApiClient.Invoke] Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=15724800; includeSubDomains, max-age=31536000; includeSubDomains, X-Cs-Region=us-1, X-Cs-Traceid=ea48a40f-79f4-416b-834b-82f1a588e0b1, X-Ratelimit-Limit=6000, X-Ratelimit-Remaining=5905, Date=Tue, 18 Jul 2023 13:20:01 GMT, Server=nginx
    VERBOSE: 23:20:03 [Write-Result] query_time=0.652906495, pagination.limit=400, pagination.total=1450, pagination.after=WzE2ODYxNTY5MTAwMDAsIjcwMjY5NzQ3YzlmNTQ3ZGRhMzQxZWE0YjUwYjFjYzk2X2FmZDQ3ZGI5OGFlNzNlMGI5MjExYjJkZTEzNDhiODE1Il0=, powered_by=spapi, trace_id=ea48a40f-79f4-416b-834b-82f1a588e0b1
    VERBOSE: 23:20:03 [Get-FalconVulnerability] Retrieved 800 of 1450
    VERBOSE: 23:20:04 [ApiClient.Invoke] GET https://api.crowdstrike.com/spotlight/combined/vulnerabilities/v1?filter=updated_timestamp:>'2023-06-07T16:23:22Z'%2Bupdated_timestamp:<'2023-06-07T17:23:23Z'%2Bstatus:['open','reopen']&limit=400&facet=remediation&facet=cve&facet=evaluation_logic&facet=host_info&after=WzE2ODYxNTY5MTAwMDAsIjcwMjY5NzQ3YzlmNTQ3ZGRhMzQxZWE0YjUwYjFjYzk2X2FmZDQ3ZGI5OGFlNzNlMGI5MjExYjJkZTEzNDhiODE1Il0=
    VERBOSE: 23:20:04 [ApiClient.Invoke] Accept=application/json
    VERBOSE: 23:20:05 [ApiClient.Invoke] 200: OK
    VERBOSE: 23:20:05 [ApiClient.Invoke] Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=15724800; includeSubDomains, max-age=31536000; includeSubDomains, X-Cs-Region=us-1, X-Cs-Traceid=f6a88e83-34ba-4cf9-afe9-2aae7fe7ebe6, X-Ratelimit-Limit=6000, X-Ratelimit-Remaining=5907, Date=Tue, 18 Jul 2023 13:20:04 GMT, Server=nginx
    VERBOSE: 23:20:07 [Write-Result] query_time=0.755064774, pagination.limit=400, pagination.total=1445, pagination.after=WzE2ODYxNTc0NjkwMDAsImRkODkyY2ZjMjBmOTQ1NWE4NWNiN2RiMDZmNzZkNGQ1XzQ1ZWY0M2FjOTRhZTM1NWM5Y2RhMGMwMjljZWYzYzE2Il0=, powered_by=spapi, trace_id=f6a88e83-34ba-4cf9-afe9-2aae7fe7ebe6
    VERBOSE: 23:20:07 [Get-FalconVulnerability] Retrieved 1200 of 1445
    VERBOSE: 23:20:07 [ApiClient.Invoke] GET https://api.crowdstrike.com/spotlight/combined/vulnerabilities/v1?filter=updated_timestamp:>'2023-06-07T16:23:22Z'%2Bupdated_timestamp:<'2023-06-07T17:23:23Z'%2Bstatus:['open','reopen']&limit=400&facet=remediation&facet=cve&facet=evaluation_logic&facet=host_info&after=WzE2ODYxNTc0NjkwMDAsImRkODkyY2ZjMjBmOTQ1NWE4NWNiN2RiMDZmNzZkNGQ1XzQ1ZWY0M2FjOTRhZTM1NWM5Y2RhMGMwMjljZWYzYzE2Il0=
    VERBOSE: 23:20:07 [ApiClient.Invoke] Accept=application/json
    VERBOSE: 23:20:08 [ApiClient.Invoke] 200: OK
    VERBOSE: 23:20:08 [ApiClient.Invoke] Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=15724800; includeSubDomains, max-age=31536000; includeSubDomains, X-Cs-Region=us-1, X-Cs-Traceid=de19257c-6511-4e8e-a189-7b8a5939c166, X-Ratelimit-Limit=6000, X-Ratelimit-Remaining=5906, Date=Tue, 18 Jul 2023 13:20:08 GMT, Server=nginx
    VERBOSE: 23:20:09 [Write-Result] query_time=0.46233931, pagination.limit=400, pagination.total=1450, pagination.after=, powered_by=spapi, trace_id=de19257c-6511-4e8e-a189-7b8a5939c166
    VERBOSE: 23:20:09 [Get-FalconVulnerability] Retrieved 1445 of 1450

It would seem that the final request only retrieved 245 events (1201-1445), not the full remaining 250, and that pagination.after came back empty. That leaves 5 events hanging, and I am guessing this is what leads to the infinite loop condition.

Expected behavior

No infinite looping please :-)

Environment

OS Name: Microsoft Windows Server 2016 Standard
OS Version: 10.0.14393 N/A Build 14393
PowerShell: 5.1.14393.5582
PSFalcon: 2.2.5


Transcript content

Haven't been able to generate a transcript.

bk-cs commented 1 year ago

VERBOSE: 23:20:08 [ApiClient.Invoke] Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=15724800; includeSubDomains, max-age=31536000; includeSubDomains, X-Cs-Region=us-1, X-Cs-Traceid=de19257c-6511-4e8e-a189-7b8a5939c166, X-Ratelimit-Limit=6000, X-Ratelimit-Remaining=5906, Date=Tue, 18 Jul 2023 13:20:08 GMT, Server=nginx
VERBOSE: 23:20:09 [Write-Result] query_time=0.46233931, pagination.limit=400, pagination.total=1450, pagination.after=, powered_by=spapi, trace_id=de19257c-6511-4e8e-a189-7b8a5939c166

Thanks for the report!

The verbose output shown here displays what the API returned, meaning that the API, not PSFalcon, is setting after to null. My gut response says the problem lies within the API, and not what PSFalcon is doing--other than maybe needing to implement an "if not done but still trying, stop if results aren't increasing" break.
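A rough sketch of what such a "stop if results aren't increasing" guard could look like. `Get-AllPages` and `$FakeApi` are hypothetical names for illustration and are not PSFalcon code; the scriptblock stands in for the API and mimics this report by serving 1445 items in pages of 400 while the caller is told to expect 1450.

```powershell
function Get-AllPages {
    [CmdletBinding()]
    param(
        [scriptblock]$GetPage,  # hypothetical page source: takes the received count, returns items
        [int]$Total             # total the caller was told to expect
    )
    [int]$Received = 0
    do {
        [int]$Before = $Received
        $Page = @(& $GetPage $Received)
        $Page                   # emit this page's items to the pipeline
        $Received += $Page.Count
        if ($Received -eq $Before) {
            # Nothing new arrived, so another attempt would only repeat itself
            Write-Warning "Stopped at $Received of $Total; no further results were returned."
            break
        }
    } while ($Received -lt $Total)
}

# Stand-in for the API: 1445 items are actually available, served 400 at a time
$FakeApi = {
    param([int]$Offset)
    $Count = [Math]::Min(400, [Math]::Max(0, 1445 - $Offset))
    if ($Count -gt 0) { ($Offset + 1)..($Offset + $Count) }
}

$Result = @(Get-AllPages -GetPage $FakeApi -Total 1450 -WarningAction SilentlyContinue)
"Collected $($Result.Count) items"
```

The guard compares the received count before and after each request rather than trusting the reported total, so it still exits if the total moves during the search.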

I'm investigating...

bk-cs commented 1 year ago

I noticed that your pagination.total also changes during this loop (1450, 1450, 1445, 1450), giving more evidence that this is API related:

VERBOSE: 23:20:00 [Write-Result] query_time=0.712029221, pagination.limit=400, pagination.total=1450, pagination.after=WzE2ODYxNTU1OTcwMDAsIjk4YTAwZmQ2YzRmNDQwNmRhY2IyN2MxMTY1NTAzNTM2Xzg0YWJlODIyZmRlOTM3ZWVhMjBlNWM5Y2E0ZjY2ZDg5Il0=, powered_by=spapi, trace_id=98030355-89f2-4cd2-aedb-28f8b5b4dd76
VERBOSE: 23:20:03 [Write-Result] query_time=0.652906495, pagination.limit=400, pagination.total=1450, pagination.after=WzE2ODYxNTY5MTAwMDAsIjcwMjY5NzQ3YzlmNTQ3ZGRhMzQxZWE0YjUwYjFjYzk2X2FmZDQ3ZGI5OGFlNzNlMGI5MjExYjJkZTEzNDhiODE1Il0=, powered_by=spapi, trace_id=ea48a40f-79f4-416b-834b-82f1a588e0b1
VERBOSE: 23:20:07 [Write-Result] query_time=0.755064774, pagination.limit=400, pagination.total=1445, pagination.after=WzE2ODYxNTc0NjkwMDAsImRkODkyY2ZjMjBmOTQ1NWE4NWNiN2RiMDZmNzZkNGQ1XzQ1ZWY0M2FjOTRhZTM1NWM5Y2RhMGMwMjljZWYzYzE2Il0=, powered_by=spapi, trace_id=f6a88e83-34ba-4cf9-afe9-2aae7fe7ebe6
VERBOSE: 23:20:09 [Write-Result] query_time=0.46233931, pagination.limit=400, pagination.total=1450, pagination.after=, powered_by=spapi, trace_id=de19257c-6511-4e8e-a189-7b8a5939c166

Although I think the API is causing this endless loop, I've added a break in Invoke-Loop for the 2.2.6 release for when unexpected conditions like this appear. This line is changed from:

} while ($null -ne $Object.total -and $Int -lt $Object.total)

to:

} while ($null -ne $Object.total -and $null -ne $Next -and $Int -lt $Object.total)

Can you try modifying that line in your local copy, reloading PowerShell and seeing if you can recreate the endless loop? I assume this will stop the attempts as pagination.after is no longer available.
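To check the two conditions against the values from this report, both can be evaluated outside the loop. This is only a sketch: it assumes `$Next` ends up as `$null` when the API returns an empty `after`, and it uses a `[pscustomobject]` in place of the real pagination object. `$Old` and `$New` are illustrative names.

```powershell
# Values reported from the debugger session
$Object = [pscustomobject]@{ limit = 400; total = 1450; after = '' }
[int]$Int = 1445
$Next = $null   # assumption: no pagination token was available for the next request

# Original exit condition: stays $true, so the loop repeats
$Old = ($null -ne $Object.total -and $Int -lt $Object.total)

# Condition with the added '$null -ne $Next' check: $false, so the loop ends
$New = ($null -ne $Object.total -and $null -ne $Next -and $Int -lt $Object.total)

"Old condition: $Old; new condition: $New"
```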

guy-user commented 1 year ago

Thanks, will do, and I'll get back to you if I can recreate it. It looks like the issue is becoming more and more prevalent: my ingest process only runs for an hour or so before running into a loop. So if I can't recreate it, the change likely represents a fix (for the looping at least). Much appreciated for the quick response, by the way.

I'll also log a ticket with CS support, given that it's API related.

Would this API issue potentially mean we are not receiving some of the vulnerability data (i.e. the last 5 events in my example above)? That would probably explain why "I have also been having issues recently with incorrect vulnerability results when querying across the full vulnerability dataset..."

guy-user commented 1 year ago

Also, re "I've added a break in Invoke-Loop for the 2.2.6 release for when unexpected conditions like this appear", depending on what CS support come back with in terms of my data validity concerns, that fix might not be desirable as it would mask the issue that has cropped up on this occasion. I am sure the loop logic was spot on as it was prior, assuming the API behaves as expected. If that extra break logic was in there it would have taken a lot longer to identify the cause of this issue, and may have meant it was almost unidentifiable.

I'll post any updates I have re data validity as I get feedback on the support case.

guy-user commented 1 year ago

Case 01123641 has been logged with CS support 👍

bk-cs commented 1 year ago

"Would this API issue potentially mean we are not receiving some of the vulnerability data?"

I can't rule it out, but it seems like it's something related to how pagination works on the particular API rather than the results themselves missing. The data that feeds the API is the same data that feeds the UI, but they might differ in their approach.

"If that extra break logic was in there it would have taken a lot longer to identify the cause of this issue, and may have meant it was almost unidentifiable."

This is a great point and not something I considered. I wonder if I can add in an additional error to call out what happened and make it more obvious...
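One possible shape for such an error, as a sketch only. `Test-PaginationStall` is a hypothetical helper, not something that exists in PSFalcon, and the message wording is made up; the idea is simply to report the shortfall when the loop gives up instead of exiting silently.

```powershell
function Test-PaginationStall {
    [CmdletBinding()]
    param(
        [int]$Received,   # results collected so far
        [int]$Total,      # pagination.total from the last response
        [string]$After    # pagination.after from the last response
    )
    if ($Received -lt $Total -and -not $After) {
        # Non-terminating error, so results already collected are still returned
        Write-Error ("Pagination ended early: received $Received of $Total results " +
            "and the API returned no 'after' token.")
        return $true
    }
    $false
}

# Values from this report: 1445 of 1450 received and an empty 'after'
$Stalled = Test-PaginationStall -Received 1445 -Total 1450 -After '' `
    -ErrorAction SilentlyContinue -ErrorVariable Problem

# A complete result set is not flagged
$Complete = Test-PaginationStall -Received 1450 -Total 1450 -After ''

"Stalled: $Stalled; errors recorded: $($Problem.Count); complete run flagged: $Complete"
```

Using a non-terminating error keeps the partial data available to the caller while still leaving a record of the early exit in `$Error` and in any `-ErrorVariable`.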

Thanks for opening the support ticket! Please let me know how it goes. I'll keep this issue open for now, but from my point-of-view, I don't see any additional actions that need to be taken.

guy-user commented 1 year ago

Understood, thanks. I'll update here when we've made some progress on the support case.

guy-user commented 1 year ago

We have hopefully gotten past the issue, as CS Support have indicated it was likely load related.

I have removed the workaround from private.ps1 to see if the looping issue resurfaces. I will post an update and close the issue in 1 week at the most, or sooner if I see the looping recur.

Here is the response from Support:

I believe that pagination.total counts is expected to potentially change during a search as follows:
• If a low and high boundary on updated_timestamp is added (like the cx is doing), pagination.total may only go down (for vulns updating mid search)
• If a low boundary on updated_timestamp is added, pagination.total may only go up (again, for older vulns getting updated mid search)

They have confirmed that at this time there is nothing putting high pressure on the backend system in charge of storing and returning the vulnerability data and they have been unable to replicate the issue.

bk-cs commented 1 year ago

Thank you. It sounds like a glitch in the matrix... I'll consider this resolved for the time being, as it seems to have been a temporary issue that is not PSFalcon related, and there's not much I can do to resolve something like this popping up.