forcedotcom / sfdx-scanner

MIT License
212 stars 49 forks source link

[BUG] CodeAnalyzerDFA is failing with timeout error: InternalExecutionError, Path evaluation timed out after 18000000 #1186

Open acc-akar opened 1 year ago

acc-akar commented 1 year ago

Describe the bug Getting error while scanning DFA: Graph Engine reached the path expansion upper limit (8360). The analysis preemptively stopped running on this path to prevent an OutOfMemory error. Rerun Graph Engine and target this entry method with a larger heap space.

To Reproduce Run command: sfdx scanner:run:dfa --sfgejvmargs "-Xmx20g" --format=csv --outfile=CodeAnalyzerDFA.csv --target="./" --projectdir="./" --category="Security"

**Current behavior Getting InternalExecutionError with above error.

Expected behavior Expecting scanning result to pass with 0 violation, but the scan terminates.

Additional context Attaching log file for review. sfge.log

"Urgency": Impacting business, need to submit package for security review.

jfeingold35 commented 12 months ago

@abhishekkarns , please see our documentation on this matter.

acc-akar commented 12 months ago

We tried everything suggested:

Running classes indivually then only few classes are failing that is either batch class or class having lots of logic and using helper classes. Helper classess scaning without any issue but when running Parent class getting InternalExecutionError.

We are doing callout in helper class, is that causing any issue? Do we need to have any specific way to handle callouts or future methods for scanner?

Please have a look and advise!

jfeingold35 commented 12 months ago

What specific InternalExecutionError are you still getting when you run with just the parent class?

acc-akar commented 12 months ago

Error: "4","3","D:\Netsutra\Vs Project\MUSTER\Muster Advocacy App\force-app\main\default\classes\MusterContactBatch.cls","100","17","MusterContactBatch","execute","","","","InternalExecutionError","Path evaluation timed out after 1800000 ms","https://forcedotcom.github.io/sfdx-scanner/en/v3.x/salesforce-graph-engine/rules/#ApexFlsViolationRule","InternalExecutionError","sfge"

I believe it must be something related to callout and future call that is happening in helper class but helper class is scanner without any issue indivisually?

jfeingold35 commented 12 months ago

Have you tried using a greater timeout than 30 minutes? It sounds like your code is pretty complicated, so it's possible that you might just need to give it more time.

acc-akar commented 12 months ago

Increased timeout to 2.5 hours but still getting same error: "4","3","D:\Netsutra\Vs Project\MUSTER\Muster Advocacy App\force-app\main\default\classes\MusterSettingsController.cls","20","31","MusterContactBatch","getMusterData","","","","InternalExecutionError","Path evaluation timed out after 9000000 ms","https://forcedotcom.github.io/sfdx-scanner/en/v3.x/salesforce-graph-engine/rules/#ApexFlsViolationRule","InternalExecutionError","sfge"

Do we need to handle callout and future call in some specific manner for DFA scanner?

jfeingold35 commented 12 months ago

My recollection is that you shouldn't have to do any kind of special handling for those. If you send me a sample of what the callouts/future calls being invoked are and how they're being invoked, I can do some testing. But my suspicion is that you might just need to increase the timeout.

acc-akar commented 12 months ago

@jfeingold35 - Per suggestion, we tried running the scan with increased timeout. The scan even failed when timeout set as 5 hours, also tried a couple of times but it did not went through and still getting same error. Not sure if the scanner is getting stuck somewhere.

Do you need the complete code to execute or you want to see the code and run scanner to debug it? Since the code has dependencies on custom objects, etc. it may not be possible to share executable code.

However, I (and my team) can walk you through the code in a working session if we could connect over a meeting. Let me know if any of these following time slot works for you, and you may join using below meeting link:

Alternatively, you may suggest a convenient time for us to connect.

Join Zoom Meeting https://agilecloudconsulting.zoom.us/j/84219299710?pwd=VytLVEpNUXBHcEFTNEFOYjAxbEo2Zz09

Meeting ID: 842 1929 9710 Passcode: 531610

Thanks in advance!

sfge (2).log

acc-akar commented 12 months ago

@jfeingold35 - We have shared an invite of the GIT code repository with you to further review. And, let me know if you need access to the sandbox org to test the code.

Scan failing for these classes: MusterSettingsController MusterContactBatch MusterContactAccountPost_Batch MusterCheckBatchStatus_Batch MusterActionBatch

acc-akar commented 11 months ago

@jfeingold35 - Hope you are doing well! Let me know if you got a chance to review the issue and have any updates/suggestions.

jfeingold35 commented 11 months ago

@abhishekkarns , sorry, yesterday was Yom Kippur, so I spent all day in shul. If I have a chance to look at this, I'll certainly do so. That said, our team's bandwidth is very limited right now, so I truly can't make any promises. In the meantime, please try running without --pathexplimit -1. This will help us determine whether it's spending 5 hours expanding a ludicrous number of paths or if it's simply getting stuck somewhere.

acc-akar commented 11 months ago

@jfeingold35 Removed --pathexplimit -1 from command and ran for timeout 18000000, now getting OutOfMemory error, asking to run with a larger heap space, see error below:

"4","3","D:\Muster\Muster Advocacy App\force-app\main\default\classes\MusterSettingsController.cls","20","31","MusterSettingsController","getMusterData","","","","LimitReached","Graph Engine reached the path expansion upper limit (7431). The analysis preemptively stopped running on this path to prevent an OutOfMemory error. Rerun Graph Engine and target this entry method with a larger heap space.","https://forcedotcom.github.io/sfdx-scanner/en/v3.x/salesforce-graph-engine/working-with-sfge/#understand-outofmemory-errors","InternalExecutionError","sfge"

So without --pathexplimit -1 getting OutOfMemory error and with --pathexplimit -1 getting timed out error even for 5 hours.

Please advise.

jfeingold35 commented 11 months ago

@abhishekkarns , okay. That does seem to support the notion that it's just a huge number of paths being expanded, and that sort of thing takes a while. You might just need to keep increasing the timeout until it passes. I'm confident that an HTTP callout wouldn't be breaking the scanner, but if you're concerned about the future methods, you can always add the following snippet immediately before the invocation:

String s = nulll;
System.debug(s.length());

This will force a null pointer exception, which halts further expansion of the path in question. If adding that before your future-call resolves this, that at least confirms that it's related.

acc-akar commented 11 months ago

@jfeingold35 - We have already tried running the scan with 5 hours as timeout, and per suggestion, we added the code snippet before the future invocation but it still failed with below timeout error:

"4","3","D:\Muster\Muster Advocacy App\force-app\main\default\classes\MusterSettingsController.cls","695","26","MusterSettingsController","syncBatchBasedOnDirection","","","","InternalExecutionError","Path evaluation timed out after 18000000 ms","https://forcedotcom.github.io/sfdx-scanner/en/v3.x/salesforce-graph-engine/rules/#ApexFlsViolationRule","InternalExecutionError","sfge"

I truly appreciate your time in helping resolving the issue. We are in a bit of tough situation. Since last month a clean report from this tool is a requirement to submit for security review. We are getting clean report for "CheckMarx" and "CodeAnalyzerGeneral", but unable to get the clean "CodeAnalyzerDFA" and this is holding up the security review submission. What else we can do or provide you to help expediate resolve the issue?

Please suggest.

jfeingold35 commented 11 months ago

@abhishekkarns , I'll start with the good news: a timeout error like what you're seeing is not a blocker for security review submission. You can always document it as a false positive and submit.

The bad news is that if injecting the null pointer exception before the future-call didn't resolve the timeout, then it means the future call isn't what's causing the timeout. It really just sounds like the code is complex, and it's taking a while to scan. And in that case, there's really only a few things you can do.

  1. You can try increasing the timeout until the run succeeds. If that means increasing it to something longer than 5 hours, then you wouldn't be the first.
  2. You can try refactoring your code to make it easier for the scanner to evaluate. Per the documentation, Graph Engine identifies path entry points, and expands paths flowing out of those points. Every time path expansion encounters a control flow branch (e.g., an if-else, try-catch, a loop, etc), it forks a new path. This is where most of the runtime comes from. Therefore, the most valuable kinds of refactor are: a. Refactors that strictly decrease the total number of paths in the codebase. Removing an unnecessary if early in a method removes not only that one path, but all the paths that fork off from that path. So small refactors can have big impacts. b. Refactors that more evenly distribute paths between entrypoints. Entrypoints are evaluated in parallel, and the timeout applies to each entrypoint individually (e.g., a 5-hour timeout means each entrypoint has 5 hours to complete). So two entrypoints with 50 paths each will evaluate faster than a single entrypoint with 100 paths, as illustrated below.

    /**
     * This entrypoint evaluates slowly because `b`'s value is unknown, meaning there are two branches.
     */
    global void combinedEntrypoint(boolean b) {
        if (b) {
            // ...
        } else {
            // ...
        }
    }
    
    /**
     * This entrypoint evaluates quick because it passes a literal into the helper method and therefore avoids branching.
     */
    global void splitEntrypointA() {
        // Literal value passed into helper method.
        helperMethod(true);
    }
    
    /**
     * This entrypoint evaluates quick because it passes a literal into the helper method and therefore avoids branching.
     */
    global void splitEntrypointB() {
        // Literal value passed into helper method.
        helperMethod(false);
    }
    
    /**
     * This method is identical to `combinedEntrypoint`.
     */
    public void helperMethod(boolean b) {
        if (b) {
            // ...
        } else {
            // ...
        }
    }
acc-akar commented 9 months ago

We are back :)

We refactored our code to reduce number of IF-ELSE and move them at lower levels and tried to identify what piece of code is causing the timeout error, but no success. What works independently, does not work when called by another class! Attached is simplified version of code to try to help identify root cause of the timeout problem.

Running command for path limit -1 and timeout 18000000, still not working and getting timeout error: InternalExecutionError, Path evaluation timed out after 18000000 Command: sfdx scanner:run:dfa --sfgejvmargs "-Xmx20g" --target './force-app/main/default/classes/MusterSettingsController.cls' --projectdir="./" --category="Security" --format csv --pathexplimit -1 --rule-thread-timeout 18000000

In sfge log file seems it stuck at below lines of MusterCustomHandler class. resStatus =MainList[0].resStatusCode; objList = MainList[0].customWpr;

From the log, looks like scanner is iterating on something and getting into infinite loop !! Scanner not actually running the code, right? Can you please help?

Debug Log.txt Sample Code.txt

acc-akar commented 9 months ago

Hi,

Today we did bit of more digging, looks like getting value from 0th index of a list is causing the problem -

if(MainList != Null && MainList.size() > 0) { resStatus =MainList[0].resStatusCode; resMessage = MainList[0].resMessage;

Instead of getting value from list's 0th index, changed the code to return object and set values as below:

if(MainList != Null) { resStatus = MainList.resStatusCode; resMessage = MainList.resMessage; objList = MainList.customWpr; }

Scanning the class took around 5 hours but succeeded with no violations found!

Does this makes sense? What is your opinion on this? We have simplified the code a lot, to find the root cause of the error but not sure if we found it !

jfeingold35 commented 9 months ago

@acc-akar , I'm able to reproduce this with the code you provided, but I'm still working on determining a precise cause. In the meantime, if you've already discovered a refactor you can do that unblock the analysis, I would strongly encourage you to continue using that workaround for now.

git2gus[bot] commented 7 months ago

This issue has been linked to a new work item: W-15080470

rmohan20 commented 6 months ago

Duplicate of #1294