Extend sonarqube importer to collect "where is the issue" tab content

xoda commented 2 years ago

Extend sonarqube importer to collect "where is the issue" tab content (where the vulnerability details are).

In current sonarqube importer, only the content belonging to the tab "why is this an issue" is imported. I propose to collect the "where is the issue" section, that contains the details of the vulnerability, since the "why is this an issue" tab just contains generic (and, frankly, mostly obvious) info, nothing about the specific portions of code affected by the issue.

For clarity, screenshots attached.

What is imported in dd:

Screenshot 2022-09-09 at 17 17 12

Content in Sonarqube:

Screenshot 2022-09-09 at 17 16 22

@kiblik

kiblik commented 2 years ago

@xoda, I check the actual implementation. The part responsible for pulling data is here https://github.com/DefectDojo/django-DefectDojo/blob/dc20ee1bf9f99953b50c610bd9840053e8fd12f1/dojo/tools/sonarqube_api/api_client.py#L38-L53 Except of other parameters processed in the function SonarQubeAPI.import_issues(test), response from the API contains also values

"component": "XYZ:abc.py",
"textRange": {
    "startLine": 68,
    "endLine": 68,
...
},

We will need to pull data via SQ API endpoint api/sources/show which requires from, to and key. Values from previous API responses will be helpful here.

I found out, that there is available also the endpoint api/sources/scm which can provide information

Author of the commit
Datetime of the commit
Revision of the commit

for each line of code.

Questions (@xoda, your feedback is welcome):

I suppose that showing "wrong" lines without context is not the best. I would like to add also a couple of lines before and after. What is recommended value? ±5 lines?
Git data from scm endpoint could be useful but I'm not able to find how to put this information in markdown and keep it clean. Are we going to avoid them? Or any other recommendation?
class Finding supports many useful fields like steps_to_reproduce, files, sast_source_object, sast_sink_object or sast_source_file_path. Are we going to put lines of code to the description or to one of these fields?

xoda commented 2 years ago

Hey @kiblik,

I suppose that showing "wrong" lines without context is not the best. I would like to add also a couple of lines before and after. What is recommended value? ±5 lines?

Well, in my mind I had the idea that we should limit ourselves to import the data that is shown in sonarqube, and, thus, to show just what sonarqube is showing in that "where is the issue" tab. I wouldn't do anything else than showing what sonarqube is showing (i.e. if sonarqube is showing just one line of code in the "where is my issue", we should show just one single line of code).

I've started looking at how this is handled in SQ it with a test run.

Turns out that SQ seem to use a different (~~undocumented? I couldn't find it from their website~~) API under the hood /api/sources/issue_snippets.

It works like this: provided in input a vulnerability uuid via GET, it replies out which lines we should show, their content (bummer: it's not raw content unfortunately, it has some HTML inside, we might have to get the raw source code from another api once we know the lines numbers). The information is even enriched with the scm data you mentioned.

Example GET:

GET /api/sources/issue_snippets?issueKey=BYM3dNQlGnWW-1YgBjBM HTTP/1.1

Response:

{
  "dummy_project:src/vulnerable/a.php": {
    "component": {
      "key": "dummy_project:src/vulnerable/a.php",
      "uuid": "BYM3dMtaGnWW-1YgBjBJ",
      "path": "src/vulnerable/a.php",
      "name": "a.php",
      "longName": "src/vulnerable/a.php",
      "q": "FIL",
      "project": "dummy_project",
      "projectName": "dummy_project",
      "measures": {
        "lines": "6.0",
        "coverage": "0.0",
        "issues": "1.0"
      }
    },
    "sources": [
      {
        "line": 1,
        "code": "&lt;?php",
        "scmRevision": "",
        "scmAuthor": "",
        "scmDate": "2022-09-13T15:25:20+0000",
        "duplicated": false,
        "isNew": true
      },
      {
        "line": 2,
        "code": "",
        "scmRevision": "",
        "scmAuthor": "",
        "scmDate": "2022-09-13T15:25:20+0000",
        "duplicated": false,
        "isNew": true
      },
      {
        "line": 3,
        "code": "<span class=\"sym-1 sym\">$variable</span> = $_GET[<span class=\"s\">'x'</span>];",
        "scmRevision": "",
        "scmAuthor": "",
        "scmDate": "2022-09-13T15:25:20+0000",
        "utLineHits": 0,
        "lineHits": 0,
        "duplicated": false,
        "isNew": true
      },
      {
        "line": 4,
        "code": "<span class=\"k\">echo</span> <span class=\"s\">\"dummy echo\"</span>;",
        "scmRevision": "",
        "scmAuthor": "",
        "scmDate": "2022-09-13T15:25:20+0000",
        "utLineHits": 0,
        "lineHits": 0,
        "duplicated": false,
        "isNew": true
      },
      {
        "line": 5,
        "code": "<span class=\"k\">eval</span>(<span class=\"sym-1 sym\">$variable</span>);",
        "scmRevision": "",
        "scmAuthor": "",
        "scmDate": "2022-09-13T15:25:20+0000",
        "utLineHits": 0,
        "lineHits": 0,
        "duplicated": false,
        "isNew": true
      },
      {
        "line": 6,
        "code": "",
        "scmRevision": "",
        "scmAuthor": "",
        "scmDate": "2022-09-13T15:25:20+0000",
        "duplicated": false,
        "isNew": true
      }
    ]
  }
}

Screenshot of what is shown in the GUI for clarity:

Git data from scm endpoint could be useful but I'm not able to find how to put this information in markdown and keep it clean.

Great question. I don't have a strong opinion on how to show it in markdown to make it look good. Since I don't think we can add extra info on a fenced code block in defectdojo, I guess we will be forced to add this info just before the code block somehow.

We will need to experiment a little bit on the aesthetics of this.

Are we going to avoid them? Or any other recommendation?

Well, we might think to make scm info optional. I'm not sure if it's doable in this way, but I was thinking we might end up showing it only if some value is set in the Service Keys during the setup of DefectDojo's Product Setting API.

We already have BUG,VULNERABILITY,CODE_SMELL configurable in the extra fields - why shouldn't we add something like SCM or SCM_INFO?

class Finding supports many useful fields like steps_to_reproduce, files, sast_source_object, sast_sink_object or sast_source_file_path. Are we going to put lines of code to the description or to one of these fields?

I was thinking to place everything in the Description, but I'm open to different ideas!

Thank you very much!

xoda commented 2 years ago

After looking better at SQ documentation, the API is there, but is marked as internal, stating:

Use at your own risk; internal services are subject to change or removal without notice.

It's a pity since this API is perfectly designed for our purpose. Shall we use it anyway? If we don't trust this, we could also make this whole new feature optional and in place only when a Service Key is set.

kiblik commented 2 years ago

After looking better at SQ documentation, the API is there, but is marked as internal, stating:

Use at your own risk; internal services are subject to change or removal without notice.

It's a pity since this API is perfectly designed for our purpose. Shall we use it anyway? If we don't trust this, we could also make this whole new feature optional and in place only when a Service Key is set.

I would have no problem to use "internal" API endpoints for my project. But not for DefectDojo, which is open source, and you never know, who will have enough time to fix issues. Plus, different users can use different versions of SonarQube which can mean different parameters or responses on the API, it could be a way to the hell to keep support for different versions of SonarQube. I prefer to use only official endpoints.

Well, in my mind I had the idea that we should limit ourselves to import the data that is shown in sonarqube, and, thus, to show just what sonarqube is showing in that "where is the issue" tab. I wouldn't do anything else than showing what sonarqube is showing (i.e. if sonarqube is showing just one line of code in the "where is my issue", we should show just one single line of code).

"where is the issue" contains also a button to show the rest of the code () So you are able to see the whole file without leaving tab "where is the issue" :)

But I have no problem putting only code (it can be also multiple lines) where the vulnerability is. I just opened the discussion.

It works like this: provided in input a vulnerability uuid via GET, it replies out which lines we should show, their content (bummer: it's not raw content unfortunately, it has some HTML inside, we might have to get the raw source code from another api once we know the lines numbers). The information is even enriched with the scm data you mentioned.

DefectDojo is putting field description into <pre> so HTML code will not be rendered nicely :(

Well, we might think to make scm info optional. I'm not sure if it's doable in this way, but I was thinking we might end up showing it only if some value is set in the Service Keys during the setup of DefectDojo's Product Setting API.

We already have BUG,VULNERABILITY,CODE_SMELL configurable in the extra fields - why shouldn't we add something like SCM or SCM_INFO?

Optional? No problem. I prefer extra fields.

I was thinking to place everything in the Description, but I'm open to different ideas!

Why not :)

xoda commented 2 years ago

I prefer to use only official endpoints.

Sure, that's reasonable.

I'm still not fully convinced about the idea of blindly adding +- 5 lines of code from the sink. I explored a little dd's codebase. Since we're using /api/issues/search in find_issues() method, could we take advantage of the issues.flows.locations?

Request GET /api/issues/search?issues=BYM3dNQlGnWW-1YgBjBM&types=BUG,VULNERABILITY,CODE_SMELL

Response

    "issues": [
      {
        "key": "BYM3dNQlGnWW-1YgBjBM",
        "rule": "phpsecurity:S5334",
        "severity": "BLOCKER",
        "component": "dummy_project:src/vulnerable/a.php",
        "project": "dummy_project",
        "hash": "25ef4d194ad143b5c870115505414e0a",
        "textRange": {
          "startLine": 5,
          "endLine": 5,
          "startOffset": 0,
          "endOffset": 15
        },
        "flows": [
          {
            "locations": [
              {
                "component": "dummy_project:src/vulnerable/a.php",
                "textRange": {
                  "startLine": 5,
                  "endLine": 5,
                  "startOffset": 0,
                  "endOffset": 15
                },
                "msg": "sink: tainted value is used to perform a security-sensitive operation"
              },
              {
                "component": "dummy_project:src/vulnerable/a.php",
                "textRange": {
                  "startLine": 3,
                  "endLine": 3,
                  "startOffset": 0,
                  "endOffset": 22
                },
                "msg": "tainted value is propagated"
              },
              {
                "component": "dummy_project:src/vulnerable/a.php",
                "textRange": {
                  "startLine": 3,
                  "endLine": 3,
                  "startOffset": 12,
                  "endOffset": 22
                },
                "msg": "source: this value can be controlled by the user"
              }
            ]
          }
        ],
        "resolution": "FIXED",
        "status": "CLOSED",
        "message": "Change this code to not dynamically execute code influenced by user-controlled data.",
        "effort": "30min",
        "debt": "30min",
        "author": "",
        "tags": [
          "cwe",
          "owasp-a1",
          "sans-top25-risky"
        ],
        "creationDate": "2022-09-13T15:25:20+0000",
        "updateDate": "2022-09-14T10:05:06+0000",
        "closeDate": "2022-09-14T10:05:06+0000",
        "type": "VULNERABILITY",
        "scope": "MAIN",
        "quickFixAvailable": false
      }
    ]

I was surprised how informative the issues.flows.locations seem to be here. What do you think to use this information and to import the matching source code lines from api/sources/show?

We have two options here:

Displaying the source code as several snippets of code (displaying also the messages about taint analysis and sinks)
Displaying the source code from the lowest startline-endline range to the highest one (but it could realistically happen that we will get a large amount of loc in some cases).

What do you think?

"where is the issue" contains also a button to show the rest of the code () So you are able to see the whole file without leaving tab "where is the issue" :) But I have no problem putting only code (it can be also multiple lines) where the vulnerability is. I just opened the discussion.

Well, of course, something like that button () would be the best option. But I guess we would need to implement it from scratch in the frontend, or am I missing some features already built-in?

Optional? No problem. I prefer extra fields.

Nice, so I guess scm will be optionally added above the code snippets. Again, I have very little experience about what we can do with defectdojo's frontend out of the box, so I guess I'll need to experiment a little to make it look good. Any tips?

xoda commented 2 years ago

I played with it. Maybe I'm doing something wrong, but it seems like we're very limited in the frontend with code blocks (I don't know why, but even syntax highlighting seem to not be working properly).

There are some markdown libraries that are able to render metadata (like author field), but at this point I'd say that we're really far from there and this goes out of the context of the current issue. I might work with some defectdojo frontend expert to improve that part later on.

For the moment, I'd say that we should do what we can with the limited features we have. Placing the issue info in the description is already a great improvement imho.

xoda commented 2 years ago

Let's try to keep it simple.

I'm trying to show a tooltip (e.g. text) to keep the displayed info compact, but honestly I'm not sure if it's helpful or confusing. I didn't find any other way to do so in defectdojo yet (this is just an hackish way to use a href). I very much dislike its clickability. Feedback is really appreciated on this.

Taking the case I posted above as an example, I'm thinking to produce a Description similar to the following:

## Why is this an issue

Applications that execute code dynamically should neutralize any externally-
provided values used to construct the code. Failure to do so could allow an
attacker to execute arbitrary code. This could enable a wide range of serious
attacks like accessing/modifying sensitive information or gain full system
access.

The mitigation strategy should be based on whitelisting of allowed values or
casting to safe types.

**Noncompliant Code Example**

    $data = $_GET["data"];
    eval("echo \$data;");

**Compliant Solution**

    $data = $_GET["data"];
    if (in_array($data, $whitelist)) {
      eval("echo \$data;");
    }

## Issue details

**Where**: [dummy_project:src/vulnerable/a.php](## "Component"), [3](## "Line(s)"), [12](## "startOffset")-[22](## "endOffset")
**Message**: source: this value can be controlled by the user

    $variable = $_GET['x'];

**Where**: [dummy_project:src/vulnerable/a.php](## "Component"), [3](## "Line(s)"), [0](## "startOffset")-[22](## "endOffset")
**Message**: tainted value is propagated

    $variable = $_GET['x'];

**Where**: [dummy_project:src/vulnerable/a.php](## "Component"), [5](## "Line(s)"), [0](## "startOffset")-[15](## "endOffset")
**Message**: sink: tainted value is used to perform a security-sensitive operation

    eval($variable);

### Suggested solution

Change this code to not dynamically execute code influenced by user-controlled data.

So, first section has the information from "Why is this an issue". A second section goes much more in detail of the issue, retrieving the information about the code flow (/api/issues/search) and source (api/sources/show, as you were suggesting in the very first comment). On top of this, if the SCM option is set, next to **Where** and **Message** we place the commit author, date, and revision details.

Any feedback is welcome!

manuel-sommer commented 1 year ago

Hi @xoda and @kiblik , what if we generate a link in each DefectDojo finding to the original Sonarqube issue? The link could be generated through the tool configuration URL + project + issue id.

kiblik commented 1 year ago

Hi @xoda and @kiblik , what if we generate a link in each DefectDojo finding to the original Sonarqube issue? The link could be generated through the tool configuration URL + project + issue id.

Do not fully solve issue but sounds like acceptable help.

manuel-sommer commented 1 year ago

I guess, this issue can be closed now @xoda

kiblik commented 1 year ago

I guess, this issue can be closed now @xoda

I suppose that #8163 helped (the developer can easily jump to the SonarQube issue) but fetching a snippet of vulnerable code and putting it into the description of Dojo finding can increase user experience. So there still place for improvement.

DefectDojo / django-DefectDojo

Extend sonarqube importer to collect "where is the issue" tab content #6834