github / codeql-action

Actions for running CodeQL analysis
MIT License
1.09k stars 306 forks source link

๐Ÿ› "SARIF URI scheme \"https\" did not match the checkout URI scheme \"file\",", #754

Open laurentsimon opened 2 years ago

laurentsimon commented 2 years ago

I'm developing a GitHub action following https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#physicallocation-object

The doc says the artifactLocation.uri: If the URI is absolute, code scanning can use the URI to checkout the artifact and match up files in the repository. For example, https://github.com/ghost/example/blob/00/src/promiseUtils.js

When I use a URL though, the analysis keeps failing Analysis processing failed.

I ran curl -u laurentimon:$GITHUB_AUTH_TOKEN -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/laurentsimon/scorecard-action-test/code-scanning/analyses and got the error message below:

"error": "SARIF URI scheme \"https\" did not match the checkout URI scheme \"file\",",

which seems to indicate https is not supported... but the doc says it is.

So my first question is: are URLs supported? I'm using them in SARIF's relatedLocation.physicalLocation.URI. If they are, what am I doing wrong?

My second question: I'd like to report related locations with urls, e.g. https://api.github.com/repos/ossf/scorecard/releases/assets/41580436, https://github.com/ossf/scorecard/releases/download/v2.1.1/scorecard_2.1.1_checksums.txt.sig. Urls may have different content types, so it's not entirely clear whether I'd need to use StartLine, CharOffset or ByteOffset depending on the content type. Ideally, ByteOffset should work regardless of content type.

Can you advise?

Thanks!

aeisenberg commented 2 years ago

Can you attach the relevant portion of you sarif file? Also, if your repository is public, can you share a link?

laurentsimon commented 2 years ago

My repo is public https://github.com/laurentsimon/scorecard-action-test File: see the link https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh I used as example.

{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [
    {
      "automationDetails": {
        "id": "supply-chain/scorecard/67a8e5f9f6015aace89817878733b539c0f117de-28 Sep 21 18:55 +0000"
      },
      "tool": {
        "driver": {
          "name": "Scorecard",
          "informationUri": "https://github.com/ossf/scorecard",
          "semanticVersion": "2.2.8-4-g67a8e5f-dirty",
          "rules": [
            {
              "id": "b451a10b1a462197cf89aab8af8a0140",
              "name": "Signed-Releases",
              "helpUri": "https://github.com/ossf/scorecard/blob/67a8e5f9f6015aace89817878733b539c0f117de/docs/checks.md#signed-releases",
              "shortDescription": {
                "text": "Determines if the project cryptographically signs release artifacts."
              },
              "fullDescription": {
                "text": "This check tries to determine if the project cryptographically signs release artifacts.<br>Signed releases attest to the provenance of the artifact. A low score is considered 'High' risk.<br>It works by looking for filenames: *.minisig (https://github.com/jedisct1/minisign), *.asc (pgp), *.sign. for the last 5 releases if it's hosted on GitHub. The check does not verify the signatures. It does not currently support other source hosting repositories (forges) other than GitHub."
              },
              "help": {
                "text": "This check tries to determine if the project cryptographically signs release artifacts.\nSigned releases attest to the provenance of the artifact. A low score is considered 'High' risk.\nIt works by looking for filenames: *.minisig (https://github.com/jedisct1/minisign), *.asc (pgp), *.sign. for the last 5 releases if it's hosted on GitHub. The check does not verify the signatures. It does not currently support other source hosting repositories (forges) other than GitHub.",
                "markdown": "This check tries to determine if the project cryptographically signs release artifacts.  Signed releases attest to the provenance of the artifact. A low score is considered 'High' risk.  It works by looking for filenames: *.minisig (https://github.com/jedisct1/minisign), *.asc (pgp), *.sign. for the last 5 releases if it's hosted on GitHub. The check does not verify the signatures. It does not currently support other source hosting repositories (forges) other than GitHub."
              },
              "defaultConfiguration": {
                "level": "error"
              },
              "properties": {
                "precision": "high",
                "tags": [
                  "supply-chain",
                  "security",
                  "releases"
                ]
              }
            }
          ]
        }
      },
      "results": [
        {
          "ruleId": "b451a10b1a462197cf89aab8af8a0140",
          "level": "note",
          "ruleIndex": 0,
          "message": {
            "text": "5 out of 5 artifacts are signed -- score normalized to 10"
          },
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": ".github/scorecard.yml",
                  "uriBaseId": "%SRCROOT%"
                }
              }
            }
          ],
          "relatedLocations": [
            {
              "id": 0,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.2.8_checksums.txt.sig"
              }
            },
            {
              "id": 1,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.2.3_checksums.txt.sig"
              }
            },
            {
              "id": 2,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.1.3_checksums.txt.sig"
              }
            },
            {
              "id": 3,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.1.2_checksums.txt.sig"
              }
            },
            {
              "id": 4,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.1.1_checksums.txt.sig"
              }
            }
          ]
        }
      ]
    }
  ]
}
aeisenberg commented 2 years ago

The sarif file looks fine to me. I'm able to upload and download it from code scanning. Can you point me to a failing actions run? Presumably, you have an instance where upload-sarif is not working?

aeisenberg commented 2 years ago

And code-scanning currently only supports character offset/length and start/end line/columns offsets.

laurentsimon commented 2 years ago

re: support for only character offset.. What if a binary file is reported in the results? Shall I set the line/character offset or that won't work?

re: upload-sarif This is the workflow used to test https://github.com/laurentsimon/scorecard-action-test/blob/main/.github/workflows/scorecard-analysis.yml

It works well except when it encounters a URL.

aeisenberg commented 2 years ago

I've forwarded your question on to someone on the team who has more experience with how code scanning processes sarif files.

laurentsimon commented 2 years ago

Thanks so much!

laurentsimon commented 2 years ago

And code-scanning currently only supports character offset/length and start/end line/columns offsets.

I just tested using a ByteOffset with a text file and it worked. When using a true non-ASCII binary, it did report an error but said Preview unavailable - Sorry, we couldn't find this file in the repository. This is actually good enough.

Looking forward to your guidance nevertheless.

One thing that would benefit UX wise is reporting all locations in the dashboard, instead of only the first one.

rneatherway commented 2 years ago

:wave:

Code Scanning assumes that the locations are for files in the repository. As a result it is trying to relativize your URLs against a file:// URL that is the directory on which the repository is checked out on the Actions runner.

Using a relative URL, as you have done in the locations, is the easiest way to work with Code Scanning. Would this work for the relatedLocations as well?

One thing that would benefit UX wise is reporting all locations in the dashboard, instead of only the first one.

Are you talking about the relatedLocations? Can you help me to understand what the relatedLocations represent in your output file? I'll then be able to make some recommendations or take feedback to the product team.

I just tested using a ByteOffset with a text file and it worked. When using a true non-ASCII binary, it did report an error but said Preview unavailable - Sorry, we couldn't find this file in the repository. This is actually good enough.

Right now we just support Start/EndLine and Start/EndColumn because Code Scanning is designed to work with source code. The result view highlights a particular line range in a text file and we haven't thought about how we would display or highlight a range in a binary file. For now I think your best bet is to leave out the line/column information.

laurentsimon commented 2 years ago

๐Ÿ‘‹

Code Scanning assumes that the locations are for files in the repository. As a result it is trying to relativize your URLs against a file:// URL that is the directory on which the repository is checked out on the Actions runner.

Using a relative URL, as you have done in the locations, is the easiest way to work with Code Scanning. Would this work for the relatedLocations as well?

not really. Our tool outputs information about assets and release, so links like https://api.github.com/repos/ossf/scorecard/releases/assets/41580436, https://github.com/ossf/scorecard/releases/download/v2.1.1/scorecard_2.1.1_checksums.txt.sig are the one we wanted to insert for related locations. It would be good enough for a clickable link to show. Those links are useful for users to download and inspect if they wish. My suggestion would be that for links that do not correspond to a file in the repo, you can simply show a clickable link.

One thing that would benefit UX wise is reporting all locations in the dashboard, instead of only the first one.

Are you talking about the relatedLocations? Can you help me to understand what the relatedLocations represent in your output file? I'll then be able to make some recommendations or take feedback to the product team.

No, I was really talking about locations: they are an array in the standard. These tutorials https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#locations says is an array because sometimes you have to make changes in more than one place to fix a problem

I just tested using a ByteOffset with a text file and it worked. When using a true non-ASCII binary, it did report an error but said Preview unavailable - Sorry, we couldn't find this file in the repository. This is actually good enough.

Right now we just support Start/EndLine and Start/EndColumn because Code Scanning is designed to work with source code. The result view highlights a particular line range in a text file and we haven't thought about how we would display or highlight a range in a binary file. For now I think your best bet is to leave out the line/column information.

rneatherway commented 2 years ago

not really. Our tool outputs information about assets and release, so links like https://api.github.com/repos/ossf/scorecard/releases/assets/41580436, https://github.com/ossf/scorecard/releases/download/v2.1.1/scorecard_2.1.1_checksums.txt.sig are the one we wanted to insert for related locations. It would be good enough for a clickable link to show. Those links are useful for users to download and inspect if they wish. My suggestion would be that for links that do not correspond to a file in the repo, you can simply show a clickable link.

Thanks, that makes sense. I'll open an issue to consider how we handle URLs internally.

In SARIF, relatedLocations are normally used by including a markdown-style link of the form [link text](N) where N is the index of the relatedLocation you wish to reference. You can see an example here: https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Ref493499246

Because of the problem with accepting https URLs you won't currently be able to do this, but you could short-cut the process and directly insert a markdown link into the result message.

No, I was really talking about locations: they are an array in the standard. These tutorials https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#locations says is an array because sometimes you have to make changes in more than one place to fix a problem

Yes, this is a current limitation of our SARIF support, which would be hard to change because of the way the alert view works. Our current suggestion would be either to create multiple results or to have a single result and use relatedLocations to reference the other places.

rneatherway commented 2 years ago

Please do keep us up to date with your development! We would be happy to review and offer further suggestions and guidance.

laurentsimon commented 2 years ago

Thank you so much for the prompt help. Kudos to your team! I'll make sure to create new issues when I encounter problems. Thanks!

rneatherway commented 2 years ago

You're welcome! :grin:

I'll close this out now, but as I said, very happy to discuss other points in the future. I'll also aim to update you here if we change the behaviour of our absolute URL handling.

elgohr commented 9 months ago

@rneatherway any updates on this topic? Having the same issue with http. In blackbox scanning, like a ZAP scan, a file location isn't useful. Any suggestions for workarounds?

aeisenberg commented 9 months ago

When this feature was originally suggested, we did not have enough resources to implement it. I am asking again to see if it is something worth implementing now.

aeisenberg commented 9 months ago

Unfortunately, we do not have any capacity to work on this right now. It's on our backlog and we may decide to pick it up in the future.

elgohr commented 9 months ago

Could we keep this issue open until implemented?

edwardgalligan commented 1 month ago

Also running into this issue - I'm wondering if it's possible to ignore and continue for https URIs in the short-term until the team settles on an appropriate way to handle them.

e.g. if I develop a workflow that has relative file:// uris for some artifacts & https:// uris for others, to produce a valid SARIF file (per spec) it would be great if the API would accept my valid SARIF file & just handle the file:// uris within it as normal.

At present, we're looking at pre-processing the SARIF to exclude the https:// artifacts (leaving us with an incomplete SARIF), just to satisfy Github's validation - ideally pre-processing wouldn't be required on our side.

NOTE: Another avenue we explored was omitting location selectively for these artifacts, but that yielded the following:

locationFromSarifResult: expected at least one location

So this leaves no option but to completely omit results from the SARIF for which the artifact can't be resolved locally by Github's API. That's not really tenable.