Closed — aeisenberg closed this issue 2 years ago.
An alternative streaming library is https://www.npmjs.com/package/JSONStream.
After a quick look, JSONStream seems to be better at grabbing pieces of the JSON through a regex-like path expression, while stream-json uses SAX-like events and may be better at assembling the entire SARIF document as a JavaScript object. To handle large SARIF files, we will need to read in the entire file, so JSONStream may be more appropriate.
Ensure we have tests for both regular and large SARIF files
Note that creating a test for a large SARIF file will require either generating one on the fly or downloading it from a known location. We cannot check a 4 GB file into the repository. It would be nicer to generate the file rather than download it, to save bandwidth during local development.
Also, note that 4GB is not a hard limit, the size limit is probably more related to the memory available in the local environment.
Agree with generating it on the fly. A SARIF file with excessively long path explanations should do the trick.
I think a better way to test this might be to generate a relatively small SARIF file but somehow constrain the memory available to the test so that we would crash if we weren't streaming it.
The repo for JSONStream is archived. Should we still use it?
Hmmm...I hadn't seen that. Thanks for pointing it out. We don't want to use an archived project. There are other alternatives.
How about this library? https://www.npmjs.com/package/stream-json I haven't explored it too much, but on a superficial level at least, it looks promising.
Perhaps it was only recently archived?
I'll take a look at this one. Thanks!
Hi,
I'm not sure how to handle that: https://github.com/apache/ofbiz-framework/actions/runs/1402011029. I mean how to reduce the file.
I also have a problem generating a SARIF file using SpotBugs on the command line, so I'm somewhat stuck :/
Any other ideas how to generate a SARIF file?
TIA
@JacquesLeRoux I think you are facing a different problem -- this repo and issue involve the CodeQL extension for VS Code, but I believe your issue is with uploading CodeQL results to GitHub's code scanning service.
So that we can help you debug why the SARIF file produced by your CodeQL workflow is so large, please enable debug mode in your workflow:

```yaml
- uses: github/codeql-action@v1
  with:
    debug: true
```
For non-CodeQL tools, see https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/uploading-a-sarif-file-to-github for information on uploading a SARIF file to GitHub.
Hi @adityasharad ,
Got this: https://github.com/apache/ofbiz-framework/actions/runs/1466549612
What do you suggest? TIA
Hi @JacquesLeRoux,
It looks like you are not referencing the correct path to the action. It should be:

```yaml
uses: github/codeql-action/analyze@v1
```

not:

```yaml
uses: github/codeql-action@v1
```
The `uses` directive accepts an action specification in the following syntax: `org/repo/path@ref`, where `org` is a GitHub organization or username, `repo` is the name of the repository to get the action from, `path` is the path within that repository to the folder containing the `action.yml` file, and `ref` is a git ref or SHA to check out to get that action.
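Putting that syntax together for this workflow, the init and analyze steps would look something like the following sketch (step names are illustrative; `init` and `analyze` are the folders containing each action's `action.yml`):

```yaml
steps:
  - name: Initialize CodeQL
    uses: github/codeql-action/init@v1      # org/repo/path@ref
    with:
      languages: java
  - name: Perform CodeQL Analysis
    uses: github/codeql-action/analyze@v1
```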
Hi @aeisenberg ,
Yes I followed Aditya's comment above. I must say I'm quite new to CodeQL and even YAML.
Sorry but it did not work either: https://github.com/apache/ofbiz-framework/actions/runs/1467782229
What could it miss? TIA
Ah... apologies for that. The `debug` option should be added to the `init` action. Right here: https://github.com/apache/ofbiz-framework/blob/72f86558575a54e5c16748756f7e0a1b7cd82da8/.github/workflows/codeql-analysis.yml#L67 add `debug: true`, and remove it from the `analyze` step.
This will not fix the problem you are seeing, but it will retain information about your run that will help us triage the problem.
As @adityasharad suggests, after you do this, please create an issue in https://github.com/github/codeql-action with a link to the workflow run.
Thanks Andrew,
Checking that...
In case it would help someone else, here is the workflow run with the (2.77 GB!) log inside: https://github.com/apache/ofbiz-framework/actions/runs/1470420767
When trying to open the interpreted results of a query run that has produced a sarif results file of >4GB, we get an error like this:
Node limits the size of strings and buffers to 4294967295 bytes, even on machines that have enough RAM to support more.
The parsed version of the SARIF results could fit in memory, even if the string cannot. It's possible that a streaming JSON parser, like JSONStream, could work, but I need to explore this library in more detail and make sure it is safe and stable before we can use it.
I don't think it is a good idea to roll our own streaming parser if there is a suitable OSS one available since there would be a fair amount of work involved and getting the edge cases to work is tricky.
Suggested breakdown:
- Add `JSONStream` as a dependency
- Use `JSONStream` when reading the SARIF file produced by results interpretation
- Verify that large SARIF files no longer trigger the `RangeError`