Closed by mjambon 1 month ago
Hi, @mjambon,
As for help, my current efforts have gone into getting package publishing working. It's generally there now for NPM, and I still need to add Crates.io.
The biggest things I haven't done are downloading all the largest Apex repositories on GitHub and making sure I can parse them without error, plus adding more test cases and covering any gaps in parsing.
Would love to see this added to Semgrep and be able to start using it for our code base. Let me know how that goes and I'll be happy to prioritize any needed improvements.
CST -> AST is on my long-term list, but I haven't started any of it.
We got 99.94% parsing success on 901986 lines of code, which is excellent. The parsing success is the fraction of lines of code not affected by a parsing error (an ERROR node). The list of repos was pulled automatically using a script. I haven't checked whether those are all relevant. It could be that some of them only contain a few Apex files. Some could also contain invalid or tricky syntax on purpose, as happens when we scan the test corpus of compilers. The list in question is:
https://github.com/apex-enterprise-patterns/fflib-apex-common
https://github.com/kevinohara80/sfdc-trigger-framework
https://github.com/trailheadapps/apex-recipes
https://github.com/SFDO-Community/declarative-lookup-rollup-summaries
https://github.com/SalesforceFoundation/NPSP
https://github.com/financialforcedev/apex-mdapi
https://github.com/alexed1/LightningFlowComponents
https://github.com/apex-enterprise-patterns/fflib-apex-mocks
https://github.com/mitchspano/apex-trigger-actions-framework
https://github.com/jongpie/NebulaLogger
https://github.com/trailheadapps/automation-components
https://github.com/ipavlic/apex-fp
https://github.com/mbotos/SmartFactory-for-Force.com
https://github.com/sfdx-mass-action-scheduler/sfdx-mass-action-scheduler
https://github.com/dhoechst/Salesforce-Test-Factory
https://github.com/SalesforceFoundation/EDA
https://github.com/apex-enterprise-patterns/force-di
https://github.com/SalesforceLabs/Milestones-PM
https://github.com/trailheadapps/dreamhouse-sfdx
https://github.com/benahm/TestDataFactory
https://github.com/developerforce/trailhead-code-samples
https://github.com/apex-enterprise-patterns/fflib-apex-common-samplecode
https://github.com/forcedotcom/CustomMetadataLoader
https://github.com/abhinavguptas/Salesforce-Lookup-Rollup-Summaries
https://github.com/rsoesemann/apex-unified-logging
https://github.com/j-fischer/rflib
https://github.com/pdalcol/Zippex
https://github.com/choudharymanish8585/Apex-Development-Course
https://github.com/rsoesemann/visualforce-table-grid
https://github.com/jamessimone/apex-rollup
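To make the parsing success figure concrete, here is a minimal sketch of a line-based metric of that kind. It assumes each ERROR node is reported as an inclusive, 1-indexed line range; the function name and the input shape are hypothetical and are not taken from the ocaml-tree-sitter scripts, which may count affected lines differently.

```python
def parsing_success(total_lines, error_ranges):
    """Return the fraction of lines not covered by any error range.

    error_ranges holds (first_line, last_line) pairs, 1-indexed and
    inclusive, one per ERROR node. Overlapping ranges are counted once.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    affected = set()
    for first, last in error_ranges:
        # Clamp to the file so a malformed range cannot inflate the count.
        affected.update(range(max(first, 1), min(last, total_lines) + 1))
    return 1 - len(affected) / total_lines


# Hypothetical 10-line file with two overlapping ERROR nodes
# covering lines 3-4 and 4-5, so three lines are affected.
print(parsing_success(10, [(3, 4), (4, 5)]))
```

Under this definition, 99.94% of 901986 lines would correspond to roughly 540 affected lines, give or take the rounding of the percentage.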
As for using it in Semgrep when it's ready: it's likely that Apex support in Semgrep will remain proprietary so we'll have to arrange a way for you to use it. Don't hesitate to reach out to martin@r2c.dev.
I have an error log and also a list of errors aggregated by type and sorted by frequency. I'll look into this in the coming days.
Thanks for doing that. I have a few issues I've found this week that I'll have a fix for today or tomorrow.
Hopefully that helps clean up some errors.
If you're able to share the list, I'm happy to comb through it too, though it may be just as easy for me to run your script and build my own list. Thanks again for being willing to take a look. Don't feel like you have to figure out access if you keep it proprietary.
Here's a tarball containing the files that our scripts produce. It's obtained by running make stat in the /lang/apex folder of https://github.com/returntocorp/ocaml-tree-sitter-semgrep (after setting everything up, which could take a while): stat.tar.gz
This uses the original apex grammar without semgrep extensions (for now). Note that the results are obtained by parsing using an OCaml program after some automatic transformations of the grammar, so there's always a possibility that a parsing error reported here is not observed when using a simple tree-sitter test call (i.e. could be a bug in ocaml-tree-sitter rather than in tree-sitter-sfapex).
The most obvious files are:
The CSV files are meant to be useful because they attempt to group similar errors and sort them from most frequent to least frequent. Unfortunately, I haven't managed to display them properly in LibreOffice, which appears to assume different conventions for the CSV format (we have code snippets in there, which include many special characters that aren't escaped properly). Anyway, this could be useful if we manage to load them properly in a spreadsheet program.
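One way to avoid the escaping problem is to let a CSV library quote every field, so that commas, quotes, and newlines inside code snippets survive a round trip. The sketch below uses Python's standard csv module; the two-column layout (count, snippet) and the sample rows are hypothetical and do not reflect the actual files in the tarball.

```python
import csv
import io


def write_error_csv(rows, out):
    """Write (count, snippet) rows as CSV, quoting every field."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["count", "snippet"])
    for count, snippet in rows:
        writer.writerow([count, snippet])


# Hypothetical rows: snippets containing commas, quotes, and newlines.
rows = [
    (12, 'String s = "a, b";'),
    (3, "List<SObject> objs =\n    [SELECT Format FROM Report];"),
]
buffer = io.StringIO()
write_error_csv(rows, buffer)

# Reading the text back yields the original snippets unchanged.
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed[1][1])
```

Embedded double quotes are doubled and multi-line snippets stay inside one quoted field, which is the convention most spreadsheet programs expect on import.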
The list of repos that were scanned is what I posted earlier.
This is awesome!
A few of these I already had fixes for, and a few were known issues, some of which I was able to resolve. I also found several in there that demonstrated clear bugs, which I was able to fix and add test cases for.
I have one scenario I'm unable to get fixed right now but it is not very common.
public class Test {
    {
        List<SObject> objs = [SELECT Format FROM Report];
    }
}
An object field name that matches a function name is a problem: it confuses the parser, and I can't get it to behave correctly yet. I'll keep working on it.
v0.0.8 is ready to go with these fixes. I've also updated the playground, https://aheber.github.io/tree-sitter-sfapex/playground/
Several of the failures in there are false positives. Many of the scripts directories or other tooling directories actually contain Anonymous Apex and not pure Apex, and I haven't built parsing support for that yet. I normally expect those files to have the .apex file type, but they have the .cls file type.
The other common cause of failures is files that have replacement tags (%%%NAMESPACE%%%) and rely on some other script to process the file and make it valid before it is uploaded.
As a percentage of the failing lines I certainly didn't fix them all, or even a majority. But I think I covered the relevant ones with the exception of the one mentioned above.
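The two sources of false positives described above could be filtered out of the error statistics with a simple heuristic. This is only a sketch: the function name is hypothetical, the "scripts" directory check and the placeholder regex are assumptions based on the description above, and real repositories may need other rules.

```python
import re
from pathlib import PurePosixPath

# Matches placeholder tags such as %%%NAMESPACE%%%.
PLACEHOLDER = re.compile(r"%%%[A-Za-z_]+%%%")


def likely_false_positive(path, source):
    """Guess whether a parse failure in this file is expected."""
    parts = PurePosixPath(path).parts
    # Heuristic: .cls files under a scripts directory are often
    # Anonymous Apex rather than a class definition.
    in_scripts_dir = "scripts" in parts[:-1]
    # Files with unreplaced tags are not valid Apex until preprocessed.
    has_placeholder = PLACEHOLDER.search(source) is not None
    return in_scripts_dir or has_placeholder


# Hypothetical inputs illustrating each rule.
print(likely_false_positive("scripts/setup.cls", "System.debug('hi');"))
print(likely_false_positive("src/Foo.cls", "public class %%%NAMESPACE%%%Foo {}"))
print(likely_false_positive("src/Foo.cls", "public class Foo {}"))
```

Files flagged this way would still be worth a manual look, since a scripts directory can also contain ordinary classes.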
Hello,
I'm just starting the work to add Apex support to Semgrep (GitHub) due to customer demand. It's a pleasant surprise that you already did a lot of work on the tree-sitter grammar. Thank you! Do you need help? In what areas do you think we can best contribute?
Our work consists of making sure that the tree-sitter grammar for Apex has a good parsing rate and improving it as needed, then writing boilerplate code to translate the CST to Semgrep's generic (multilanguage) AST, as well as extending the grammar to support special patterns such as ... and what we call metavariables ($FOO).
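As a rough illustration of what these pattern-only constructs look like inside a pattern, the sketch below picks out ellipses and metavariables with a regular expression. It assumes metavariables are a $ followed by uppercase letters, digits, or underscores; the sample pattern is hypothetical, and this is not how Semgrep or the grammar extensions actually handle them.

```python
import re

# Assumed token shapes: "..." for an ellipsis, and "$" followed by
# uppercase letters, digits, or underscores for a metavariable.
TOKEN = re.compile(r"(?P<ellipsis>\.\.\.)|(?P<metavar>\$[A-Z_][A-Z0-9_]*)")


def special_tokens(pattern):
    """Return (kind, text) pairs for the pattern-only tokens in a pattern."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(pattern)]


# Hypothetical pattern mixing ordinary syntax with both constructs.
print(special_tokens("$OBJ.insert(..., $FOO)"))
```

Neither construct is valid Apex on its own, which is why the grammar has to be extended before patterns like this can be parsed.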