Open Ronoman opened 2 years ago
It seems like I made a poor assumption with some of the output, but I'm not sure how to resolve it. Most clang_tidy
errors look like this:
/home/spaceros-user/src/spaceros/src/rcutils/src/char_array.c:153:14: warning: Call to function 'vsnprintf' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'vsnprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
int size = vsnprintf(char_array->buffer, char_array->buffer_capacity, format, args_clone);
^
These are matched and groups are extracted properly with the regex as it stands. However, some output looks like this:
/home/spaceros-user/src/spaceros/src/rcutils/src/char_array.c:123:5: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
Here, the rule ID is missing from the result. In most cases, these are preceded by the same exact error, like this block:
/home/spaceros-user/src/spaceros/src/rcutils/src/array_list.c:175:5: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
memcpy(dst_ptr, src_ptr, array_list->impl->data_size * copy_count);
^
/home/spaceros-user/src/spaceros/src/rcutils/src/array_list.c:175:5: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
In these cases, it's safe to throw away the second result since it is duplicating one already listed (note that both results are pointing to array_list.c:175:5
). However, this isn't always the case. It seems like clang_tidy
does not add the rule ID when it is reporting note:
results, as opposed to warning:
or error:
results. How can we capture this different clang_tidy
output properly in SARIF?
In these cases, it's safe to throw away the second result since it is duplicating one already listed (note that both results are pointing to
array_list.c:175:5
). However, this isn't always the case. It seems likeclang_tidy
does not add the rule ID when it is reportingnote:
results, as opposed towarning:
orerror:
results. How can we capture this differentclang_tidy
output properly in SARIF?
It doesn't look like clang-tidy itself supports formatting its output nor does its list of checks state what level a check is is (note, warning, or error). The ruleID is optional right? Can we just omit it for notes?
It doesn't look like clang-tidy itself supports formatting its output nor does its list of checks state what level a check is is (note, warning, or error). The ruleID is optional right? Can we just omit it for notes?
According to The Spec, it looks like we can omit result.ruleId
in some cases. The relevant sentence (I think) is this:
If theDescriptor does not exist (that is, if theTool does not contain a reportingDescriptor object (§3.49) that describes the rule that was violated), then rule SHALL NOT be present.
theTool
probably does have the relevant rule ID, but since the result
for note
s doesn't give it to us, I think we're okay to omit it. In most of the cases I've seen with clang_tidy, the result description field is usually sufficient to understand what the violation is, and what kind of action needs to be taken to resolve it.
Resolved all outstanding comments in 5d5e2eda. However, a new slight issue arises...
I've updated the regex rules to capture all clang_tidy results, even one that don't end in a [rule description]
. However, this means that process_sarif
is no longer going to de-duplicate results in clang_tidy properly. On line 106 of sarif_helpers, we check if the tuple (threeple?) (ruleId, artifact, region)
has been seen before. With these changes, there are now results in clang_tidy that are exact duplicates, but some are missing the ruleId, so this check is no longer sufficient.
Is it okay to reduce the tuple to (artifact, region)
? This would mean that any results that are pointing to the same file, with the exact same region (line number, sometimes column number) will be considered identical. I haven't yet observed two unique results pointing to the exact same artifact and region, so this seems like a reasonable assumption, but not always guaranteed to be true.
Perhaps it would be more accurate to keep the (ruleId, artifact, region) and only make the reduction to (artifact, region) if necessary. In other words, ruleId would be optional in the comparison function, but used if present in both items to be compared.
I'm following up on this (we have a PR in Space ROS that depends on this).
Does it still make sense to do this? If so, what's missing?
clang-tidy can be configured to output a YAML-formatted "fixes file" which is probably the way I'd recommend to ingest tidy output programmatically.
rules
field in the SARIF output was polluted with many duplicate rules, as the full error message was included. Now, only the rule id thatclang-tidy
outputs between brackets (i.e.[google-explicit-constructor]
) is taken and stored in rules.startLine
andstartColumn
fields were strings. These are now integers.This was tested on
rclcpp
andrcutils
.ament_clang_tidy
ran successfully on both, and outputted valid SARIF. However, I did have to change some of the RegEx's that were parsingclang_tidy
output. I don't have a great way to check if I'm truly capturing all of the valid output ofclang_tidy
, so this may be unintentionally hiding some violations.