Open gabibguti opened 1 year ago
Hi @gabibguti. Thanks for the information and for doing the research on SAST options for Fortran. I believe we don't use any SAST currently.
@weslleyspereira I looked at i-CodeCNES and it seems it suggests a lot of changes which are probably a bit unnecessary, e.g. avoiding multiple RETURN statements in routines (see output below). I'll let it run on the entire codebase (it's pretty slow) and collect all types of warnings it produces. Then we can discuss which of those if any really matter.
On the topic of code analysis, do we use Gfortran's sanitizers somewhere (ASAN, UBSAN, MSAN, TSAN)? If not, we (NAG) have some CMake modules which I could add to LAPACK's CMake system and enable those checks in the pipelines.
icode
output on caxpy.f
:
<?xml version="1.0" encoding="UTF-8"?>
<analysisProject analysisProjectName="undefined" analysisProjectVersion="undefined">
<analysisInformations analysisConfigurationId="undefined" analysisDate="2023-09-29" author="i-Code CNES Analyzer" icodeVersion="4.1.2" />
<analysisFile language="fr.cnes.icode.fortran77" fileName="lapack-3.11/BLAS/SRC/caxpy.f" />
<analysisRule analysisRuleId="COM.DATA.FloatCompare">
<result resultId="1" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="111" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>It's not allowed to compare float variables (SCABS1) with equality.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.DATA.Invariant">
<result resultId="2" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="107" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>The variable SCABS1 must be defined as constant.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.FLOW.Exit">
<result resultId="3" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="111" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>There is more than one exit in the function.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.FLOW.Exit">
<result resultId="4" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="135" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>There is more than one exit in the function.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.INST.Brace">
<result resultId="5" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="112" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>Parentheses are needed for readability.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.INST.Brace">
<result resultId="6" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="117" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>Parentheses are needed for readability.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.INST.Brace">
<result resultId="7" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="126" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>Parentheses are needed for readability.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.INST.Brace">
<result resultId="8" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="126" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>Parentheses are needed for readability.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.INST.Brace">
<result resultId="9" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="127" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>Parentheses are needed for readability.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.INST.Brace">
<result resultId="10" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="127" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>Parentheses are needed for readability.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="COM.INST.Brace">
<result resultId="11" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="129" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>Parentheses are needed for readability.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="F77.INST.Return">
<result resultId="12" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="110" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>The instruction RETURN(i) is not allowed.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="F77.INST.Return">
<result resultId="13" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="111" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>The instruction RETURN(i) is not allowed.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="F77.INST.Return">
<result resultId="14" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="135" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
<resultMessage>The instruction RETURN(i) is not allowed.</resultMessage>
</result>
</analysisRule>
<analysisRule analysisRuleId="F77.MET.ComplexitySimplified">
<result resultId="15" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY" resultValue="8.0" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.ComplexitySimplified">
<result resultId="16" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="NaN" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.Nesting">
<result resultId="17" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY" resultValue="2.0" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.Nesting">
<result resultId="18" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="NaN" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.LineOfCode">
<result resultId="19" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY" resultValue="26.0" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.LineOfCode">
<result resultId="20" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="26.0" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.RatioComment">
<result resultId="21" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY" resultValue="52.941177" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.RatioComment">
<result resultId="22" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="NaN" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.LineOfComment">
<result resultId="23" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY" resultValue="27.0" />
</analysisRule>
<analysisRule analysisRuleId="F77.MET.LineOfComment">
<result resultId="24" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="113.0" />
</analysisRule>
</analysisProject>
This is all great @ACSimon33. Thanks a lot for the help on this topic!
@weslleyspereira I looked at i-CodeCNES and it seems it suggests a lot of changes which are probably a bit unnecessary, e.g. avoiding multiple RETURN statements in routines (see output below). I'll let it run on the entire codebase (it's pretty slow) and collect all types of warnings it produces. Then we can discuss which of those if any really matter.
I was already expecting these tools would suggest changes that would involve too much work on our side. Thanks for digging through it. It would be great to have a list of those suggestions. Then we could decide what we want to apply.
On the topic of code analysis, do we use Gfortran's sanitizers somewhere (ASAN, UBSAN, MSAN, TSAN)? If not, we (NAG) have some CMake modules which I could add to LAPACK's CMake system and enable those checks in the pipelines.
I believe we don't use any code sanitizer either. Please, feel welcomed to try them on! I recall we had a discussion about the usage of Valgrind tool with -fsanitize=address
in #551. Related PR #844.
icode
output oncaxpy.f
:[...]
Oh! This is a lot of information for a single file. I can imagine how slow the complete process can be. Thanks for sharing!
@weslleyspereira I tried some of the static analysis tools now and here are some of the results.
Pro: Free, opensource, easy to install and use Con: no summary of results
In the LAPACK sources there are mostly the same warnings but approx. 30 times as much. In my opinion there is nothing in here that is really a serious issue.
Pro: Easy to install and use, summary of results, free for non-commercial projects Con: Problems with f90 files, bit picky about the amount of source files passed to it (crashes if I try to analyze the entire repo at once)
382 occurrences of Unreliable test for equality of reals
4 occurrences of Unreachable or redundant statement
1698 occurrences of Unreliable test for equality of reals
17 occurrences of Argument is not used
216 occurrences of PARAMETER is not used
49 occurrences of Variable assigned a value but not used
227 occurrences of INTRINSIC functn name used for variable
1090 occurrences of Unreliable test for equality of reals
3 occurrences of Subprogram argument inconsistency
2 occurrences of Variable declared but not used
4 occurrences of Argument is not used
270 occurrences of PARAMETER is not used
43 occurrences of Variable assigned a value but not used
202 occurrences of INTRINSIC functn name used for variable
112 occurrences of Unreliable test for equality of reals
32 occurrences of Argument is not used
106 occurrences of INTRINSIC functn name used for variable
1100 occurrences of Unreliable test for equality of reals
26 occurrences of Variable declared but not used
120 occurrences of Argument is not used
40 occurrences of COMMON block is not used
29 occurrences of PARAMETER is not used
131 occurrences of Variable assigned a value but not used
114 occurrences of INTRINSIC functn name used for variable
I think the interesting warnings here are those about Unreachable or redundant statement
, Subprogram argument inconsistency
and INTRINSIC functn name used for variable
. The others are again very superficial and not really a security issue.
Pro: Nice summaries, a lot of features Con: Commercial, very complex and difficult to use
Notes
=====
No. Description Count Contrib
1271 Non-standard Fortran intrinsic(s) used as local identifier(s) 2 0
1273 Fortran auxiliary keyword used as identifier name. 152 0
1281 Fortran intrinsic name used as an array name 13 0
2323 Duplicate source code for the same sub-program name. 26 0
2447 No main program encountered 1 0
2493 INTENT declared OUT but argument is read: 4 0
4409 Unable to verify read access status of this argument 4 0
Warnings
========
No. Description Count Contrib
1269 Fortran keyword used as local identifier name. 58 58
2161 File renamed to avoid name conflict. 18 18
2345 Data type of PARAMETER does not match that of the expression 20 20
2517 ANSI FORTRAN 77 intrinsic used as a local identifier 27 27
3047 Fortran 90 intrinsic name used as an identifier 418 418
3085 Objects of .EQ., .NE. .GT. etc. are of different data types 174 174
3087 REAL or COMPLEX quantity tested for exact equality/inequality 3880 3880
3089 Comparison of REAL or COMPLEX values which differ in precision 1 1
3295 Parameter will not have the expected precision 18 18
3437 Mixed real or complex sizes in expression - loss of precision 2 2
Errors
======
No. Description Count Contrib
4415 INTENT(OUT/INOUT) argument - routine may write to an expression 14 140
Notes
=====
No. Description Count Contrib
1271 Non-standard Fortran intrinsic(s) used as local identifier(s) 50 0
1273 Fortran auxiliary keyword used as identifier name. 1208 0
1281 Fortran intrinsic name used as an array name 22 0
1867 Duplicate main program 17 0
2241 References are made to sub-programs which have not been read. 1 0
2323 Duplicate source code for the same sub-program name. 145 0
2449 Unused sub-programs encountered 1 0
2451 Unused primary files encountered 1 0
3921 This symbol is already declared to be SAVE or STATIC 5 0
Warnings
========
No. Description Count Contrib
1269 Fortran keyword used as local identifier name. 261 261
1711 Named COMMON block varies in size between sub-programs 16 16
1901 Input line longer than 72 characters. Line truncated. 1 1
2161 File renamed to avoid name conflict. 21 21
2185 White space was removed from identifier name or keyword. 1 1
2345 Data type of PARAMETER does not match that of the expression 11 11
2517 ANSI FORTRAN 77 intrinsic used as a local identifier 76 76
3047 Fortran 90 intrinsic name used as an identifier 102 102
3085 Objects of .EQ., .NE. .GT. etc. are of different data types 106 106
3087 REAL or COMPLEX quantity tested for exact equality/inequality 1480 1480
3089 Comparison of REAL or COMPLEX values which differ in precision 6 6
3295 Parameter will not have the expected precision 37 37
3437 Mixed real or complex sizes in expression - loss of precision 22 22
3493 Declarations follow executable statements 8 8
Errors
======
No. Description Count Contrib
1823 Error in use of '.' in number, operator or structure. 2 20
2025 Unable to interpret use of '.' or '%' character 2 20
3039 Data type of this function conflicts with prior use 32 320
Here again, the Fortran keyword used as local identifier name
might be problematic as well as Parameter will not have the expected precision
and Mixed real or complex sizes in expression - loss of precision
.
Overall I think we can ignore most of these warnings, especially those in the testing frameworks, but I'd like to here other opinions.
Raw result files (not all, some are too big):
fpt_TESTING.log fpt.log icode_BLAS_f90.log icode_BLAS_TESTING.log icode_BLAS.log icode_LAPACK_f90.log spag_BLAS_TESTING.log spag_BLAS.log spag_LAPACK_complex.log spag_LAPACK_TESTING_complex.log spag_LAPACK_TESTING.log spag_LAPACK.log
Sharing my point of view on i-CodeCNES:
This tool implements a set of coding rules defined by CNES and are mostly generic coding "good-practice rules". What we want to know is if rule XYZ implies in a security vulnerability or not. But, these rules cannot be directly mapped to known vulnerability systems, such as those defined by CWE, so it's harder to understand the security implications of violating a rule. I will try to give a few examples given the warnings found.
So, all of these rules are "good-practice rules" and there's no security harm in violating them if the problem does not reach a relevant operation in a relevant context, such as doing a wrong calculation when performing an authorization or entering a memory buffer overflow when manipulating user data. But, if is a relevant operation in a relevant context then the rule is important.
This is meant to explain how the warnings can represent serious issues, but really depends on the context the warning is being raised.
Overall I think we can ignore most of these warnings, especially those in the testing frameworks, but I'd like to here other opinions.
Based on icode_BLAS.log
posted earlier, I took a look at the function ZHERK
because the warnings for this function are printed first starting in line 148. The routine computes C = α A A^* + β C
for caller-provided scalars α, β and matrices A
, B
Diagnostic message | Line numbers | Comment |
---|---|---|
It's not allowed to compare float variables (X) with equality. | 246, 248, 263, 289, 293, 313, 317, 347, 357, 369, 379 | These warnings refer to comparisons of α and β with zero or one, respectively, for the obvious optimizations. |
It's not allowed to compare float variables (X) with equality. | 302, 326 | Entries of the matrix A are compared with zero to avoid unnecessary operations. |
The variable |
180, 181 | gfortran 12 does not complain and my Fortran is not good enough to see how the variables LDA , LDC , TRANSPOSE can be defined as parameters. |
Using more than five conditions in an expression is not allowed. | 241-242 | Code with line break removed: IF ((N.EQ.0) .OR. ((ALPHA.EQ.ZERO).OR.(K.EQ.0)).AND. (BETA.EQ.ONE))) RETURN |
Return code used in arithmetical statement. | 345, 355, 367, 377 | The return value of the intrinsic DCONJG is used in the expression TEMP = TEMP + DCONJG(A(L, I)) * A(L, J) . i-CodeCNES thinks the return value should be checked by default (see the description of COM.FLOW.CheckCodeReturn in its documentation). |
There is more than one exit in the function. | 242 | Early return when there is nothing to do beyond input validation. |
There is more than one exit in the function. | 278 | Early return for α = 0. |
There is more than one exit in the function. | 389 | The return statement in the last line of executable code. |
Parentheses are needed for readability. | 305, 330, 345, 360, 372, 377, 382 | These lines contain statements of the form TEMP = TEMP + DCONJG(A(L, I)) * A(L, J) or C(I, J) = x * y + beta * f(c(j, i)) . |
Mixed type COMPLEX with COMPLEX | 257, 295, 303, 320, 327, 348, 350, 380, 382 | Is this a i-CodeCNES bug? Its documentation says about COM.TYPE.Expression : "in a expression [...] all variables should have the same type". |
Mixed type COMPLEX with DOUBLE PRECISION | 259, 271, 273, 297, 307, 318, 328, 360, 370, 372 | IMO a false positive for LAPACK. |
The IF instruction shall finish with an ELSE after the last ELSE IF. | 322 | This warning concerns the if block checking the validity of each argument and setting INFO to the index of the first invalid argument. That is, I do not see the need for an ELSE branch here. |
Error in the following parameters: 'ZHERK ' variable belongs to parameter types forbidden when calling a function: a constant, an expression to be evaluated, a call to another function | 235 | Code: XERBLA('ZHERK ',INFO) |
The instruction RETURN(i) is not allowed. | 236, 242, 278, 389 | The statement in each of those lines is RETURN . |
COMPLEX*16 is not a basic type. Basic types are INTEGER, REAL, DOUBLE PRECISION, COMPLEX, LOGICAL and CHARACTER. | 184, 200 | This warning must be disabled for double-precision complex code. |
The i-CodeCNES documentation lists the different warnings but does not always clarify why they exist. For example, why is passing constants to a function (line 235) being warned about? Without this reasoning I cannot identify a single useful warning for ZHERK.
Overall I think we can ignore most of these warnings, especially those in the testing frameworks, but I'd like to here other opinions.
The i-CodeCNES documentation mentions three Fortran-specific warnings that are disabled by default but they seem useless for non-testing code since LAPACK neither opens files nor allocates memory.
Rule | Description |
---|---|
F77.BLOC.File | File access whould be done using OPEN and CLOSE instructions. |
F90.DESIGN.Free | Allocated memory should be free in the same conceptual level. |
F90.INST.Nullify | After deallocate, the use of NULLIFY into the same logical unit is mandatory DEALLOCATE (C, stat = iom ) NULLIFY ( C ) |
Hey! Do you already use a static code analysis tool? Also known as SAST.
SAST is used to identify security vulnerabilities in your source code. Vulnerabilities such as buffer overflow where attackers can modify the application execution by writing to memory. Different than Fuzzing, where you have to setup your test cases, SAST tools have their own set of test cases that they'll check against your code.
So, adding SAST helps keep your code safe from vulnerabilities, but I understand it comes along additional work to handle the reports.
Referencing here some SAST options I found for Fortran:
Additional Context
Hi! I'm Gabriela and I work on behalf of Google and the OpenSSF suggesting supply-chain security changes :)