kquick opened this issue 3 years ago.
N.B., the llvm-disasm helpfully shows the `0x3F847AE140000000` constant in register `%148` to be `1.0e-2`.
I think I'm encountering the same issue, or at least a very similar one. To reproduce, compile the following file:
```c
double a = 0;
void b() {
    for (int c = 0; c < 2; c++)
        a++;
    if (a)
        a = 7;
}
int main() {
    b();
    return 0;
}
```
With:

```
$ clang -O1 test.c -S -emit-llvm -fno-discard-value-names
$ clang -O1 test.c -c -emit-llvm -fno-discard-value-names
```
This will produce the following bitcode with Clang 10:
Attempting to parse this with llvm-pretty-bc-parser fails on the `fcmp une double %inc, 0.000000e+00` instruction:
```
λ> parseBitCodeFromFile "test.bc"
Left (Error {errContext = ["FUNC_CODE_INST_CMP2","@b","FUNCTION_BLOCK","FUNCTION_BLOCK_ID","value symbol table","MODULE_BLOCK","Bitstream"], errMessage = "parseField: unable to parse record field 4 of record Record {recordCode = 28, recordFields = [FieldLiteral (BitString {bsLength = 35, bsData = 4294967292}),FieldLiteral (BitString {bsLength = 5, bsData = 0}),FieldLiteral (BitString {bsLength = 5, bsData = 8}),FieldLiteral (BitString {bsLength = 5, bsData = 14})]}"})
```
The code that parses `FUNC_CODE_INST_CMP2` is located here:
Comparing that with the corresponding code in LLVM, I'm skeptical that llvm-pretty-bc-parser implements it correctly. In particular, LLVM reads the fast-math flags as the last field of the record, after the predicate, whereas the llvm-pretty-bc-parser code tries to skip over the fast-math flags somewhere in the middle.
Indeed, llvm-pretty-bc-parser can successfully parse `fcmp` instructions that lack fast-math flags, but it fails on ones that have them. This is easy to see by compiling the following example with and without the `-ffast-math` flag:
```c
double a = 0;
int main() {
    if (a) {
        a++;
    }
    return 0;
}
```
If you compile without `-ffast-math`:

```
$ clang -O3 test.c -emit-llvm -S -fno-discard-value-names
$ clang -O3 test.c -emit-llvm -c -fno-discard-value-names
```
You will get this `test.ll` file:
llvm-pretty-bc-parser can then parse `test.bc` without issue, since its `fcmp` instructions lack fast-math flags. If you repeat the same steps with the `-ffast-math` flag, however:
```
$ clang -O3 test.c -emit-llvm -S -fno-discard-value-names -ffast-math
$ clang -O3 test.c -emit-llvm -c -fno-discard-value-names -ffast-math
```
You will get this `test.ll` file:
Note that we now have an `fcmp` instruction with a `fast` flag. As a consequence, llvm-pretty-bc-parser fails to parse it:
```
λ> parseBitCodeFromFile "test.bc"
Left (Error {errContext = ["FUNC_CODE_INST_CMP2","@main","FUNCTION_BLOCK","FUNCTION_BLOCK_ID","value symbol table","MODULE_BLOCK","Bitstream"], errMessage = "parseField: unable to parse record field 3 of record Record {recordCode = 28, recordFields = [FieldLiteral (BitString {bsLength = 5, bsData = 1}),FieldLiteral (BitString {bsLength = 5, bsData = 7}),FieldLiteral (BitString {bsLength = 5, bsData = 14}),FieldLiteral (BitString {bsLength = 10, bsData = 254})]}"})
```
To add a wrinkle to all of this, the example in https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/158#issuecomment-910958621 demonstrates a situation where an `fcmp` instruction in an `.ll` file seemingly lacks fast-math flags, but the corresponding `.bc` file does have them. This can be verified by running `llvm-bcanalyzer-10 test.bc -dump`, which shows an `INST_CMP2` with four operands (one for the comparison type, two for the argument values, and one for the fast-math flags) instead of the usual three:
```
<INST_CMP2 op0=4294967292 op1=0 op2=8 op3=14/>
```
I'm not quite sure why this happens, but either way, the culprit is the presence of fast-math flags in the `.bc` file.
I'm seeing the following with the current `master`:
```llvm
%149 = fcmp ogt float %145, %148, !dbg !DILocation(line: 62, column: 12, scope: !256754, inlinedAt: !256755)
```
Does this mean the problem is fixed?
I believe it is fixed, but I never figured out how to test against PX4_Autopilot, so I left this issue open as a reminder to do so; see https://github.com/GaloisInc/llvm-pretty-bc-parser/pull/161#issuecomment-920886753. (Is the code you're testing from PX4_Autopilot?)
I was encountering multiple issues with the PX4_Autopilot code base (using BC files extracted with build-bom). The parser was definitely hitting the `fcmp` issue that is fixed above, but when I patched around that issue, it ran into other problems. Because bitcode is a bit stream, it's entirely possible that my workaround failed to get the parser back on track and that this was the only real issue. Still, I was unable to successfully parse that code base when it was compiled with anything later than LLVM 9, so I recommend repeating that experiment to see how things stand now with the above fix in place.
I've been unable to even compile PX4_Autopilot with Clang to obtain `.bc` files, so I don't know how to test this myself. Can you describe how you did so?
I can confirm that, after fixing some bugs in the handling of fast-math flags, I now also read the correct operand in a bitcode file obtained via LLVM 10.
I need to fix some other bugs to be able to read the bitcode file obtained from LLVM 12.
@RyanGlScott I am using a Nix flake provided by @kquick to get bitcode files from PX4_Autopilot. To quote Kevin:

> The way it works is that `build-bom generate-bitcode BLDCMD` runs `BLDCMD`, but it watches for clang compilations, and when it sees one, it re-issues the clang compilation with options to generate LLVM bitcode. It then places the generated LLVM bitcode in a tar file and adds that tar file to the ELF file (the `.o` or the executable) in a special section, `.llvm_bitcode`. The `build-bom extract-bitcode` command looks for that section to retrieve the tar file, from which it can get the LLVM bitcode files. The reason it uses a tar file is that tar files can be concatenated together and the result is also a valid tar file. So linking multiple `.o` files concatenates their sections, which gives you a tar file with all the bitcode in it.
I have some candidate code (PX4_Autopilot) that is producing parser decode failures with multiple symptoms:

1) Failure to decode an `fcmp` operation value. The defined operation values are 0-15, but the value llvm-pretty-bc-parser is attempting to resolve is 24.
2) When I ignore the above by defaulting to `ftrue`, I see invalid operand values for the instruction.

Here's the llvm-dis output:

For llvm-pretty-bc-parser, the llvm-disasm has problems on `%149`, where defaulting to `ftrue` still shows:

Note that the second operand is decoded as `%147`, whereas LLVM decodes it as `%148`. It's unknown whether there are other parsing problems; this one just happened to make the parser fail because of the lookup in (1) above, and was therefore detectable.