Open RyanGlScott opened 3 years ago
I was originally hopeful that this could be a one-module fix, but things are rarely that simple. My first inclination was to modify crucible
's lookupAlias
function, which is responsible for throwing the "Unknown type alias
" error:
lookupAlias :: (?lc :: TypeContext, MonadError String m) => Ident -> m SymType
lookupAlias i =
case llvmAliasMap ?lc ^. at i of
Just stp -> return stp
Nothing -> throwError $ unwords ["Unknown type alias", show i]
If looking up an Ident
in the llvmAliasMap
fails, then simply consult the llvmMetadataMap
as a last resort... at least, that's what I thought at first. Unfortunately, as far as I can tell, LLVM's metadata doesn't provide you with enough information to faithfully construct an entire SymType
out of it (e.g., it doesn't tell you a struct's packedness, or the padding of a struct's fields). As a result, we'll likely need to do the heavy lifting in saw-script
's llvm_sizeof
function:
If liftMemType
fails to look up an alias type, then we could fall back to using the llvmMetadataMap
. The problem is: how do we know if liftMemType
fails specifically due to an unknown type alias? There are a number of different things that can cause liftMemType
to fail, but simply matching on Left err
doesn't distinguish between any of these different failure modes.
One possible way forward is to change the MonadError String m
constraints used in lookupAlias
(and elsewhere in crucible
) to something like MonadError TypeContextException m
, where:
-- | Describes what problem was encountered in a computation using a
-- 'TypeContext'.
data TypeContextException
= UnknownTypeAlias Ident
| NonMemTypeEncountered SymType
| NonRetTypeEncountered SymType
| SymTypeConversionErrors [Doc Void]
| TranslationError String
deriving Show
instance Exception TypeContextException where
displayException (UnknownTypeAlias i) = unwords ["Unknown type alias", show i]
displayException (NonMemTypeEncountered stp) = unlines ["Expected memory type", show stp]
displayException (NonRetTypeEncountered stp) = unlines ["Expected return type", show stp]
displayException (SymTypeConversionErrors edocs) = unlines (map show edocs)
displayException (TranslationError s) = s
Then llvm_sizeof
could match on Left (UnknownTypeAlias _)
to distinguish it from other forms of failure. In the former case, we would look up the debugging info in llvmMetadataMap
, and the latter case, we would simply propagate the error as llvm_sizeof
currently does.
One annoying wrinkle: how exactly do we look up the relevant debugging info in llvmMetadataMap
? I thought that llvm-pretty
's guessAliasInfo
function would be just the tool for the job, but the catch is that it returns an Info
. Info
has two issues:
Info
's Structure
constructor doesn't record its size, nor does Union
nor ArrInfo
. I suppose we could change these constructors to store the size, but that feels a bit ad hoc. (Why not store all other fields of DICompositeType
as well?)Info
to store the size, we'd only be able to do so for certain forms of DICompositeType
s. Notably, Info
doesn't have a constructor that corresponds to enum
s.There's enough of an impedance mismatch between Info
and what I need that it would likely be easier just to cargo-cult what guessAliasInfo
does, except making it return a DICompositeType
instead of an Info
. Either way, I'll need to change llvm-pretty
, since I would need to make use of certain functions from Text.LLVM.DebugUtils
that aren't currently exported.
One intermediate step we could take would be to allow the user to explicitly give alias definitions. Then, if the alias exists in the bitcode we could just check that it matches, but if it doesn't exist, we can just take it as given.
That does sound appealing. What sort of API would you envision for this feature? Something like llvm_declare_alias : String -> LLVMType -> TopLevel LLVMType
?
I was thinking lvm_declare_alias : String -> LLVMType -> TopLevel ()
, but yes, basically. Then it would be pretty easy to cut-an-paste, e.g.:
llvm_declare_alias "struct.foo" (llvm_type "{ i32 }");
I guess one wrinkle would be to make sure that recursive groups of aliases could be declared without causing errors.
Another use case for this feature is supporting union types, as Clang has an annoying habit of optimizing away aliases that are mentioned in unions. For instance, saw-script
's own test_llvm_union
test case will work when compiled without optimizations, but not with optimizations. Compare the bitcode at -O0
with Clang 10:
%struct.st = type { i32, %union.anon }
%union.anon = type { %struct.inc_2_st }
%struct.inc_2_st = type { i32, i32 }
%struct.inc_1_st = type { i32 }
Versus the bitcode at -O1
:
%struct.st = type { i32, %union.anon }
%union.anon = type { %struct.inc_2_st }
%struct.inc_2_st = type { i32, i32 }
In the optimized version, the %struct.inc_1_st
alias has been optimized away, which leaves SAW unable to resolve its fields:
$ ~/Software/saw-0.9.0.99-Linux-x86_64/bin/saw test.saw
[21:58:33.287] Loading file "/home/rscott/Documents/Hacking/Haskell/saw-script/intTests/test_llvm_union/test.saw"
[21:58:33.322] Verifying 'inc_1' using 'llvm_verify':
[21:58:33.416] Stack trace:
"llvm_verify" (/home/rscott/Documents/Hacking/Haskell/saw-script/intTests/test_llvm_union/test.saw:47:1-47:12):
"inc_spec" (/home/rscott/Documents/Hacking/Haskell/saw-script/intTests/test_llvm_union/test.saw:47:30-47:38):
"llvm_points_to" (/home/rscott/Documents/Hacking/Haskell/saw-script/intTests/test_llvm_union/test.saw:20:7-20:21):
Found struct field name: 'x'
in struct with name 'inc_1_st'.
However, the offset of this field found in the debug information could not
be correlated with the computed LLVM type of the setup value:
%struct.inc_1_st*
It would be convenient to be able to declare an alias for inc_1_st
to ensure that it can be used even if Clang optimizes it away.
Consider the following example:
When
test.c
is compiled without optimizations, the proof succeeds:If
test.c
is compiled with-O2
, however, the proof fails:This because with
-O2
, LLVM will optimize away the alias information forfoo
in the compiled bitcode. Compare the bitcode for the-O0
version, which contains%struct.foo = type { i32 }
:To the bitcode for the
-O2
version, which does not:Just because the bitcode is missing an alias doesn't mean we have to give up, however. The
-O2
bitcode actually does contain the size ofstruct foo
elsewhere in debugging information. See thesize: 32
bit below:I propose that in the event that SAW cannot resolve an alias, it should consult the debugging information as a fallback. This isn't guaranteed to always succeed, since it's possible that LLVM might also optimize away the debugging information. But for many cases, the info is right there, so we might as well use it.