GaloisInc / mir-json

Plugin for rustc to dump MIR in JSON format
Apache License 2.0
8 stars 2 forks source link

Constants #6

Closed sweirich closed 5 years ago

sweirich commented 5 years ago

This ADT definition

  pub struct Scalar64(pub [u64; 5]);

produces this output, with a strange expression for the size.

"adts": [
   {
     "name": "::Scalar64[0]",
     "variants": [
       {
         "ctor_kind": {
           "kind": "Fn"
         },
         "discr": {
           "index": 0,
           "kind": "Relative"
         },
         "fields": [
           {
             "name": "::Scalar64[0]::0[0]",
             "substs": [],
             "ty": {
               "kind": "Array",
               "size": {
                 "kind": "Const",
                 "ty": {
                   "kind": "Uint",
                   "uintkind": {
                     "kind": "usize"
                   }
                 },
                 "val": "Unevaluated(DefId(0/1:22 ~ scalar[8787]::Scalar64[0]::0[0]::{{constant}}[0]), []):usize"
               },
               "ty": {
                 "kind": "Uint",
                 "uintkind": {
                   "kind": "u64"
                 }
               }
             }
           }
         ],
         "name": "::Scalar64[0]::{{constructor}}[0]"
       }
     ]
   }
 ]
atomb commented 5 years ago

There's a skeleton of how to address this in commit f685b419, but it needs some lifetime tweaks before it'll compile (so it's in a comment for now).

spernsteiner commented 5 years ago

As of 0d5fcd7a6eb687c73abf1efa314941028b14f307, constants in types are fully evaluated before producing the JSON. All ty::Const nodes should now have a literal such as "123usize" in their val field, instead of the previous "Unevaluated(...):usize".

sweirich commented 5 years ago

Thanks. One last detail. In this input:

mod constants {
    pub(crate) const L: [u64;1] = [ 1 ];
}
fn f(_w : u64 ) -> u64 {
    constants::L[0] 
}

I'm getting the following output for the reference to the constant array in the body of f:

                  "rhs": {
                    "kind": "Use",
                    "usevar": {
                      "data": {
                        "literal": {
                          "kind": "Const",
                          "ty": {
                            "kind": "Array",
                            "size": {
                              "kind": "Const",
                              "ty": {
                                "kind": "Uint",
                                "uintkind": {
                                  "kind": "usize"
                                }
                              },
                              "val": "1usize"
                            },
                            "ty": {
                              "kind": "Uint",
                              "uintkind": {
                                "kind": "u64"
                              }
                            }
                          },
                          "val": "ByRef(AllocId(1), Allocation { bytes: [1, 0, 0, 0, 0, 0, 0, 0], relocations: Relocations(SortedMap { data: [] }), undef_mask: UndefMask { blocks: [255], len: Size { raw: 8 } }, align: Align { abi_pow2: 3, pref_pow2: 3 }, mutability: Immutable, extra: () }, Size { raw: 0 }):[u64; 1]"
                        },
                        "ty": {
                          "kind": "Array",
                          "size": {
                            "kind": "Const",
                            "ty": {
                              "kind": "Uint",
                              "uintkind": {
                                "kind": "usize"
                              }
                            },
                            "val": "1usize"
                          },
                          "ty": {
                            "kind": "Uint",
                            "uintkind": {
                              "kind": "u64"
                            }
                          }
                        }
                      },
                      "kind": "Constant"
                    }
                  }
                },

Can you make val above more structured?

spernsteiner commented 5 years ago

I've looked into this a bit more, and I'm not sure providing a more-structured val is the way to go.

For constants appearing in types, using val is fine, since the value is always a ConstValue::Scalar. But the result of evaluating an array constant is a ConstValue::ByRef, which uses a very low-level representation. It closely matches the way a static would be represented in the final object file: a sequence of unstructured bytes, plus a table of relocations to handle references to other statics. That sounds to me like it will be a real pain to handle, both for mir-verifier and for the underlying solver.

The alternative is to work with the Unevaluated representation, which is a reference to (essentially) a function that computes and returns the value of the constant. This is the ::constants[0]::L[0] "function" in the MIR for your example. I think for consts, a use of the constant could be translated to a call to that function. For statics, we could allocate space for the static, and then hopefully there's a convenient way to call its initializer to fill it in prior to the start of main.

In short, I think we should:

  1. Continue using val as-is for array sizes in types
  2. Replace uses of consts with calls to the constant's initializer function
  3. For statics, initialize them by running the initializer function before main

I think this will be easier than trying to back out structured data from the flattened ConstValue::ByRef representation.

spernsteiner commented 5 years ago

I went ahead and updated the representation of constants to provide both the evaluated and unevaluated forms:

{
    "initializer": {
        "def_id": "::VAL[0]",
        "substs": []
    },
    "int_val": "11",
    "kind": "Const",
    "ty": {
        "kind": "Uint",
        "uintkind": {
            "kind": "u32"
        }
    }
}

The int_val field is present only for constants of integer type, and contains the value of the constant. (It's provided as a string because Rust integer literals can exceed the precision of the JSON number type.) Eventually, we'll probably want a similar float_val field for floats, but I haven't implemented that yet.

The initializer field is present only for references to const items (not for literals), and contains a reference to the fn that computes the constant's value.

sweirich commented 5 years ago

I've updated mir-verifier with this change and the int_val component seems to work for ints, bools, and chars (and function pointers remain the same). I can also access the initializer for array constants. I don't think we need to do anything special for tuple constants. However, I'm still not getting enough information from mir-json for floats and string constants. The float_val field you mention above should work.

However, Strings are missing info. For example, for this RHS

    let s = "hello";

I get something like this:

"data": {
                        "literal": {
                          "kind": "Const",
                          "ty": {
                            "kind": "Ref",
                            "mutability": {
                              "kind": "MutImmutable"
                            },
                            "ty": {
                              "kind": "Str"
                            }
                          }
                        },
                        "ty": {
                          "kind": "Ref",
                          "mutability": {
                            "kind": "MutImmutable"
                          },
                          "ty": {
                            "kind": "Str"
                          }
                        }
                      },
                      "kind": "Constant"
                    }
spernsteiner commented 5 years ago

I've updated mir-verifier with this change and the int_val component seems to work

I think you left this change out of your most recent commit. GaloisInc/mir-verifier@e4a2baad9fbc4f91d45287349279cd47039d823a says "compatibility with mir-json update", but the diff only adds a couple of tests.

I'm still not getting enough information from mir-json for floats and string constants

I've added support for floats, strings, and bytestrings. Constants can now have one of these additional fields, depending on their type:

"float_val": "3.14",
"str_val": [104, 101, 108, 108, 111],       // "hello"
"bstr_val": [104, 101, 108, 108, 111],      // b"hello"

Since mir-json decides which val to include based only on the type of the constant, the bstr_val field may also show up on non-bytestring constants of the appropriate type (&[u8; _]), such as &[1_u8, 2, 3]. I think this should be harmless, but let me know if it causes any problems.

sweirich commented 5 years ago

On May 1, 2019, at 2:25 PM, epdtry notifications@github.com wrote:

I've updated mir-verifier with this change and the int_val component seems to work

I think you left this change out of your most recent commit. GaloisInc/mir-verifier@e4a2baa says "compatibility with mir-json update", but the diff only adds a couple of tests.

Sorry. Forgot to push the next commit that includes the actual fix. I've pushed now (along with a few other bug fixes.)

I'm still not getting enough information from mir-json for floats and string constants

I've added support for floats, strings, and bytestrings. Constants can now have one of these additional fields, depending on their type:

"float_val": "3.14" ,

"str_val": [104, 101, 108, 108, 111], // "hello" "bstr_val": [104, 101, 108, 108, 111], // b"hello" Since mir-json decides which val to include based only on the type of the constant, the bstrval field may also show up on non-bytestring constants of the appropriate type (&[u8; ]), such as &[1_u8, 2, 3]. I think this should be harmless, but let me know if it causes any problems.

Thanks. I'll check this out now.

sweirich commented 5 years ago

I'm closing this now because I've got string & bytestring literals to work. Thanks.