Open danielsn opened 3 years ago
If we are adding this then it would also be good to have these for Ada for the https://github.com/diffblue/gnat2goto/ front-end, which is currently also using the `C` mode.
The expected pattern for this (Rust or Ada) is to set up a separate executable, with specialised command-line options. The executable then registers the relevant languages using the `register_language` API.
Look at `jbmc` as an exemplar.
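As a rough illustration of the pattern described above, here is a minimal, self-contained sketch of a per-tool language registry. The types below are simplified stand-ins written for this example; they are not CBMC's actual `languaget` or `register_language` declarations, and `register_rust_tool_languages` is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a language front-end; NOT the real CBMC class.
struct languaget
{
  virtual ~languaget() = default;
  virtual std::string id() const = 0;
};

// Stand-in for the json_symtab front-end mentioned in this thread.
struct json_symtab_languaget : languaget
{
  std::string id() const override { return "json_symtab"; }
};

using language_factoryt = std::function<std::unique_ptr<languaget>()>;

// Each executable owns one list of factories for the languages it supports.
std::vector<language_factoryt> &registered_languages()
{
  static std::vector<language_factoryt> languages;
  return languages;
}

void register_language(language_factoryt factory)
{
  assert(factory != nullptr);
  registered_languages().push_back(std::move(factory));
}

// Hypothetical: what a dedicated Rust or Ada executable might call
// early in main(), before reading any input files.
void register_rust_tool_languages()
{
  register_language([] { return std::make_unique<json_symtab_languaget>(); });
}
```

The point of the sketch is only that the set of registered languages is a property of the executable, so a dedicated tool can register exactly the front-ends it needs.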
Both of them use the `json_symtab` language, which is already supported.
I guess more generally: for languages that are using the `json_symtab` interface, what should `mode` be set to? `gnat2goto` (and I think `RMC`) both lie and set the mode to "C".
a. I'm with @martin-cs on this one: `json_symtab` shouldn't be a 'language', but it should have a 'mode' property.
b. The issue is that `register_language` registers a front-end implementation for handling certain input files, which is a slightly different concept than 'language'.
c. Having a separate executable is preferable if language-specific options are required or the command line is adapted to seamlessly fit into the language-specific environment (like `jbmc` supporting the command line syntax of `java`), but this is somewhat orthogonal.
But back to points 1, 2, 3:
`exprt` (like @kroening has recently done for the remainder operations) and then translate them to the right primitives in the frontend.
`goto_check`, which requires 1 and c. This could be avoided by inserting them in the frontend; it likely depends on the particular checks. Language-specific cases in `goto-check` are not the right mechanism IMO.
Taking into account the comments and discussion above, would a reasonable choice be as follows?
To support the handling of Rust in displays and mixed input (i.e. a goto binary comprising a mixture of `C` and `rust`):
`json_symtab` "language" to have a `mode` property.
`json_symtab` `mode`s for `ID_C` and `ID_Rust`.
Then for semantic differences between the languages and different language checks:
`cbmc`'s capabilities to handle these new semantics (as mentioned by @peterschrammel and done by @kroening for other semantics). These can be done for the semantics desired, but not tied to a specific language or language mode (i.e. one could use them in any language).
`gnat2goto` and `rmc`) to handle the insertion of correct checks or `goto` programs that use the above new semantics.
Comments/feedback very welcome!
@TGWDB I agree with splitting this in two (getting the correct `mode` and then how and where you do language-specific things) and with the general principle that `mode` should be correct (i.e. we should have `ID_C`, `ID_cpp`, `ID_java`, `ID_rust`, `ID_ada`, etc.). However, may I raise a few concerns with your plan:
The mode is compared to the output of the `id()` method:
https://github.com/diffblue/cbmc/blob/10ddca06abb2c5b13b59737650dd2cfd2565fe72/src/langapi/mode.cpp#L45
which for `json_symtab` is:
https://github.com/diffblue/cbmc/blob/10ddca06abb2c5b13b59737650dd2cfd2565fe72/src/json-symtab-language/json_symtab_language.h#L41
It is not clear to me what this should be set to.
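To make the concern concrete, here is a small sketch of a lookup that compares a symbol's mode against each registered language's `id()`. It is a simplified model written for this comment, not the code in `src/langapi/mode.cpp`; the `languaget` struct, the `language_for_mode` helper and the string values used for the ids are all assumptions.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in: a language is reduced to the string its id() returns.
class languaget
{
public:
  explicit languaget(std::string id) : id_(std::move(id)) {}
  const std::string &id() const { return id_; }

private:
  std::string id_;
};

// Return the first registered language whose id() equals the symbol's mode,
// or nullptr when no registered language claims that mode.
const languaget *language_for_mode(
  const std::vector<languaget> &registered,
  const std::string &mode)
{
  for(const auto &language : registered)
  {
    if(language.id() == mode)
      return &language;
  }
  return nullptr;
}
```

With only `"json_symtab"` and `"C"` registered, a lookup for mode `"rust"` returns `nullptr` in this model, whereas a symbol that claims mode `"C"` resolves to the C language. That asymmetry is one plausible reason front-ends currently set the mode to "C".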
`mode` is used in two ways: it is either checked directly in language-specific bits of code (actually pretty rare; searching for `ID_java` or `ID_cpp` shows this) or it is used by `src/langapi/mode.h` to get the relevant `languaget`, mostly for output via `src/langapi/language_util.h`.
Sounds good.
I get the principle and the whole point of a modular framework is so that we can have lots of different executables, so that is good. I have a slight concern about how we handle mixed language goto-programs, which is a desirable goal for both Ada and Rust ( @danielsn ? ).
As far as I can see there are two ways of resolving this and https://github.com/diffblue/cbmc/issues/6223 .
Simple fix approach: inherit from `json_symtab_languaget` to create `rust_languaget` and `ada_languaget`, which have `id()`s of `ID_rust` and `ID_ada` respectively, and implement `from_type()` and `from_expr()` etc. `symtab2gb` then loads all of these language front-ends and possibly the C front-end as well for good measure.
More involved but fuller solution: split `languaget` into two, one part for front-end, parsing and type checking and one part for language-specific support and output. Implement just the second part for `ID_rust` and `ID_ada`. Load the language-specific support and output for all languages in all tools. The front-end part stays the same in all tools, including `symtab2gb`, which only needs `json_symtab` support.
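For the first of the two approaches above (the simple fix), a minimal sketch of the inheritance could look like this. Everything here is a simplified stand-in: `cast_exprt` is a toy expression type invented for the example, `from_expr` does not have the real CBMC signature, and the id strings `"rust"` and `"ada"` are assumed values for `ID_rust` and `ID_ada`.

```cpp
#include <string>

// Toy expression used only for this sketch: a cast of an operand to a type.
struct cast_exprt
{
  std::string operand;
  std::string target_type;
};

// Simplified stand-in for the json_symtab front-end; NOT the real class.
struct json_symtab_languaget
{
  virtual ~json_symtab_languaget() = default;
  virtual std::string id() const { return "json_symtab"; }

  // Language-neutral fallback rendering.
  virtual std::string from_expr(const cast_exprt &expr) const
  {
    return "cast(" + expr.operand + ", " + expr.target_type + ")";
  }
};

// Reuses the json_symtab loading logic, but reports its own id and
// prints expressions in Rust syntax.
struct rust_languaget : json_symtab_languaget
{
  std::string id() const override { return "rust"; }
  std::string from_expr(const cast_exprt &expr) const override
  {
    return expr.operand + " as " + expr.target_type;
  }
};

// Same idea for Ada, using Ada's type-conversion syntax.
struct ada_languaget : json_symtab_languaget
{
  std::string id() const override { return "ada"; }
  std::string from_expr(const cast_exprt &expr) const override
  {
    return expr.target_type + "(" + expr.operand + ")";
  }
};
```

In this model the only per-language work is overriding `id()` and the output functions, which is what makes the approach simple; the cost is that every tool wanting language-aware output has to load these front-end classes.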
HTH
I think I agree with most of the points raised by @peterschrammel @TGWDB and @martin-cs - in particular, I do quite like the idea of splitting `languaget` into explicit "front-end/reporting" and "output/reporting/mid-end/back-end" - that I think enables `json_symtab` to remain a fairly focused, language-agnostic "intermediate representation" rather than having bits of ad-hoc language specific knowledge bolted on.
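One way to picture the split is as two independent interfaces: one for reading input, one for language-specific output keyed by mode. The sketch below is only an illustration under that assumption; none of these class or function names exist in CBMC, and the mode strings are assumed.

```cpp
#include <map>
#include <memory>
#include <string>

// Part 1 (hypothetical): front-end, i.e. parsing and type checking.
struct language_frontendt
{
  virtual ~language_frontendt() = default;
  virtual bool accepts_extension(const std::string &extension) const = 0;
};

// The only front-end a symtab2gb-like tool would need in this model.
struct json_symtab_frontendt : language_frontendt
{
  bool accepts_extension(const std::string &extension) const override
  {
    return extension == "json_symtab";
  }
};

// Part 2 (hypothetical): language-specific support and output.
struct language_supportt
{
  virtual ~language_supportt() = default;
  virtual std::string describe_function(const std::string &name) const = 0;
};

struct rust_supportt : language_supportt
{
  std::string describe_function(const std::string &name) const override
  {
    return "fn " + name;
  }
};

struct ada_supportt : language_supportt
{
  std::string describe_function(const std::string &name) const override
  {
    return "procedure " + name;
  }
};

using support_tablet =
  std::map<std::string, std::unique_ptr<language_supportt>>;

// Every tool loads output support for all modes, whatever it can parse.
support_tablet load_all_language_support()
{
  support_tablet table;
  table.emplace("rust", std::make_unique<rust_supportt>());
  table.emplace("ada", std::make_unique<ada_supportt>());
  return table;
}

// Falls back to the bare name when no support is loaded for the mode.
std::string describe_symbol(
  const support_tablet &table,
  const std::string &mode,
  const std::string &name)
{
  const auto it = table.find(mode);
  return it == table.end() ? name : it->second->describe_function(name);
}
```

Here the front-end stays language-agnostic, and the per-mode knowledge lives entirely in the support table, which is the property that keeps `json_symtab` focused.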
@TGWDB any feel for a time-frame on this one? If it is soon then I won't merge #6233
@martin-cs I also like the idea of splitting into front-end and back-end parts.
@TGWDB sorry to hassle but you are assigned as the owner for this. I need https://github.com/diffblue/cbmc/issues/6223 resolved ASAP. https://github.com/diffblue/cbmc/pull/6233 will do it but I really don't want to merge it because it is a work-around. If you are going to resolve this in the next 24-48 hours then I won't merge #6233.
Also @chrisr-diffblue and @peterschrammel for general interest.
@martin-cs I cannot promise resolution in 24-48 hours so I propose you go ahead with #6233.
@TGWDB thanks for the fast response.
RMC is a new Rust front-end for CBMC. Currently, it uses the `C` mode in the symbol table. We propose adding a `Rust` mode to symbols. For now, this mode would have the same semantics as `C`, but would allow us to distinguish Rust code from C code. This is particularly important as Rust supports linking C code using an FFI, which we need to support in RMC.
Benefits
Design considerations
Rust is not the only language that would benefit from this. Any new language front-end will likely see the same issues we are, and could benefit from a principled mechanism to alleviate them.
Links and documentation
https://github.com/diffblue/cbmc/blob/1ab5de1ac0893c7ca94ffe5463e2c7c4ee781266/src/util/symbol.h#L49
Currently, CBMC appears to have the following languages:
https://github.com/diffblue/cbmc/blob/48893287099cb5780302fe9dc415eb6888354fd6/src/cbmc/cbmc_languages.cpp#L25-L35