Z3Prover / z3

The Z3 Theorem Prover
Other
10.42k stars 1.47k forks source link

"String" sort.. #364

Closed LeventErkok closed 8 years ago

LeventErkok commented 8 years ago

This isn't quite an issue, but rather an info request.

It appears Z3 very recently introduced String as a sort. (Judging by the commit messages, I'm guessing @NikolajBjorner is behind this work?) This broke our tools, since we were using that name as the uninterpreted name ourselves, which obviously doesn't work anymore. But that obviously is not Z3's problem!

Rather, I'd like to ask exactly what operations are supported for this sort, what the literals look like, etc. I don't think this is an SMTLib sanctioned sort/logic yet, but I very much like the idea of having native support for Strings inside the SMT-solver and would like to find out more. My brief googling didn't reveal any documents or resources to that effect. I'd love to be an early adapter of this logic and play around with it if you can provide some basic info at least.. Very exciting.

NikolajBjorner commented 8 years ago

There have for a very long time been requests to support strings (and sequences/regular expressions). The format is similar to QF_S format used in CVC4, which is not yet documented at a level for broad consumption. However, some users seem to have taken this up and now ask why Z3 doesn't understand "String" and parse string literals.

Z3 now parses ASCII string literals (other character formats will have to wait) and lets you represent formulas over this type. The SMT-LIB2.5 spec specifies string literals (and in some way supposes a corresponding sort).

If you don't use interpreted functions on String, then you should be able to get away with removing your declaration of "String" and then rely on the built-in version. If you are really angry (or others are), then I will hack the parser to ignore duplicate String declarations so that we can serve both those that use built-in strings and those who don't.

Z3's default behavior is to load all logics if no logic is set. It should not load strings if a supported logic is set using the command set-logic, for example (set-logic QF_ABV). This behavior is perhaps not SMT-LIB proper, but seems rather convenient to me.

A native solver for strings (sequences) is in progress (this week) and I am checking it in directly as either users have to deal with "String" being reserved going forward and they might as well if they integrate latest builds, or users can leverage SMT-LIB front-end for strings as is in the various stages of support. Other efforts to support strings have been taken in projects. The front-end support for strings can be leveraged by other than the native solver.

LeventErkok commented 8 years ago

@NikolajBjorner Thanks! I'm perfectly happy for Z3 stealing the name String. it's easy for me to work-around that. I'm more excited about the actual support for the type. Please do not change the current behavior on my behalf.

[Specifying a strict logic is a can-of-worms really: This has been discussed before, and I think SMTLib's way of specifying logics is just not sustainable. So, I merely let Z3 pick the logic in most cases, as it tends to do the "right" thing, instead of trying to figure out exactly which prescribed logic I need. But that's a separate discussion.]

I've skimmed through http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.5-draft.pdf but didn't see anything regarding the QF_S nor what operations are available on them. Can you point me to a specific section of that? (Or some other resource?)

NikolajBjorner commented 8 years ago

QF_S is not in that document it only talks about the format of string literals.

 <spec constant> ::= <numeral> | <decimal> | <hexadecimal> | <binary> | <string>

on page 8. QF_S is used in benchmarks accompanying papers by CVC4. The formats used in CVC4 is an evolution, with several changes, of an old workshop paper suggesting a format for sequences and regular expressions http://research.microsoft.com/en-us/people/nbjorner/smtlibstr.pdf. I don't know of other details.

LeventErkok commented 8 years ago

Thanks @NikolajBjorner

Should I expect everything in that paper to work with Z3? Or is it still under development?

wintersteiger commented 8 years ago

Since @NikolajBjorner didn't respond: Yes, this is still under active development, but if you're planning to take a closer look, nice small test cases would definnitely be appreciated!

LeventErkok commented 8 years ago

@wintersteiger Definitely in my radar!

[Closing this ticket..]

NikolajBjorner commented 8 years ago

The set of constructs definitely diverges from this initial proposal. Furthermore, the names are different: there are SMT-LIB2 naming conventions currently used in CVC4 that we parse as well: for example, it is called str.len instead of seq-length. Z3 lets you form sequences over arbitrary sorts, not just characters, so in fact str.len is the name that may be used if the argument is a string, and more generally we use seq.len for sequences. Thus, there are str.len, str.at, str.contains, str.++ to be friendly for the useful special cases of strings and there is seq.len, seq.at, seq.contains, seq.++, etc to use more reasonable names according to what is really supported. A basic procedure is now checked in, but requires tuning before I would say it is ready for creating applications around. At the very least it gives users some way to formulate constraints over strings natively. I will update the online tutorial on rise4fun.com with syntax and examples when it is better tuned.

gabriel-fallen commented 8 years ago

Hello! I'm going to use Z3 with Strings support for my research. Where can I find up-to-date documentation/specification/examples on Strings (Seq) and RegEx theory and language? Thanks in advance!

NikolajBjorner commented 8 years ago

Thanks for your interest and I would appreciate unit and regression tests. It is still churning and I will add documentation when it is stabilized and I have had a chance to tune it reasonably. The syntax is fairly compatible with CVC4's. Unit tests provide example usages.

https://github.com/Z3Prover/z3test/blob/master/regressions/smt2/string-ops.smt2 https://github.com/Z3Prover/z3test/blob/master/regressions/smt2/string-concat.smt2 https://github.com/Z3Prover/z3test/blob/master/regressions/smt2/string-eval.smt2 https://github.com/Z3Prover/z3test/blob/master/regressions/smt2/string-simplify.smt2 https://github.com/Z3Prover/z3test/blob/master/regressions/smt2/string-literals.smt2 https://github.com/Z3Prover/z3test/blob/master/regressions/smt2/re.smt2

Besides supporting string operations, I also support sequences over arbitrary types. Strings are sequences of bit-vectors. The syntax for sequence operations is "seq.len" instead of "str.len" etc. and "seq.extract" instead of "str.substr" (a sequence does not have a substring). You can declare a sequence (of integers) as follows:

    (declare-const x (Seq Int))
gabriel-fallen commented 8 years ago

Thanks a lot! I'm very new to Z3 codebase so I didn't know about z3test project. I cannot promise unit or regression tests though...

gabriel-fallen commented 8 years ago

At the moment I'm trying to build and check a regex for GUIDs: "[A-F\d]{8,8}-[A-F\d]{4,4}-[A-F\d]{4,4}-[A-F\d]{4,4}-[A-F\d]{12,12}"

Here is my code:

(declare-fun guid-re () (RegEx String))
(assert (= guid-re
    (let ((hexdigit (re.union (re.range "A" "F") (re.range "0" "9"))))
        (let ((first ((_ re.loop 8 8) hexdigit))
              (second ((_ re.loop 4 4) hexdigit))
              (third ((_ re.loop 4 4) hexdigit))
              (fourth ((_ re.loop 4 4) hexdigit))
              (fifth ((_ re.loop 12 12) hexdigit))
              (dash (str.to.re "-")))
        (re.++ first dash second dash third dash fourth dash fifth)))))

(declare-fun guid-str () String)
(assert (= guid-str "21EC2020-3AEA-4069-A2DD-08002B30309D"))

(assert (not (str.in.re "asdfghjk" guid-re)))

(assert (str.in.re guid-str guid-re))

But I'm getting this output:

$ z3 -v:5 guid-test.smt2
(smt.searching)
unknown
(error "line 35 column 10: model is not available")

Can you tell what's wrong? Or how can I debug it further?

NikolajBjorner commented 8 years ago

There is no support for symbolic regular expressions. You can inline the equations as definitions and Z3 will understand (in this case just do ground evaluation).

 (define-fun guid-re () (RegEx String) 
   (let ((hexdigit (re.union (re.range "A" "F") (re.range "0" "9"))))
      (let ((first ((_ re.loop 8 8) hexdigit))
          (second ((_ re.loop 4 4) hexdigit))
          (third ((_ re.loop 4 4) hexdigit))
          (fourth ((_ re.loop 4 4) hexdigit))
          (fifth ((_ re.loop 12 12) hexdigit))
          (dash (str.to.re "-")))
    (re.++ first dash second dash third dash fourth dash fifth))))

(define-fun guid-str () String "21EC2020-3AEA-4069-A2DD-08002B30309D")

(assert (not (str.in.re "asdfghjk" guid-re)))

(assert (str.in.re guid-str guid-re))
(check-sat)
gabriel-fallen commented 8 years ago

Thanks a lot! Now it works like magic. :)