angr / claripy

An abstraction layer for constraint solvers.
BSD 2-Clause "Simplified" License
290 stars 94 forks source link

Extract args are unintuitive #502

Open twizmwazin opened 1 month ago

twizmwazin commented 1 month ago

Description

The args to extract aren't very intuitive: (high (inclusive), low (inclusive), base). This seems like we copied it straight from z3, which did the same thing: https://z3prover.github.io/api/html/namespacez3py.html#a40e9429ef16134a6d9914ecdc2182e8c

What's even better though is that z3 later just overloaded this function for Strings and Sequences, but when they did that they must have been in a clearer state of mind, as for that variant the args are completely different: (base, index, length). I think we should copy those args for BVs too.

I guess this is more intuitive if you are reading the little-endian form of the BV, ex Extract(7, 0, BVV(0xabcd. 16)) == BVV(0xcd, 8). Though even in this case it doesn't feel very consistent, and the logic in get_bytes reflects this.

Steps to reproduce the bug

No response

Environment

No response

Additional context

No response

zardus commented 1 month ago

It's definitely a hold-over from Z3, but I would advise against changing Extract, as Z3 is the "industry standard" and that conceptual compatibility is important. Instead, what if we create a synonym to Extract, and have it make more sense?

On Fri, Sep 20, 2024 at 3:11 PM Kevin Phoenix @.***> wrote:

Description

The args to extract aren't very intuitive: (high (inclusive), low (inclusive), base). This seems like we copied it straight from z3, which did the same thing: https://z3prover.github.io/api/html/namespacez3py.html#a40e9429ef16134a6d9914ecdc2182e8c https://urldefense.com/v3/__https://z3prover.github.io/api/html/namespacez3py.html*a40e9429ef16134a6d9914ecdc2182e8c__;Iw!!IKRxdwAv5BmarQ!cXWDPouh_eUC_cgobFrABAUrCFHBr-DbLh7vf1D4Q7OxF1DSX6vbwlMHOIZpnq4NlZC4HFJssLTCV44kHKpUSHL4eLGY$

What's even better though is that z3 later just overloaded this function for Strings and Sequences, but when they did that they must have been in a clearer state of mind, as for that variant the args are completely different: (base, index, length). I think we should copy those args for BVs too.

I guess this is more intuitive if you are reading the little-endian form of the BV, ex Extract(7, 0, BVV(0xabcd. 16)) == BVV(0xcd, 8). Though even in this case it doesn't feel very consistent, and the logic in get_bytes reflects this. Steps to reproduce the bug

No response Environment

No response Additional context

No response

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/angr/claripy/issues/502__;!!IKRxdwAv5BmarQ!cXWDPouh_eUC_cgobFrABAUrCFHBr-DbLh7vf1D4Q7OxF1DSX6vbwlMHOIZpnq4NlZC4HFJssLTCV44kHKpUSCu-Eu6w$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA2LHF27TPUXMQI7O4HQ5ATZXSMPBAVCNFSM6AAAAABOS5R5DCVHI2DSMVQWIX3LMV43ASLTON2WKOZSGUZTSNZVG4YTIMQ__;!!IKRxdwAv5BmarQ!cXWDPouh_eUC_cgobFrABAUrCFHBr-DbLh7vf1D4Q7OxF1DSX6vbwlMHOIZpnq4NlZC4HFJssLTCV44kHKpUSB_zk1XQ$ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

ltfish commented 1 month ago

Let's not change it as it's going to impact too much existing code, much of which is not written by us. The ship has sailed at this point.

twizmwazin commented 1 month ago

I disagree with the general idea that it is too late to change something, holding on to tech debt because it is old leads to bad software that is hard to learn, hard to maintain, makes usage generally unpleasant and eventually kills a project when something better comes along.

But I think this is easy enough to type check at runtime that we could maintain compatibility for a while and give people advanced notice. I could change Extract to either support (bv, offset, size) or the current args, issuing a deprecation warning for the current args. And then after a year or when the rust implementation is ready, we drop the compatibility then, ideally marked with a major version release and a migration guide.

ltfish commented 1 month ago

FYI, the definition of Extract arguments comes from SMT-LIB2: https://stp.readthedocs.io/en/latest/smt-input-language.html

It would be nice to keep compatibility with the “industry standard.” Supporting two combinations of arguments is a good way forward; deprecating the current one is wrong IMO. Especially since it’s not “counter-intuitive” to me after working with z3 and other solvers for so many years.

twizmwazin commented 1 month ago

Alright yeah if Z3 is also consistent with SMT-LIB2 I think I'm in agreement with you. I'll see about adding a second set of args and maintaining both.