goblint / analyzer

Static analysis framework for C
https://goblint.in.tum.de
MIT License
172 stars 72 forks source link

Apron: Invariants for variables of type `unsigned long` wrong #1542

Open michael-schwarz opened 1 month ago

michael-schwarz commented 1 month ago

Consider the following program extracted from libaco which is one of the programs I am considering for #1417:

#include<pthread.h>
pthread_mutex_t mtx;
int g;

void *pmain(void *pthread_in_arg )
{
  unsigned long tmp_gl_ct ;

  pthread_mutex_lock(& mtx);
  tmp_gl_ct = g;
  pthread_mutex_unlock(& mtx);
  return ((void *)0);
}

int main()
{
  pthread_t t1;
  pthread_create(& t1, 0, pmain, 0);
  pthread_join(t1, 0);
  return 0;
}

Creating a YAML witness from it (via ./goblint --conf conf/traces-rel.json --set ana.activated[+] apron --set ana.path_sens[+] threadflag --set ana.relation.privatization mutex-meet-tid-cluster12 --enable allglobs -v --enable witness.yaml.enabled notlibaco.c --set witness.yaml.path notlibaco.yml) yields

- entry_type: location_invariant
  metadata:
    format_version: "0.1"
    uuid: 2bcbb0e1-df8c-4b8d-8a00-ea9aa25ab8b1
    creation_time: 2024-07-10T12:42:46Z
    producer:
      name: Goblint
      version: heads/michael-schwarz-dissertation-0-g970a99913-dirty
      command_line: '''./goblint'' ''--conf'' ''conf/traces-rel.json'' ''--set'' ''ana.activated[+]''
        ''apron'' ''--set'' ''ana.path_sens[+]'' ''threadflag'' ''--set'' ''ana.relation.privatization''
        ''mutex-meet-tid-cluster12'' ''--enable'' ''allglobs'' ''-v'' ''--enable''
        ''witness.yaml.enabled'' ''notlibaco.c'' ''--set'' ''witness.yaml.path'' ''notlibaco.yml'''
    task:
      input_files:
      - notlibaco.c
      input_file_hashes:
        notlibaco.c: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
      data_model: LP64
      language: C
  location:
    file_name: notlibaco.c
    file_hash: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
    line: 12
    column: 3
    function: pmain
  location_invariant:
    string: (0LL - (long long )g) + (long long )tmp_gl_ct >= 0LL
    type: assertion
    format: C
- entry_type: location_invariant
  metadata:
    format_version: "0.1"
    uuid: 13ab3ea7-f159-4c45-95ff-765acf3c0ec7
    creation_time: 2024-07-10T12:42:46Z
    producer:
      name: Goblint
      version: heads/michael-schwarz-dissertation-0-g970a99913-dirty
      command_line: '''./goblint'' ''--conf'' ''conf/traces-rel.json'' ''--set'' ''ana.activated[+]''
        ''apron'' ''--set'' ''ana.path_sens[+]'' ''threadflag'' ''--set'' ''ana.relation.privatization''
        ''mutex-meet-tid-cluster12'' ''--enable'' ''allglobs'' ''-v'' ''--enable''
        ''witness.yaml.enabled'' ''notlibaco.c'' ''--set'' ''witness.yaml.path'' ''notlibaco.yml'''
    task:
      input_files:
      - notlibaco.c
      input_file_hashes:
        notlibaco.c: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
      data_model: LP64
      language: C
  location:
    file_name: notlibaco.c
    file_hash: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
    line: 12
    column: 3
    function: pmain
  location_invariant:
    string: (long long )g + (long long )tmp_gl_ct >= 0LL
    type: assertion
    format: C
- entry_type: location_invariant
  metadata:
    format_version: "0.1"
    uuid: d33de886-9d25-4455-a287-08e516c8a0ed
    creation_time: 2024-07-10T12:42:46Z
    producer:
      name: Goblint
      version: heads/michael-schwarz-dissertation-0-g970a99913-dirty
      command_line: '''./goblint'' ''--conf'' ''conf/traces-rel.json'' ''--set'' ''ana.activated[+]''
        ''apron'' ''--set'' ''ana.path_sens[+]'' ''threadflag'' ''--set'' ''ana.relation.privatization''
        ''mutex-meet-tid-cluster12'' ''--enable'' ''allglobs'' ''-v'' ''--enable''
        ''witness.yaml.enabled'' ''notlibaco.c'' ''--set'' ''witness.yaml.path'' ''notlibaco.yml'''
    task:
      input_files:
      - notlibaco.c
      input_file_hashes:
        notlibaco.c: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
      data_model: LP64
      language: C
  location:
    file_name: notlibaco.c
    file_hash: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
    line: 12
    column: 3
    function: pmain
  location_invariant:
    string: 0LL - (long long )g >= 0LL
    type: assertion
    format: C
- entry_type: location_invariant
  metadata:
    format_version: "0.1"
    uuid: 9484e898-6315-48a5-807f-f86dd2bd2ede
    creation_time: 2024-07-10T12:42:46Z
    producer:
      name: Goblint
      version: heads/michael-schwarz-dissertation-0-g970a99913-dirty
      command_line: '''./goblint'' ''--conf'' ''conf/traces-rel.json'' ''--set'' ''ana.activated[+]''
        ''apron'' ''--set'' ''ana.path_sens[+]'' ''threadflag'' ''--set'' ''ana.relation.privatization''
        ''mutex-meet-tid-cluster12'' ''--enable'' ''allglobs'' ''-v'' ''--enable''
        ''witness.yaml.enabled'' ''notlibaco.c'' ''--set'' ''witness.yaml.path'' ''notlibaco.yml'''
    task:
      input_files:
      - notlibaco.c
      input_file_hashes:
        notlibaco.c: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
      data_model: LP64
      language: C
  location:
    file_name: notlibaco.c
    file_hash: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
    line: 12
    column: 3
    function: pmain
  location_invariant:
    string: (long long )g >= 0LL
    type: assertion
    format: C
- entry_type: location_invariant
  metadata:
    format_version: "0.1"
    uuid: 894f19db-b1ef-4982-83c0-cf4404861ab1
    creation_time: 2024-07-10T12:42:46Z
    producer:
      name: Goblint
      version: heads/michael-schwarz-dissertation-0-g970a99913-dirty
      command_line: '''./goblint'' ''--conf'' ''conf/traces-rel.json'' ''--set'' ''ana.activated[+]''
        ''apron'' ''--set'' ''ana.path_sens[+]'' ''threadflag'' ''--set'' ''ana.relation.privatization''
        ''mutex-meet-tid-cluster12'' ''--enable'' ''allglobs'' ''-v'' ''--enable''
        ''witness.yaml.enabled'' ''notlibaco.c'' ''--set'' ''witness.yaml.path'' ''notlibaco.yml'''
    task:
      input_files:
      - notlibaco.c
      input_file_hashes:
        notlibaco.c: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
      data_model: LP64
      language: C
  location:
    file_name: notlibaco.c
    file_hash: e36c8d4402977c57e85e6fe02221ae759457e969afc37d9852c8c500d836c19f
    line: 12
    column: 3
    function: pmain
  location_invariant:
    string: (long long )tmp_gl_ct >= 0LL
    type: assertion
    format: C

However, subsequent self-validation with ./goblint --conf conf/traces-rel.json --set ana.activated[+] apron --set ana.path_sens[+] threadflag --set ana.relation.privatization mutex-meet-tid-cluster12 --enable allglobs -v --set witness.yaml.validate notlibaco.yml notlibaco.c yields the message that only 2/5 invariants can be confirmed.

[Info][Witness] witness validation summary:
  confirmed: 2
  unconfirmed: 3
  refuted: 0
  error: 0
  unchecked: 0
  unsupported: 0
  disabled: 0
  total validation entries: 5

One of the invariants that cannot be re-confirmed, is (long long )tmp_gl_ct >= 0LL. This is correct, as this invariant may not hold (if we assume uninitialized variables are top), as long long and unsigned long have the same bit-length but different ranges.

Wrong reasoning I used earlier:

michael-schwarz commented 1 month ago

It does not have anything to do with logic to exclude temporaries, the problem is still there after renaming the variable.

michael-schwarz commented 1 month ago

Adding the assertion as a __goblint_check also can not be verified, so it seems the shortcoming may be on the generation side?

sim642 commented 1 month ago

Maybe this has to do with the long long casts that we blindly add to all Apron invariants?

michael-schwarz commented 1 month ago

Nevertheless, it should still hold even after making this cast.

michael-schwarz commented 1 month ago

Actually claiming that this holds is in fact wrong! long long and long have the same length on LP64 machines, so this supposed invariant actually isn't one!

michael-schwarz commented 1 month ago

I now have the following minimal program

#include<pthread.h>
int *b;
pthread_mutex_t e;

void main() {
    int g;
    int a;
    b = &g;

    pthread_mutex_lock(&e);
}

where some 4 invariants cannot be shown after locking e, such as (4294967294LL - (long long )a) - (long long )g >= 0LL. Enabling the interval domain makes us able to show the assertions.

michael-schwarz commented 1 month ago

However, the bounds seem to be correct, they result from interval arithmetic. If I don't assign the address of g to b, verification also succeeds without intervals.

sim642 commented 1 month ago

This minimal program locks a mutex in single-threaded mode which should be fine, but maybe we handle something incorrectly in such case? Or is the locking there just as a place for after-lock invariant generation?

michael-schwarz commented 1 month ago

It also misbehaves if the program starts a thread right at the beginning. I just figured it might be easier to debug if there's no concurrency happening.

I haven't tried whether we have the same issue if I generate witnesses at all program points and remove the mutex, I'll try that tomorrow.

michael-schwarz commented 1 month ago

The problem also occurs when dumping invariants at all program points and removing the locking of the mutex.

michael-schwarz commented 1 month ago

The last remaining issue can be observed in this program:

#include<pthread.h>
int *b;
pthread_mutex_t e;

void* other(void* arg) {
    return NULL;
}

void main() {
    pthread_t t;
    pthread_create(&t, NULL, other, NULL);
    int g = 8;
    int a;
    if(a) {
        g = 4383648;
    }

    b = &g;

    // Invariant generated: (2143100000LL + (long long )a) + (long long )g >= 0LL
    pthread_mutex_lock(&e);
}

This invariant is an interesting case: At first glance it seems wrong, as it appears as if the case where g is 8 seems to violate the invariant.

However, the invariant is actually correct as one can see by case distinction over a:

If we destroy the relationship about a before, this invariant is no longer produced.

michael-schwarz commented 1 month ago

It also seems to be related to escaping somehow:

To establish the invariant, we attempt to show a contradiction with (2143100000LL + (long long )a) + (long long )g#in < 0L

in the state we have a#994+g#993+2143100000 >= 0 which refers to a local copy of g somehow, and not the flow-insensitive invariant which appears to be stored in g for which we know it is equal with g#in (-g+g#in#>=0 and g-g#in#>=0) such a property does not hold.

Complete state:

%%% gurki: assert_constraint (2143100000LL + (long long )a) + (long long )g#in < 0LL 0 -_i,n (2143100000 +_i,n a#994 +_i,n g#in#) > 0
%%% gurki: assert_constraint st: [|a#994+2147483648.>=0; -a#994+2147483647.>=0; -a#994+g+2147483639.>=0;
  a#994+g+2147483640.>=0; g-8.>=0; -a#994-g+2151867295.>=0;
  a#994-g+2151867296.>=0; -g+4383648.>=0; -a#994+g#993+2143099999.>=0;
  a#994+g#993+2143100000.>=0; -g+g#993+4383640.>=0; g+g#993-16.>=0;
  g#993-8.>=0; -a#994-g#993+2151867295.>=0; a#994-g#993+2151867296.>=0;
  -g-g#993+8767296.>=0; g-g#993+4383640.>=0; -g#993+4383648.>=0;
  -a#994+g#in#+2147483639.>=0;
  a#994+g#in#+2147483640.>=0; -g+g#in#>=0;
  g+g#in#-16.>=0; -g#993+g#in#+4383640.>=0;
  g#993+g#in#-16.>=0; g#in#-8.>=0;
  -a#994-g#in#+2151867295.>=0;
  a#994-g#in#+2151867296.>=0; -g-g#in#+8767296.>=0;
  g-g#in#>=0; -g#993-g#in#+8767296.>=0;
  g#993-g#in#+4383640.>=0; -g#in#+4383648.>=0|] (env: [|
0> __daylight:int; 1> __timezone:int; 2> a#994:int; 3> daylight:int;
4> g:int; 5> g#993:int; 6> g#in#:int; 7> timezone:int|])