c-cube / qcheck

QuickCheck inspired property-based testing for OCaml.
https://c-cube.github.io/qcheck/
BSD 2-Clause "Simplified" License
340 stars 37 forks source link

memory usage problems #287

Open Robotechnic opened 1 week ago

Robotechnic commented 1 week ago

Hello, I started using qcheck for a project and it's a great quick checking framework but I have a problem with the memory consumption on some tests which require to manipulate huge data structures or a lot of data:

open QCheck2

let generator = Gen.(
  (0 -- 500) >>= fun n ->
  tup2 (return n) (list_size (return n) (array_size (1 -- 200) int))
)

let test = Test.make ~count:1000 ~name:"test" generator (fun _ ->
  true)

let () = exit @@ QCheck_runner.run_tests [test;test;test]

This is a minimal example and my problem is that it uses 1.1gb of RAM if I run it like that:

let () = ignore @@ QCheck_runner.run_tests [test]
let () = ignore @@ QCheck_runner.run_tests [test]
let () = exit @@ QCheck_runner.run_tests [test]

It takes the same amount of time to run but uses 438Mb of RAM, which is what I would expect from a test. Here the example is dumb but in my case, I'm running 500 tests that run fine because they don't use much RAM but I have 30 tests that uses a lot of RAM and running them all at once consumes more than 10gb of RAM.

Robotechnic commented 1 week ago

I forgot to gave some precisions about my config: It's done using ocaml 4.14.2 and check 0.21.3

c-cube commented 1 week ago

A quick calculation shows that in the worst case, each array could be 500 200 8B = 800kB of data. On average it might be 250 100 8B = 200kB. If you run 1000 tests, that means you might consume 200kB * 1000 = 200MB, plus some other overhead from qcheck itself. That's not insignificant, although I'm not sure how it gets to 1.1GB even being allocated (perhaps a lot of closures and temporary values used to produce the values?).

I don't think it's super shocking, 500*200 is fairly big for nested arrays :)

Maybe run Gc.full_major() at the beginning of some tests? Or with a low random probability?

ghilesZ commented 1 week ago

Just passing by and interested by this. Here are two ideas.

It would look roughly like that : (I'm unsure how to do the same with QCheck2 since the Gen.t type is not public anymore)

open QCheck

let array_memo_size =
  let cache = Hashtbl.create 1000 in
  fun size gen (st:Random.State.t) ->
  let actual_size = size st in
  match Hashtbl.find_opt cache actual_size with
  | None ->
     let a = Array.init actual_size (fun _ -> gen st) in
     Hashtbl.add cache actual_size a;
     a
  | Some a -> Array.map_inplace (fun _ -> gen st) a; a

let generator = Gen.(
  (0 -- 500) >>= fun n ->
  tup2 (return n) (list_size (return n) (array_memo_size (1 -- 200) int))
)
Robotechnic commented 1 week ago

My problem is not really the memory usage of a 1000 tests this in itself is not a problem. My real problem is the fact that if I pass to the QCheck runner a list of tests it uses a lot more memory than if I split the tests and run them separately: If I run the tests like this:

let () = exit @@ QCheck_runner.run_tests [test;test;test]

Here is the output of the time command:

time ./_build/default/bin/main.exe -v
random seed: 454529569
================================================================================
success (ran 3 tests)
./_build/default/bin/main.exe -v   106,41s  user 0,43s system 99% cpu 1:47,35 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                1151 MB
page faults from disk:     0
other page faults:         19732

And if I run the tests like this:

let () = ignore @@ QCheck_runner.run_tests [test]
let () = ignore @@ QCheck_runner.run_tests [test]
let () = exit @@ QCheck_runner.run_tests [test]

Here is the output of the time command:

time ./_build/default/bin/main.exe -v
random seed: 115565881
================================================================================
success (ran 1 tests)
================================================================================
success (ran 1 tests)
================================================================================
success (ran 1 tests)
./_build/default/bin/main.exe -v   108,05s  user 0,47s system 99% cpu 1:48,92 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                438 MB
page faults from disk:     0
other page faults:         26705

I think that it is strange because running that shouldn't change how the program behaves from what I understood about QCheck_runner.

Memoizing the array to reuse them is a good idea, but it might not solve my problem completely as it can't be applied to most of my tests.

esope commented 1 week ago

Another test, that shows that the memory consumption grows linearly with the value of the count argument passed to run_tests :

open QCheck2

let count = int_of_string Sys.argv.(1)
let generator = Gen.(array_size (return 2000) int)
let test = Test.make ~count ~name:"test" generator (fun _ -> true)
let () = exit @@ QCheck_runner.run_tests [ test ]
$ /usr/bin/time --format="max memory: %M kB" _build/default/test.exe 10000
random seed: 124874579
================================================================================
success (ran 1 tests)
max memory: 300392 kB

$ /usr/bin/time --format="max memory: %M kB" _build/default/test.exe 20000
random seed: 390831488
================================================================================
success (ran 1 tests)
max memory: 583168 kB

$ /usr/bin/time --format="max memory: %M kB" _build/default/test.exe 30000
random seed: 69779182
================================================================================
success (ran 1 tests)
max memory: 881676 kB

$ /usr/bin/time --format="max memory: %M kB" _build/default/test.exe 40000
random seed: 155692075
================================================================================
success (ran 1 tests)
max memory: 1159708 kB

Is there a memory leak somewhere in run_tests?

jmid commented 3 days ago

Thank you both for the reports - both with nice reproducible examples! :pray: Based on them I naturally got nerd-sniped and offer a fix in #288. :nerd_face:

esope commented 3 days ago

Excellent! Thank you so much @jmid !