ScottishCovidResponse / SCRCIssueTracking

Central issue tracking repository for all repos in the consortium
6 stars 0 forks source link

Fix memory leak #583

Closed rwj11 closed 4 years ago

rwj11 commented 4 years ago

valgrind --leak-check=full ./run inputfile="sim.toml" finds a leak at simulate.cc:25, i.e. that the object created at

part = new PART(data,model,poptree);

is lost at the end of the function.

github-actions[bot] commented 4 years ago

Heads up @chrispooley @ianhinder - the "CoronaPMCMC" label was applied to this issue.

ianhinder commented 4 years ago

Chris is about to remove PMCMC entirely from the code, so this will likely go away. I'll leave the issue open and we can confirm that it goes. He's going to push that today.

ianhinder commented 4 years ago

Chris has removed PMCMC and I've merged his changes into dev. Would you mind rerunning valgrind to confirm that the memory leak has gone away?

rwj11 commented 4 years ago

No, just moved to simulate.cc:29. The dev branch also crashes nastily for me because of

model.cc: In member function ‘double MODEL::calcprobin()’:
model.cc:870:24: warning: control reaches end of non-void function [-Wreturn-type]
  870 |  vector <unsigned int> cst, kst;
      |                        ^~~

with free(): double free detected in tcache 2 -- adding a return value to MODEL::calcprobin() fixes this, but maybe this needs to be a value which means something!

chrispooley commented 4 years ago

Hi,

Thanks for pointing that out. No that function shouldn't return a value. I'll change it in the code.

Cheers,

Chris

rwj11 commented 4 years ago

Thanks for the confirmation. I've actually just fixed this for myself with a pull request for dev, as I need it working so I can do other things...

ianhinder commented 4 years ago

Any ideas why it didn't crash in CI? https://github.com/ScottishCovidResponse/CoronaPMCMC/runs/853627383?check_suite_focus=true Maybe just differences in stack content/memory?

rwj11 commented 4 years ago

So far as I read https://www.godbolt.org/z/SRg4So it looks like it may fall through into the next routine (!?!) -- the cleanup & return code after .L3 doesn't appear after .L12. This looks like a very nice example of undefined behaviour leading to insanity...

ianhinder commented 4 years ago

The fix to the valgrind problem is almost certainly to add

delete mbpchain

to the end of simulatedata. The code returns to main() after this, and doesn't do anything else with that object.

rwj11 commented 4 years ago

That's certainly what is leaking. However: a) it's generally better to use RAII rather than loose new/delete; b) it's simpler & more efficient to create the MBPCHAIN object on the stack rather than the heap. It's possible, though, that something called in simulatedata is keeping hold of a reference to some component of the MBPCHAIN object. In which case any of these options would cause the code to crash -- so that will need to be checked first.

ianhinder commented 4 years ago

Done.