Closed westwind2013 closed 6 years ago
Further checking indicates other similar bugs related to the use of malloc.
Thanks very much for the report! I may not have a chance to address it until I get back from a conference on 15 January.
In the meantime, I'm curious what compiler you're using, as obviously this isn't an issue we have encountered with any of those we use (mainly various versions of gcc, with either the mpich2 or mvapich MPI implementations).
In a standards-compliant compiler I would expect the two possibilities you write in each case to be handled identically. The style currently in the code (dereferencing the pointer rather than repeating the type) is typically advocated in order to improve the maintainability of the code. See here for one example, which also suggests moving the sizeof to the front. That may be worth testing, especially considering the apparent integer arithmetic issue suggested by your other report. This should look something like
Twist_Fermion **src = malloc(sizeof **src * Norder);
and so on.
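For illustration, here is a minimal sketch of the two styles discussed above. The struct `Item` is a hypothetical stand-in (the real Twist_Fermion is defined in the project's headers), and the function names are invented for this example. It only shows that dereferencing the pointer in sizeof and repeating the type request the same number of bytes, and that putting sizeof first makes the multiplication happen in size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the project's Twist_Fermion struct. */
typedef struct { double c[16]; } Item;

/* Style 1: dereference the pointer, sizeof first.
   sizeof does not evaluate its operand, so using *p here is safe
   even though p has not been assigned yet. */
Item *alloc_by_deref(size_t n) {
  Item *p = malloc(sizeof *p * n);
  return p;
}

/* Style 2: repeat the type name explicitly. */
Item *alloc_by_type(size_t n) {
  return malloc(sizeof(Item) * n);
}

/* Returns 1 if both styles compute the same element size. */
int sizes_agree(void) {
  Item *p = NULL;
  (void)p;  /* p is only named inside sizeof */
  assert(sizeof *p == sizeof(Item));
  return sizeof *p == sizeof(Item);
}
```

Because sizeof yields a size_t, writing it first converts an int count to size_t before multiplying, rather than multiplying ints and converting the result afterwards.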
Thank you for your reply. Moving the sizeof to the front is good. But I am worried that the coding style below
Twist_Fermion **src = malloc(sizeof **src * Norder);
still has a bug in it. src is a pointer to data type "Twist_Fermion *". When we call malloc(), we want to allocate a memory space of size "sizeof (*src) * Norder" instead of "sizeof (**src) * Norder": the former indicates Norder elements of data type "Twist_Fermion *" while the latter indicates Norder elements of data type "Twist_Fermion".
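To make the distinction concrete, the following sketch computes both sizes for a double pointer. `Elem` is a hypothetical stand-in for Twist_Fermion and the helper names are made up for this example; the actual struct size in the project will differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the project's Twist_Fermion struct. */
typedef struct { double c[16]; } Elem;

/* Bytes needed for n elements of type "Elem *" (an array of pointers). */
size_t bytes_for_pointer_array(size_t n) {
  Elem **src = NULL;
  (void)src;  /* src is only named inside sizeof, never dereferenced */
  return sizeof *src * n;
}

/* Bytes needed for n elements of type "Elem" (an array of structs). */
size_t bytes_for_struct_array(size_t n) {
  Elem **src = NULL;
  (void)src;
  return sizeof **src * n;
}

/* Sanity check: one dereference gives the pointer type, two give the struct. */
void check_sizes(void) {
  Elem **src = NULL;
  (void)src;
  assert(sizeof *src == sizeof(Elem *));
  assert(sizeof **src == sizeof(Elem));
}
```

With a struct larger than a pointer, as in this stand-in, the second form requests more memory than an array of pointers needs.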
Btw, I am actually working on a project that aims to provide an automated testing tool for MPI programs. I used this program as a target for some analysis and happened to find these bugs. I don't actually know the program well enough.
Norder elements of data type "Twist_Fermion *" is exactly what we want for the RHMC algorithm, which involves Norder source vectors, each of which is malloc'd with sites_on_node elements a few lines further down. In the end src is an array of Norder pointers to arrays of sites_on_node Twist_Fermions, with the individual structs accessed via src[N][i] for the components of source vector N associated with lattice site i.
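The layout described above can be sketched roughly as follows. This is not the code from the repository: `Vec_Elem` is a hypothetical stand-in for Twist_Fermion, and `alloc_sources` / `free_sources` are names invented for this example, with the counts passed in as parameters rather than read from Norder and sites_on_node.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the project's Twist_Fermion struct. */
typedef struct { double c[16]; } Vec_Elem;

/* Allocate norder source vectors of nsites elements each.
   Returns NULL (after releasing any partial allocation) on failure. */
Vec_Elem **alloc_sources(int norder, int nsites) {
  assert(norder > 0 && nsites > 0);
  /* Outer array: norder pointers. */
  Vec_Elem **src = malloc(sizeof(Vec_Elem *) * norder);
  if (src == NULL)
    return NULL;
  for (int n = 0; n < norder; n++) {
    /* Inner array: nsites structs for source vector n. */
    src[n] = malloc(sizeof(Vec_Elem) * nsites);
    if (src[n] == NULL) {
      while (n-- > 0)
        free(src[n]);
      free(src);
      return NULL;
    }
  }
  return src;
}

/* Release every source vector, then the array of pointers. */
void free_sources(Vec_Elem **src, int norder) {
  if (src == NULL)
    return;
  for (int n = 0; n < norder; n++)
    free(src[n]);
  free(src);
}
```

After a successful call, src[N][i] refers to the struct for source vector N at site i, matching the access pattern described above.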
Anyway, it sounds like moving the sizeof to the front resolves the problem you encountered, so I quickly did that in commit e2d120b and will close this issue as well. Feel free to reopen if problems remain.
If we want to allocate elements of type "Twist_Fermion *", should we use
Twist_Fermion **src = malloc(sizeof *src * Norder);
instead of
Twist_Fermion **src = malloc(sizeof **src * Norder);
?
Okay, I'm finding the dereferencing hard to parse, so I scrapped it for the sake of readability. Commit 53adcdc essentially implements your original suggestion. Valgrind seems happy, at least in serial.
That's great. To be honest, I don't know much about physics, so I just look at the program from a programmer's perspective.
Thank you for your work:)
Hi there!
In my recent use of this program, I found three segmentation fault bugs due to inappropriate use of the memory allocation routine malloc(). Below I list a verified fix for each:
Line 131 in susy/4d_Q16/susy/update_o.c: Twist_Fermion **src = malloc(Nroot * sizeof(**src)); should be changed to Twist_Fermion **src = malloc(Nroot * sizeof(Twist_Fermion *));
Line 13 in susy/4d_Q16/susy/update_o.c: Twist_Fermion **psim = malloc(Nroot * sizeof(**psim)); should be changed to Twist_Fermion **psim = malloc(Nroot * sizeof(Twist_Fermion *));
Line 44 in susy/4d_Q16/susy/congrad_multi.c: Twist_Fermion **pm = malloc(Norder * sizeof(**pm)); should be changed to Twist_Fermion **pm = malloc(Norder * sizeof(Twist_Fermion *));
Thank you!