ceandrade / brkga_mp_ipr_cpp

The Multi-Parent Biased Random-Key Genetic Algorithm with Implict Path Relink - C++ version
Other
16 stars 6 forks source link

Unsigned underflow #4

Closed fdaniele85 closed 1 year ago

fdaniele85 commented 1 year ago

Using multi-threading, my code stucked during Path-Relinking. With a bit of debugging I found that the problem is the for on line 2816 of file brkga_mp_ipr.hpp: for(std::size_t i = 0; i < remaining_blocks.size() - 1; ++i, ++it_block_idx) {

remaining_blocks.size() is unsigned and if remaining_blocks is empty, than remaining_blocks.size() -1 is an underflow an on my machine it gives the maximum unsigned int.

I propose to use for(std::size_t i = 0; i + 1 < remaining_blocks.size(); ++i, ++it_block_idx) {

ceandrade commented 1 year ago

Thanks for catching this up @fdaniele85. I wonder why you get in such a situation. For that to happen, we must remove all candidates in line 2741-2745. But, this will invalid all iterators from that point (particularly it_block_idx in lines 2755-2756), which may cause even worse issues since we are touching uncharted memory.

So, changing the indexing, as you propose, will only mascarade deeper problems. We need to investigate that a little bit further.

So do you mind running a few tests? First, I would recommend running Valgrind to see if this code is accessing invalid memory (which I believe so).

Second, we may add a branch or skip on line 2746, finishing the execution. Something like that:

if(remaining_blocks.empty())
    break;

Please, let me know the results.

fdaniele85 commented 1 year ago

Dear @ceandrade, thanks for the answer. Line 2755 is not executed, given the continue at line 2744.

Executing valgrind is impossible without modification of the code, because it stucks. I added if (remaining_blocks.size() == 0) {break;} after line 2831 and the output is:

==9146== ==9146== HEAP SUMMARY: ==9146== in use at exit: 2,856 bytes in 6 blocks ==9146== total heap usage: 790,944 allocs, 790,938 frees, 21,722,690 bytes allocated ==9146== ==9146== 8 bytes in 1 blocks are still reachable in loss record 1 of 5 ==9146== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==9146== by 0x487D84C: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==9146== by 0x489004A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==9146== by 0x487BFE1: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==9146== by 0x400647D: call_init.part.0 (dl-init.c:70) ==9146== by 0x4006567: call_init (dl-init.c:33) ==9146== by 0x4006567: _dl_init (dl-init.c:117) ==9146== by 0x40202E9: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) ==9146== by 0x17: ??? ==9146== by 0x1FFEFFFF4E: ??? ==9146== by 0x1FFEFFFF57: ??? ==9146== by 0x1FFEFFFF5A: ??? ==9146== by 0x1FFEFFFF5C: ??? ==9146== ==9146==

It does not find problems on for in line 2816. Probably because it_block_idx refers to memory that is still valid (remaining_blocks has not been deallocated). Am I wrong?

ceandrade commented 1 year ago

I understand that it_block_idx will point to invalid memory because remaining_blocks is a set<>, when the most C++ implementations, is a custom red-black tree. So, when you delete an element, a node from the tree is deleted. However, it may not be the case here.

So, with if (remaining_blocks.size() == 0) {break;}, does it work in your case (without playing with loop counters)?

If so, you can submit a pull request, if you like, so you become a contributor. Otherwise, I can fix by myself, and give you the credit. :-)

ceandrade commented 1 year ago

Thanks for the pull request @fdaniele85!