Closed melg8 closed 2 years ago
Thanks, I'll take a look at the issues you've raised.
I get some really odd errors that I'm not sure how resolve, such as...
test_shared_message.cpp:279:92: runtime error: member call on address 0x7fffc1004180 which does not point
to an object of type 'reference_counted_message'
0x7fffc1004180: note: object is of type 'etl::ireference_counted_message'
reference_counted_message
is derived from etl::ireference_counted_message
.
ireference_counted_message
is a pure virtual base class.
The error is generated from just constructing a reference_counted_message
.
@jwellbelove can you provide on which branch/commit and with which compiler you get this particular error (and with which flag exactly)? So i can try reproduce it locally and further investigate?
So far i find similar issue triggered by test_indirect_vector.cpp:
include/etl/indirect_vector.h:457:30: runtime error: member call on address 0x7ffe516756f0 which does not point to an object of type 'etl::ivector<TestDataDC<std::__cxx11::basic_string<char> > *>'
0x7ffe516756f0: note: object is of type 'etl::vector_base'
fe 7f 00 00 d0 3b 5f 00 00 00 00 00 0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 18 57 67 51
^~~~~~~~~~~~~~~~~~~~~~~
vptr for 'etl::vector_base'
will use it for reference.
The branch I am using is derived from master
.
I am compiling on Ubuntu 20.04
with GCC 10.3.0
TEST(test_reference_counted_pool_exceptions)
in test_shared_message.cpp
"So far i find similar issue triggered by test_indirect_vector.cpp:"
I have also seen that, and seems to be a similar problem, as etl::ivector
is derived from etl::vector_base
.
I have the GCC flags -fsanitize=address,undefined
@jwellbelove I try to minimize reproducible example, so far minimal reproducible looks like this:
#include "unit_test_framework.h"
#include "etl/indirect_vector.h"
template <typename T>
class IIndirectVector
{
protected:
IIndirectVector(etl::ivector<T*>& lookup_, etl::ipool& storage_)
: lookup(lookup_)
, storage(storage_)
{
// lookup_.clear(); // Triggers.
}
#if defined(ETL_POLYMORPHIC_INDIRECT_VECTOR) || defined(ETL_POLYMORPHIC_CONTAINERS)
public:
virtual
#else
protected:
#endif
~IIndirectVector()
{
// lookup.clear(); // Triggers.
storage.available(); // Doesn't trigger.
}
etl::ivector<T*>& lookup;
etl::ipool& storage;
};
template <typename T, const size_t MAX_SIZE>
class TestVector : public IIndirectVector<T>
{
public:
TestVector()
: IIndirectVector<T>(lookup_vector, storage_pool), test(lookup_vector)
{
test.clear(); // Doesn't trigger.
}
etl::vector<T*, 1UL> lookup_vector;
etl::ivector<T*>& test;
etl::pool<T, 1UL> storage_pool;
};
namespace
{
SUITE(test_indirect_vector)
{
struct SetupFixture
{
};
TEST_FIXTURE(SetupFixture, test_default_constructor)
{
etl::vector<int*, 10UL> lookup_vector;
etl::ivector<int*>& lookup = lookup_vector;
lookup.clear(); // Doesn't trigger.
TestVector<int, 1UL> data;
}
};
} // namespace
@jwellbelove I find that ETL_DECLARE_DEBUG_COUNT in base class of vector_base causes problem. Idk why so far. Second version of minimization:
#include "unit_test_framework.h"
#include "etl/debug_count.h"
namespace etl
{
class vector_base1
{
protected:
vector_base1(size_t max_size_)
{
}
virtual ~vector_base1()
{
}
ETL_DECLARE_DEBUG_COUNT // This triggers!
};
template <typename T>
class ivector1 : public etl::vector_base1
{
protected:
ivector1(T* p_buffer_, size_t MAX_SIZE)
: vector_base1(MAX_SIZE)
{
}
~ivector1()
{
}
public:
void clear()
{
}
};
template <typename T, const size_t MAX_SIZE_>
class vector1 : public etl::ivector1<T>
{
public:
vector1()
: etl::ivector1<T>(nullptr /*reinterpret_cast<T*>(&buffer)*/, MAX_SIZE_)
{
}
};
} // namespace etl
template <typename T>
class IIndirectVector
{
protected:
IIndirectVector(etl::vector1<T*, 1>& lookup_)
: lookup(lookup_)
{
// lookup_.clear(); // Triggers.
}
#if defined(ETL_POLYMORPHIC_INDIRECT_VECTOR) || defined(ETL_POLYMORPHIC_CONTAINERS)
public:
virtual
#else
protected:
#endif
~IIndirectVector()
{
lookup.clear(); // Triggers.
}
etl::vector1<T*, 1>& lookup;
};
template <typename T, const size_t MAX_SIZE>
class TestVector : public IIndirectVector<T>
{
public:
TestVector()
: IIndirectVector<T>(lookup_vector)
{
}
etl::vector1<T*, 1UL>
lookup_vector;
};
SUITE(test_indirect_vector)
{
struct SetupFixture
{
};
TEST_FIXTURE(SetupFixture, test_default_constructor)
{
TestVector<int, 1UL> data;
}
};
Thanks for the investigation work. That's really narrowed the issue down.
@jwellbelove what is interesting, if i replace debug_count with copy-paste definition, and just rename it to debug_count1 - issue disappear. Does main library build into lib and linked to tests, or it's just header only representation used in tests?
Edit: maybe i was wrong, will post final edition some time later.
@jwellbelove here is reproducible and you can check different ideas with different compilers. https://godbolt.org/z/aKqxWTj8K
@jwellbelove okay, i think, i found it. Not working: https://godbolt.org/z/q1jqKov4M Working: https://godbolt.org/z/vhnrGrM3q
So basically problem setup is here: https://github.com/ETLCPP/etl/blob/5468eb659c8b4ecdb3d08e8b8f0442c5a2549a48/include/etl/indirect_vector.h#L1422-L1426
And problem itself is here: https://github.com/ETLCPP/etl/blob/5468eb659c8b4ecdb3d08e8b8f0442c5a2549a48/include/etl/indirect_vector.h#L1321-L1324
And problem is in order of field initialization:
Initialization order
The order of member initializes in the list is irrelevant: the actual order of initialization is as follows:
1) If the constructor is for the most-derived class, virtual bases are initialized in the order in which they appear in depth-first left-to-right traversal of the base class declarations (left-to-right refers to the appearance in base-specifier lists)
2) Then, direct bases are initialized in left-to-right order as they appear in this class's base-specifier list
3) Then, non-static data member are initialized in order of declaration in the class definition.
4) Finally, the body of the constructor is executed
From here
So what happens is when initializing indirect_vector we at step 2) initialize etl::iindirect_vector
People already mentioned this kind of problem.
I think you can try to adopt some variant of existing solution for this problem from boost - or alternatively change constructors somehow to initialize them in proper order.
Btw, would really appreciate if you star my repo cit (: slowly try to make it more popular, and gather people's opinions about how to improve it.
Hope my investigation helps and you will find best fitting solution for this problem!
_So what happens is when initializing indirect_vector we at step 2) initialize etl::iindirect_vector base class, but we initialize them with values of lookup_vector and storagepool which will be initialized only at step 3) And that's an undefined behavior.
The base class is initialised with references to lookup_vector
and storage_pool
(which have had their storage defined at that point), which initialise local references in the base class. The base class constructor does not use them at all, so there is no undefined behaviour from calling un-constructed objects.
This is a technique (of calling the base class constructors with references to derived member resources) that is used throughout the ETL's containers and which the sanitizer appears to has no issues with in those cases.
I think the issue with reference counted messages and indirect vector is maybe a little more complex (though possibly related).
It may also be a false positive.
I am also finding the sanitizer errors in TEST(test_reference_counted_pool_exceptions)
hard to comprehend as the base class of etl::reference_counted_message
(which is etl::ireference_counted_message
) is pure virtual and is not passed any resources from the derived class.
The sanitizer gets upset at calling the overridden virtual function release()
(which uses *this
) after the etl::reference_counted_message
has been constructed.
The sanitizer has no issues with calling the overridden virtual functions that do not use *this
.
etl::reference_counted_message<Message1, etl::atomic_int> temp(message1, message_pool);
temp.get_reference_counter().get_reference_count(); // OK
temp.release(); // Sanitizer runtime error
@jwellbelove can you upload whole setup, which reproduce (for example in ci) test_reference_counted_pool_exceptions in some separate branch, i was getting error with vector, but when i enable compilation of only test_shared_message.cpp with clang 13 and -fsanitize=undefined from master branch i get it passing:
Success: 5 tests passed.
Test time: 0.00 seconds.
I'm using -fsanitize=address,undefined
@jwellbelove i've tried with both flags enabled, in this case still, clang thinks all ok:
Success: 5 tests passed.
Test time: 0.00 seconds.
Btw i have ETL_NO_STL off. should i change it to ON?
On the bright side - even if this is somehow a false positive - we could make a reproducible out of it and give it to clang/gcc developers - because, at least for vector case - it has persisted across different compilers (you've used gcc, and i used clang).
STL on or off makes no difference for me.
@jwellbelove okay, so that is what you were talking about?
I have some issues with my system (can't compile with gcc using gust this flag - get linking error about dlsym) but at least in ci - i can see that error now.
Already interesting that only clang 9 on ubuntu triggers.
Btw there maybe issue with your pipeline. It is named Clang 9 linux but from what i see uses gcc:
build-clang-9-linux-no-stl:
name: Clang-9 Linux - No STL
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-20.04]
steps:
- uses: actions/checkout@v2
- name: Build
run: |
git fetch --tags
cmake -DBUILD_TESTS=ON -DNO_STL=ON -DETL_USE_TYPE_TRAITS_BUILTINS=OFF -DETL_USER_DEFINED_TYPE_TRAITS=OFF -DETL_FORCE_TEST_CPP03=OFF ./
gcc --version
make
@jwellbelove I've reduced test_reference_counted_pool_exceptions so far to this: https://godbolt.org/z/zG1fMzTb8 And version without any templates: https://godbolt.org/z/a1h7hY113
The non-template is a good minimal example of the issue.
I've been checking using clang 10.0.0 and I get a completely different set of issues.
It seems to have an intermittent 4/8 byte alignment error in the etl::unordered_multiset
tests.
@jwellbelove does that example speak to you in some way? https://godbolt.org/z/rPMs1xPPd I've manually added placement new from local buffer, and problem is gone.
So in case of placement new - object is de-allocated using release method, which calls destructor manually. But in case of stack based object - we get destructor called twice? first - with release method, second - at the end of scope? which mess-up vtable for object, so ubsan detects it as wrong kind of object? Illustration of this effect As mentioned here (stackoverflow answer) calling destructor twice is ub.
So i've tried this code:
#include "unit_test_framework.h"
#include "etl/fixed_sized_memory_block_allocator.h"
#include "etl/reference_counted_message_pool.h"
#include "etl/shared_message.h"
namespace {
constexpr etl::message_id_t MessageId1 = 1U;
constexpr etl::message_id_t MessageId2 = 2U;
struct Message1 : public etl::message<MessageId1> {
Message1(int i_) : i(i_) {}
~Message1() {}
int i;
};
struct Message2 : public etl::message<MessageId2> {
~Message2() {}
};
SUITE(test_shared_message) {
using pool_message_parameters =
etl::atomic_counted_message_pool::pool_message_parameters<Message1,
Message2>;
etl::fixed_sized_memory_block_allocator<
pool_message_parameters::max_size, pool_message_parameters::max_alignment,
4U>
memory_allocator;
etl::atomic_counted_message_pool message_pool(memory_allocator);
TEST(test_reference_counted_pool_exceptions) {
using pool_message_parameters =
etl::atomic_counted_message_pool::pool_message_parameters<Message1,
Message2>;
etl::fixed_sized_memory_block_allocator<
pool_message_parameters::max_size,
pool_message_parameters::max_alignment, 4U>
memory_allocator;
etl::atomic_counted_message_pool message_pool(memory_allocator);
etl::reference_counted_message<Message1, etl::atomic_int>* prcm;
CHECK_NO_THROW(prcm = message_pool.allocate<Message1>(6));
using rcm = etl::reference_counted_message<Message1, etl::atomic_int>;
alignas(rcm) unsigned char buf[sizeof(rcm)];
Message1 message1(6);
rcm* temp = new (buf) rcm(message1, message_pool);
try {
message_pool.release(*temp);
} catch (etl::exception e) {
CHECK_EQUAL(std::string("reference_counted_message_pool:release failure"),
std::string(e.what()));
}
}
}
} // namespace
And it passes tests.
The issue with the reference counted message test was that the release
function would always destruct the object, regardless of whether the allocator was the owner or not.
Fixed in 20.28.0
What happened
Hi, you've created a great library! But i discovered that source code of tests and maybe source code of library itself contains multiple instances of at least two kinds of problems:
How to reproduce
It can be easily reproduced by addition of "-fsanitize=address,undefined" to CMAKE_CXX_FLAGS on every supported compiler. As for example:
Or for single type of check:
You can checkout concrete example for gcc/clang from this commit
You can see failing results in github actions ci.
Reports example
Address 0x7ffe36665b90 is located in stack of thread T0 at offset 928 in frame
0 0x55db03f744a5 in (anonymous namespace)::Suitetest_callback_timer_interrupt::Testcallback_timer_interrupt_log_timer_calls::RunImpl() const (/home/runner/work/etl/etl/test/etl_tests+0xadcf4a5)