acts-project / vecmem

Vectorised data model base and helper classes.
https://acts-project.github.io/vecmem/
Mozilla Public License 2.0
19 stars 13 forks source link

Segfault in Resizable Vector Buffer #223

Closed beomki-yeo closed 1 year ago

beomki-yeo commented 1 year ago

Ignore this if you think it is the duplicate of #95. The vector buffer with zero size doesn't work. Following is the minimal code for reproducing the problem

TEST_F(cuda_containers_test, zero_size) {

    vecmem::cuda::device_memory_resource device_resource;

    vecmem::data::vector_buffer<int> buff0(10, 0, device_resource);

    // OK
    ASSERT_EQ(buff0.capacity(), 10);
    // Not OK
    ASSERT_EQ(buff0.size(), 0); // Seg fault

    vecmem::data::vector_buffer<int> buff1(10, 10, device_resource);

    // OK
    ASSERT_EQ(buff1.capacity(), 10);
    // OK
    ASSERT_EQ(buff1.size(), 10);
}
Segmentation fault (core dumped)
beomki-yeo commented 1 year ago

BTW, the codes work with cuda::managed_resource

krasznaa commented 1 year ago

:confused: This is a feature Beomki, not a bug... :stuck_out_tongue:

First off, you should always call vecmem::copy::setup(...) on resizable buffers to make sure that the variable in "device accessible memory" that holds the size of the buffer, would be initialised to 0.

And I sort of already gave the answer in my previous statement. Just like how you can't access the contents of a vector buffer that uses a "device memory resource" from the host, you can also not access its size. The size variable sits in device memory in such a setup. So of course the code would crash.

With a managed memory resource the size variable sits in a piece of memory that's accessible by both the host and the device. So that works.

If you need to get the size of a buffer, you need to use vecmem::copy::get_size(...). This exact use case that you point out here is why that function exists in the first place. :wink:

Finally: I don't know why your last line in the example wouldn't crash. :confused: That seems to be some fluke. :confused: You could check what size_ptr() returns from those buffers, that may explain why for buff1 that pointer is not causing a segmentation fault...

krasznaa commented 1 year ago

Ahh, I even understand now why you didn't see a crash in the second case.

You see,

vecmem::data::vector_buffer<int> buff1(10, 10, device_resource);

actually creates a non-resizable buffer. :confused:

https://github.com/acts-project/vecmem/blob/main/core/include/vecmem/containers/impl/vector_buffer.ipp#L29

Which is some super misleading behaviour... :frowning: So... about time to address this as part of #221.

beomki-yeo commented 1 year ago

And I sort of already gave the answer in my previous statement. Just like how you can't access the contents of a vector buffer that uses a "device memory resource" from the host, you can also not access its size. The size variable sits in device memory in such a setup. So of course the code would crash.

Then, accessing the size in host side is OK with view type object? I still see the crash due to vector_view::size in copy operator: (See the line 247 of copy.ipp)

    // A sanity check.
    if (from_view.size() > to_view.size()) {
        std::ostringstream msg;
        msg << "from_view.size() (" << from_view.size()
            << ") > to_view.size() (" << to_view.size() << ")";
        throw std::length_error(msg.str());
    } 
beomki-yeo commented 1 year ago

Just in case you need to reproduce the error:

TEST_F(cuda_containers_test, jagged_vector_copy) {

    vecmem::host_memory_resource host_resource;
    vecmem::cuda::device_memory_resource device_resource;

    vecmem::data::jagged_vector_buffer<int> buff(
        std::vector<std::size_t>{1, 1}, device_resource, &host_resource);
    m_copy.setup(buff);

    vecmem::jagged_vector<int> vec{&host_resource};
    vec.resize(1);
    vec[0].resize(1);

    m_copy(buff, vecmem::get_data(vec), vecmem::copy::type::device_to_host);
}
unknown file: Failure
C++ exception with description "from_view.size() (2) > to_view.size() (1)" thrown in the test body.

Following is OK:

TEST_F(cuda_containers_test, jagged_vector_copy) {

    vecmem::host_memory_resource host_resource;
    vecmem::cuda::device_memory_resource device_resource;

    vecmem::data::jagged_vector_buffer<int> buff(
        std::vector<std::size_t>{2, 2}, device_resource, &host_resource);
    m_copy.setup(buff);

    vecmem::jagged_vector<int> vec{&host_resource};
    vec.resize(2);
    vec[0].resize(2);
    vec[1].resize(2);

    m_copy(buff, vecmem::get_data(vec), vecmem::copy::type::device_to_host);
}