UCL / STIR

Software for Tomographic Image Reconstruction
http://stir.sourceforge.net/
Other
104 stars 89 forks source link

Use contiguous Arrays more #1438

Closed KrisThielemans closed 1 month ago

KrisThielemans commented 1 month ago
KrisThielemans commented 1 month ago

@markus-jehl I'm still hunting for an elusive speed-up. Maybe this one does help a bit...

markus-jehl commented 1 month ago

This merge broke something in the reconstruction. Projections still work, but when I do the initial reconstruction it comes out full of NaN‘s in the FoV. Haven‘t had time to look into the changes yet, but maybe this is also what impacts you on the psmr branch (since I also got a test failure on the MacOS build on my fork after pulling this merge).

KrisThielemans commented 1 month ago

ah. sorry.

This PR is quite innocent really. Essentially the only change is in ProjDataInMemory::copy_data_from_buffer (and the to version), where it uses a new specialisation of fill_from and copy_to for Array. That checks that if it's a contiguous array, we'll fall back to a std::copy using float * as opposed to full iterators, as the former should be faster (any good compiler will call memmove or similar).

There are some other formatting changes as I got rid of #ifdef STIR_TOF, which gives a lot of "noise" due to reformatting sadly.

I've added tests for all this, but I suppose none of our tests do a recon with ProjDataInMemory though.

KrisThielemans commented 1 month ago

@markus-jehl I cannot find a problem, nor reproduce it. Are you sure it is this PR?

Suggested debugging steps:

  1. disable the CopyFill<Array<>> specialisation (just comment out this)
  2. if step 1 works, then re-enable and replace the if (stir_array.is_contiguous()) statements in those lines with if(false).
  3. if step 1 didn't work, revert the ProjDataInMemory::copy_data_from_buffer (and to) to use std::copy directly (really shouldn't do anything as CopyFill should then fall-back to the same std::copy anyway)
  4. if step 3 didn't work, then revert ProjDataInMemory completely (it'd have to be something in the STIR_TOF edits I guess)

thanks for your help!

markus-jehl commented 1 month ago

I'm fairly certain it is this one: I went back to each PR in my fork and this is the one that first displayed the issue.

Will go through your suggested debug steps today!

markus-jehl commented 1 month ago

The problem is the first if(stir_array.is_contiguous()). I noticed that we're not actually checking if some other process has already obtained full access: https://github.com/UCL/STIR/blob/31e730eeb4dd5d2fb49c3d2f404ed7bc02e45115/src/include/stir/Array.inl#L283

What is the purpose of _full_pointer_access, actually? It seems to be never checked and since it's only used to read data, not modify them it shouldn't be required. Still, I don't understand why std::copy(beg, beg + stir_array.size_all(), iter); gives issues while std::copy(stir_array.begin_all(), stir_array.end_all(), iter); is fine...

KrisThielemans commented 1 month ago

The problem is the first if(stir_array.is_contiguous()).

which one is the first? L60?

I noticed that we're not actually checking if some other process has already obtained full access

that's true.

What is the purpose of _full_pointer_access, actually? It seems to be never checked and since it's only used to read data, not modify them it shouldn't be required.

For get_data_ptr, we do check if access is on/off https://github.com/UCL/STIR/blob/31e730eeb4dd5d2fb49c3d2f404ed7bc02e45115/src/include/stir/VectorWithOffset.inl#L611-L613

I put those in as a safety check really. std::vector::data() doesn't do that. It just says that the "behaviour is undefined" if you modify the underlying vector (e.g. by a resize(), which would affect read access as well). I felt it was safer to check, at the expense of more elaborate code (so maybe it wasn't a great decision).

Still, I don't understand why std::copy(beg, beg + stir_array.size_all(), iter); gives issues while std::copy(stir_array.begin_all(), stir_array.end_all(), iter); is fine...

I have no clue.

Note that this whole stuff is not thread-safe: if multiple processes modify an Array at the same time, stuff is going to happen. That' s not documented, but should be rather obvious. So, I don't think we do that.

I'm not so sure how to proceed. Can you find out where the NaNs are created? Is the sensitivity image ok for instance?

Can you please create an issue, just pointing to this discussion here? (I suggest we continue here for clarity)

markus-jehl commented 1 month ago

Yes, L60 is the problematic one. I'm currently running with as much debug output as possible to narrow down where it happens.

Sure, will create an issue.

markus-jehl commented 1 month ago

This is the subsensitivity image, so the NaN's come from there: image

KrisThielemans commented 1 month ago

ok. Narrowing it down. Relevant lines are https://github.com/UCL/STIR/blob/31e730eeb4dd5d2fb49c3d2f404ed7bc02e45115/src/recon_buildblock/PoissonLogLikelihoodWithLinearModelForMeanAndProjData.cxx#L830-L832 https://github.com/UCL/STIR/blob/31e730eeb4dd5d2fb49c3d2f404ed7bc02e45115/src/recon_buildblock/PoissonLogLikelihoodWithLinearModelForMeanAndProjData.cxx#L871

By the way, do your ctest and recon_test_pack work?

markus-jehl commented 1 month ago

All tests with ctest pass, but recon_test_pack I first need to look up how to run. Will also look into the distributable sensitivity computation, since that already sounds like it might be a good candidate :-)

KrisThielemans commented 1 month ago

All tests with ctest pass

interesting, as that is using OSMAPOSL etc as well

recon_test_pack, see e.g.: https://github.com/conda-forge/stir-feedstock/blob/6d4fd1aaf32bfe034bbe7440988cedac3f2d12da/recipe/meta.yaml#L66-L79

the distributable_computation is the hardest part in STIR (sits in distributable.cxx). Essentially, it is a multi-threaded loop where it's going to call a callback. That makes me think you could try and set threads=1, or compile without openmp.

thanks for debugging!

markus-jehl commented 1 month ago

Interestingly it looks exactly the same in repeat runs and also suing just 1 thread. So my suspicion is that the is_contiguous function sometimes gives false positives. But I'm still looking for the flaw in the logic...

KrisThielemans commented 1 month ago

it certainly is easier if it's an ordinary bug as opposed to a race-condition! I guess I'd recommend compiling without OpenMP then and running valgrind.

Other (more desperate) thing to try: use a more conservative approach for is_contiguous:

Vital to run ctest after that!

If that works, we could try to check consistency between the old and new is_contiguous (although the new one is overly pessimistic at the moment).

markus-jehl commented 1 month ago

I was actually thinking about this pessimistic approach! Because is_contiguous is called many times, I was worried if it might actually be computationally heavy in its own right. The member variable would solve this problem. Will give it a try.

I agree, running valgrind or a sanitiser should highlight where the issue is. For this I first need to create a minimal working example of the issue (currently I just test it in python with an iterative reconstruction that would take forever in serial with a sanitiser attached to it...). The recon_test_pack passes as well, btw.

KrisThielemans commented 1 month ago

ok. From Python, things are always trickier, as there's more variations possible. That'll be why I didn't see it yet.

As it's the sensitivity image, you can stop after the recon.set_up() call of course.

markus-jehl commented 1 month ago

Just an update: I've now ported the test across to C++ and I get this weird looking sensitivity map even when disabling the new copy. So the more I dig the less I understand... Will update you when I have more clarity.

KrisThielemans commented 1 month ago

Ok. Any chance you can do this on vanilla master as opposed to your branch?

markus-jehl commented 1 month ago

Yes, I'm currently doing it on the official master. It looks like in C++ some additional problems are caused by reading ProjDataInMemory from file and using them - as if some of that memory is either not persistent for long enough or gives issues with copying.

KrisThielemans commented 1 month ago

By the way, if you have a small test example, potentially we could add it as a test, as it's escaping all other tests somehow. Can also do a git bisect I guess. thanks, and sorry for the trouble.

markus-jehl commented 1 month ago

The address sanitiser actually already throws an error in the first line of my test, where I'm reading in ProjDataInMemory as such: auto inputProjData = std::make_shared<stir::ProjDataInMemory>(*(stir::ProjDataInMemory::read_from_file("filename.hs")));

This is the ASAN output:

=================================================================
==1174870==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6170000301d0 at pc 0x55eb98f589ad bp 0x7fff7c0a8730 sp 0x7fff7c0a7ef8
READ of size 92160 at 0x6170000301d0 thread T0
    #0 0x55eb98f589ac in __interceptor_memmove (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x1839ac) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)
    #1 0x55eb9951fff8 in float* stir::CopyFill<stir::Array<3, float> >::copy_to<float*>(stir::Array<3, float> const&, float*) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x74aff8) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)
    #2 0x55eb9951b93d in stir::ProjDataInMemory::set_segment(stir::SegmentBySinogram<float> const&) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x74693d) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)
    #3 0x55eb9951b9d1 in stir::ProjDataInMemory::set_segment(stir::SegmentByView<float> const&) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x7469d1) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)
    #4 0x55eb994e2093 in stir::ProjData::fill(stir::ProjData const&) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x70d093) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)
    #5 0x55eb9951dfa3 in stir::ProjDataInMemory::ProjDataInMemory(stir::ProjData const&) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x748fa3) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)
    #6 0x55eb991c7e94 in decltype(new ((void*)(0))stir::ProjDataInMemory(std::declval<stir::ProjData&>())) std::construct_at<stir::ProjDataInMemory, stir::ProjData&>(stir::ProjDataInMemory*, stir::ProjData&) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:97:39
    #7 0x55eb991c7e94 in void std::allocator_traits<std::allocator<stir::ProjDataInMemory> >::construct<stir::ProjDataInMemory, stir::ProjData&>(std::allocator<stir::ProjDataInMemory>&, stir::ProjDataInMemory*, stir::ProjData&) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:518:4
    #8 0x55eb991c7e94 in std::_Sp_counted_ptr_inplace<stir::ProjDataInMemory, std::allocator<stir::ProjDataInMemory>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<stir::ProjData&>(std::allocator<stir::ProjDataInMemory>, stir::ProjData&) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:519:4
    #9 0x55eb991c7e94 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<stir::ProjDataInMemory, std::allocator<stir::ProjDataInMemory>, stir::ProjData&>(stir::ProjDataInMemory*&, std::_Sp_alloc_shared_tag<std::allocator<stir::ProjDataInMemory> >, stir::ProjData&) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:651:6
    #10 0x55eb991c7e94 in std::__shared_ptr<stir::ProjDataInMemory, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<stir::ProjDataInMemory>, stir::ProjData&>(std::_Sp_alloc_shared_tag<std::allocator<stir::ProjDataInMemory> >, stir::ProjData&) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1342:14
    #11 0x55eb991c7e94 in std::shared_ptr<stir::ProjDataInMemory>::shared_ptr<std::allocator<stir::ProjDataInMemory>, stir::ProjData&>(std::_Sp_alloc_shared_tag<std::allocator<stir::ProjDataInMemory> >, stir::ProjData&) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:409:4
    #12 0x55eb991c7e94 in std::shared_ptr<stir::ProjDataInMemory> std::allocate_shared<stir::ProjDataInMemory, std::allocator<stir::ProjDataInMemory>, stir::ProjData&>(std::allocator<stir::ProjDataInMemory> const&, stir::ProjData&) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:862:14
    #13 0x55eb991c7e94 in std::shared_ptr<stir::ProjDataInMemory> std::make_shared<stir::ProjDataInMemory, stir::ProjData&>(stir::ProjData&) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:878:14
    #14 0x55eb991c7e94 in TestClientReconComponent::TestArrayBug() /workspace/neurolf/reconstruction/test/unit/ReconstructionComponentTests.cpp:492:24
    #15 0x55eb9920907e in ____C_A_T_C_H____T_E_S_T____18() /workspace/neurolf/reconstruction/test/unit/ReconstructionComponentTests.cpp:987:24
    #16 0x55eb9902b759 in Catch::TestInvokerAsFunction::invoke() const /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:14317:9
    #17 0x55eb99096d36 in Catch::TestCase::invoke() const /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:14156:15
    #18 0x55eb99096d36 in Catch::RunContext::invokeActiveTestCase() /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:13016:27
    #19 0x55eb990943a9 in Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:12989:17
    #20 0x55eb9909129f in Catch::RunContext::runTest(Catch::TestCase const&) /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:12750:13
    #21 0x55eb9909f32f in Catch::(anonymous namespace)::TestGroup::execute() /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:13343:45
    #22 0x55eb9909f32f in Catch::Session::runInternal() /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:13549:39
    #23 0x55eb9909c4eb in Catch::Session::run() /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:13505:24
    #24 0x55eb990d56d2 in int Catch::Session::run<char>(int, char const* const*) /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:13227:30
    #25 0x55eb990d56d2 in main /workspace/neurolf/reconstruction/libs/install/include/catch2/catch.hpp:17504:29
    #26 0x7f7c09d04d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #27 0x7f7c09d04e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #28 0x55eb98f3eb64 in _start (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x169b64) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)

0x6170000301d0 is located 0 bytes to the right of 720-byte region [0x61700002ff00,0x6170000301d0)
allocated by thread T0 here:
    #0 0x55eb98ffc88d in operator new[](unsigned long) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x22788d) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)
    #1 0x55eb99212e3a in stir::VectorWithOffset<float>::reserve(int, int) /workspace/neurolf/reconstruction/libs/install/include/STIR-6.2/stir/VectorWithOffset.inl:395:45
    #2 0x55eb99337963 in stir::detail::VectorWithOffset_iter<stir::Array<1, float> > std::__copy_move<false, false, std::random_access_iterator_tag>::__copy_m<stir::detail::VectorWithOffset_iter<stir::Array<1, float> const>, stir::detail::VectorWithOffset_iter<stir::Array<1, float> > >(stir::detail::VectorWithOffset_iter<stir::Array<1, float> const>, stir::detail::VectorWithOffset_iter<stir::Array<1, float> const>, stir::detail::VectorWithOffset_iter<stir::Array<1, float> >) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x562963) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x1839ac) (BuildId: 8775ca2bc7777fd8de9cf6411d87eded039071d8) in __interceptor_memmove
Shadow bytes around the buggy address:
  0x0c2e7fffdfe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e7fffdff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e7fffe000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e7fffe010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e7fffe020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c2e7fffe030: 00 00 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa
  0x0c2e7fffe040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2e7fffe050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e7fffe060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e7fffe070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e7fffe080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1174870==ABORTING

When I replace the first if statement with if (false), then it runs through the whole test and gives a good sensitivity map, but in the very end also reports a small memory leak:

=================================================================
==1176468==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1584 byte(s) in 3 object(s) allocated from:
    #0 0x562c30a4e77d in operator new(unsigned long) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x22777d) (BuildId: 5778f5e9d8b110bd49f3ae4e3fea7131c16efff7)
    #1 0x562c30f364c9 in stir::ProjData::read_from_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::_Ios_Openmode) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x70f4c9) (BuildId: 5778f5e9d8b110bd49f3ae4e3fea7131c16efff7)

Indirect leak of 24576 byte(s) in 3 object(s) allocated from:
    #0 0x562c30a4e88d in operator new[](unsigned long) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x22788d) (BuildId: 5778f5e9d8b110bd49f3ae4e3fea7131c16efff7)
    #1 0x7f91d9670023 in std::basic_filebuf<char, std::char_traits<char> >::_M_allocate_internal_buffer() (/lib/x86_64-linux-gnu/libstdc++.so.6+0x115023) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)

SUMMARY: AddressSanitizer: 26160 byte(s) leaked in 6 allocation(s).
markus-jehl commented 1 month ago

Ok, progress! Managed to debug into STIR to see where it's failing. All that is needed is to handle the case where an array has only one index. Currently it is not entering the for loop and therefore not checking whether the sub-array is contiguous (plus the last sub-array was never checked). Changing the for loop from i < this->get_max_index() to i <= this->get_max_index() and handling the one case where we index i+1 fixes this:

template <int num_dimensions, typename elemT>
bool
Array<num_dimensions, elemT>::is_contiguous() const
{
  auto mem = &(*this->begin_all());
  for (auto i = this->get_min_index(); i <= this->get_max_index(); ++i)
    {
      if (!(*this)[i].is_contiguous())
        return false;
      mem += (*this)[i].size_all();
      if (i < this->get_max_index() && mem != &(*(*this)[i + 1].begin_all()))
        return false;
    }
  return true;
}
KrisThielemans commented 1 month ago

great! That was a bug. Can you submit a PR?

KrisThielemans commented 1 month ago

I suggest

     if (i == this->get_max_index() 
       return true;
     mem += (*this)[i].size_all();
     if (mem != &(*(*this)[i + 1].begin_all()))
        return false;

and remove the final return false. Might be slightly easier to understand the logic.

KrisThielemans commented 1 month ago

By the way, while

auto inputProjData = std::make_shared<stir::ProjDataInMemory>(*(stir::ProjDataInMemory::read_from_file("filename.hs")));

is fine, it might be clearer to use

std::shared<stir::ProjDataInMemory> inputProjData(stir::ProjDataInMemory::read_from_file("filename.hs"));
markus-jehl commented 1 month ago

Yes, I'll issue a PR. But first I also want to understand the memory leak that still exists.

KrisThielemans commented 1 month ago

Ok. Doesn't TSAN tell you where it came from? Valgrind could.

markus-jehl commented 1 month ago

This is all it says:

==1342194==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1584 byte(s) in 3 object(s) allocated from:
    #0 0x55da4942f77d in operator new(unsigned long) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x22777d) (BuildId: ee84e9178c2086875508efe40ec877644497b58a)
    #1 0x55da49917699 in stir::ProjData::read_from_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::_Ios_Openmode) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x70f699) (BuildId: ee84e9178c2086875508efe40ec877644497b58a)

Indirect leak of 24576 byte(s) in 3 object(s) allocated from:
    #0 0x55da4942f88d in operator new[](unsigned long) (/workspace/neurolf/reconstruction/build/test/nlf.reconstruction.tests.asan+0x22788d) (BuildId: ee84e9178c2086875508efe40ec877644497b58a)
    #1 0x7f8958273023 in std::basic_filebuf<char, std::char_traits<char> >::_M_allocate_internal_buffer() (/lib/x86_64-linux-gnu/libstdc++.so.6+0x115023) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)

SUMMARY: AddressSanitizer: 26160 byte(s) leaked in 6 allocation(s).

I suspect it is when the object goes out of scope at the end of the test that the memory is not correctly freed. Has something changed there recently?

KrisThielemans commented 1 month ago

Seems unrelated to the rest. Nothing changed there.

It's not so useful that it doesn't say what object. Clearly, it cannot be the ProjDataFromMemory data itself, as the leak would then be a lot larger.

Just for sanity, please try the above constructor instead.

markus-jehl commented 1 month ago

I actually did (must have also tried before and then given up), but it gives me the following error:

/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:150:7: note: candidate constructor not viable: no known conversion from 'shared_ptr<stir::ProjData>' to 'const shared_ptr<stir::ProjDataInMemory>' for 1st argument
      shared_ptr(const shared_ptr&) noexcept = default; ///< Copy constructor
      ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:304:7: note: candidate constructor not viable: no known conversion from 'shared_ptr<stir::ProjData>' to 'shared_ptr<stir::ProjDataInMemory>' for 1st argument
      shared_ptr(shared_ptr&& __r) noexcept
      ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:357:17: note: candidate constructor not viable: no known conversion from 'shared_ptr<stir::ProjData>' to 'std::nullptr_t' for 1st argument
      constexpr shared_ptr(nullptr_t) noexcept : shared_ptr() { }
                ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:160:2: note: candidate template ignored: could not match '_Yp *' against 'shared_ptr<stir::ProjData>'
        shared_ptr(_Yp* __p) : __shared_ptr<_Tp>(__p) { }
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:296:2: note: candidate template ignored: requirement 'is_constructible<std::__shared_ptr<stir::ProjDataInMemory, __gnu_cxx::_S_atomic>, const std::shared_ptr<stir::ProjData> &>::value' was not satisfied [with _Yp = stir::ProjData]
        shared_ptr(const shared_ptr<_Yp>& __r) noexcept
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:313:2: note: candidate template ignored: requirement 'is_constructible<std::__shared_ptr<stir::ProjDataInMemory, __gnu_cxx::_S_atomic>, std::shared_ptr<stir::ProjData>>::value' was not satisfied [with _Yp = stir::ProjData]
        shared_ptr(shared_ptr<_Yp>&& __r) noexcept
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:325:11: note: candidate template ignored: could not match 'weak_ptr' against 'shared_ptr'
        explicit shared_ptr(const weak_ptr<_Yp>& __r)
                 ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:332:2: note: candidate template ignored: could not match 'auto_ptr' against 'shared_ptr'
        shared_ptr(auto_ptr<_Yp>&& __r);
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:340:2: note: candidate template ignored: could not match 'unique_ptr' against 'shared_ptr'
        shared_ptr(unique_ptr<_Yp, _Del>&& __r)
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:408:2: note: candidate template ignored: could not match '_Sp_alloc_shared_tag' against 'shared_ptr'
        shared_ptr(_Sp_alloc_shared_tag<_Alloc> __tag, _Args&&... __args)
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:148:17: note: candidate constructor not viable: requires 0 arguments, but 1 was provided
      constexpr shared_ptr() noexcept : __shared_ptr<_Tp>() { }
                ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:177:2: note: candidate constructor template not viable: requires 2 arguments, but 1 was provided
        shared_ptr(_Yp* __p, _Deleter __d)
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:194:2: note: candidate constructor template not viable: requires 2 arguments, but 1 was provided
        shared_ptr(nullptr_t __p, _Deleter __d)
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:257:2: note: candidate constructor template not viable: requires 2 arguments, but 1 was provided
        shared_ptr(const shared_ptr<_Yp>& __r, element_type* __p) noexcept
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:284:2: note: candidate constructor template not viable: requires 2 arguments, but 1 was provided
        shared_ptr(shared_ptr<_Yp>&& __r, element_type* __p) noexcept
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:417:7: note: candidate constructor not viable: requires 2 arguments, but 1 was provided
      shared_ptr(const weak_ptr<_Tp>& __r, std::nothrow_t) noexcept
      ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:214:2: note: candidate constructor template not viable: requires 3 arguments, but 1 was provided
        shared_ptr(_Yp* __p, _Deleter __d, _Alloc __a)
        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:233:2: note: candidate constructor template not viable: requires 3 arguments, but 1 was provided
        shared_ptr(nullptr_t __p, _Deleter __d, _Alloc __a)

It's as if it can't convert pointers to convertable objects, only the objects themselves. With my old approach I initialise a new pointer with the converted object, so it worked.

My suspicion is that the leak might be in the interfile parsing somewhere, but I haven't yet found an obvious allocation somewhere that might cause this.

markus-jehl commented 1 month ago

Update: When I use std::shared_ptr<stir::ProjDataInMemory> inputProjData(std::dynamic_pointer_cast<stir::ProjDataInMemory>(stir::ProjDataInMemory::read_from_file("filename.hs"))); then it compiles, but gives a segmentatoin fault later when I call sapyb on inputProjData to subtract randoms from it. Is this because the underlying object is still some file stream that is closed prematurely?

KrisThielemans commented 1 month ago

It's as if it can't convert pointers to convertable objects, only the objects themselves. With my old approach I initialise a new pointer with the converted object, so it worked.

oops, sorry. it needs a std::move

std::shared_ptr<stir::ProjDataInMemory> inputProjData(std::move(stir::ProjDataInMemory::read_from_file("filename.hs"));

The cast will fail, as the object disappears. Not "prematurely" in a C++ sense, as the unique_ptr gets destructed immediately, so the object is gone.

KrisThielemans commented 1 month ago

My suspicion is that the leak might be in the interfile parsing somewhere, but I haven't yet found an obvious allocation somewhere that might cause this.

oh right. There might even be an issue on that. I'll keep ignoring that for a bit longer. sorry...

markus-jehl commented 1 month ago

It's as if it can't convert pointers to convertable objects, only the objects themselves. With my old approach I initialise a new pointer with the converted object, so it worked.

oops, sorry. it needs a std::move

std::shared_ptr<stir::ProjDataInMemory> inputProjData(std::move(stir::ProjDataInMemory::read_from_file("filename.hs"));

The cast will fail, as the object disappears. Not "prematurely" in a C++ sense, as the unique_ptr gets destructed immediately, so the object is gone.

Hmm... The move still gives me the same error: note: candidate constructor not viable: no known conversion from 'shared_ptr<stir::ProjData>' to 'const shared_ptr<stir::ProjDataInMemory>' for 1st argument

KrisThielemans commented 1 month ago

ok. Of course, read_from_file doesn't return a ProjDataInMemory object. ignore me!

markus-jehl commented 1 month ago

My suspicion is that the leak might be in the interfile parsing somewhere, but I haven't yet found an obvious allocation somewhere that might cause this.

oh right. There might even be an issue on that. I'll keep ignoring that for a bit longer. sorry...

I couldn't find an issue related to this. Should I create one?

KrisThielemans commented 1 month ago

Neither can I. Please do so. Thanks a lot!