Remove boost::iterator_adaptor from VectorWithOffset

KrisThielemans commented 4 months ago

Remove the somewhat complicated boost::iterator_adaptor code. Looks like we don't need it at all. VectorWithOffset<T>::iterator is now a T*. This allows compilers to speed-up std::copy et al. automatically (by using memmove).
add extra timings to stir_timings to be able to check this.

Before the change, ProjDataInMemory::fill was about 1.5 slower than std::copy with std::vector, while now it is as fast. In addition, ProjDataFromStream::fill now takes far less CPU time (although difference in wall-clock time on my test machine is very small, probably as that is dominated by IO). This is likely because the get_empty_segment_by_view()/segment.fill()/set_segment() code is now faster.

Fixes #1259

KrisThielemans commented 4 months ago

Some timings on a beefy machine (30 threads) with a large TOF file (GE Signa PET/MR, view mashing 3, 3 segments, 33 TOF bins, total elems: 676,889,136).

old timings (sorry, not all of them):

        copy_image                                                23.500                          24.319
        create_copy_proj_data_mem_to_mem                         770.000                        2224.292
        create_copy_proj_data_mem_to_file                      44740.000                        9448.194
        create_copy_proj_data_file_to_mem                      41920.000                        6138.730
        create_copy_proj_data_file_to_file                     39345.000                        7681.845

new timings:

        copy_image                                                25.000                          25.709
        copy_add_image                                            38.000                          37.767
        copy_mult_image                                           37.500                          37.745
        copy_std_vector_of_size_projdata                         200.000                         197.675
        create_vector_of_size_projdata                           110.000                         762.857
        create_proj_data_in_mem_no_init                          350.000                        1778.367
        create_proj_data_in_mem_init                             375.000                        1766.101
        copy_proj_data_mem_to_mem                                225.000                         224.398
        create_copy_proj_data_mem_to_mem                         580.000                        1985.701
        create_copy_proj_data_mem_to_file                       2835.000                        8502.535
        create_copy_proj_data_file_to_mem                       3320.000                        5450.137
        create_copy_proj_data_file_to_file                      1020.000                        7258.220
        copy_add_proj_data_mem                                   950.000                        2394.447
        copy_mult_proj_data_mem                                  995.000                        2389.107

Sadly, PMRT projection times are virtually identical (no CUDA on that machine).

KrisThielemans commented 4 months ago

@markus-jehl one more for you to check... Obviously building on master, so presumably will have the same problem as #1438

KrisThielemans commented 4 months ago

MacOS job fails 1 ctest with just

test fill(ProjDataInterfile)
Error. fill(ProjDataInterfile&)

KrisThielemans commented 4 months ago

MacOS job worked ok this time around, even though I didn't change the code (just the formatting of the test), so presumably this was just a borderline rounding case.

KrisThielemans commented 4 months ago

@markus-jehl I have merged your fix on here. I'm more optimistic about this one making some positive impact, even if it's only be removing code complexity 😄 (and one more dependency on boost).

markus-jehl commented 4 months ago

@markus-jehl I have merged your fix on here. I'm more optimistic about this one making some positive impact, even if it's only be removing code complexity 😄 (and one more dependency on boost).

Perfect, I'll give this a try!

KrisThielemans commented 3 months ago

Codacy failure is because tmp below is "unused"

void create_std_vector()
  {
    std::vector<float> tmp(this->v1.size());
  }

markus-jehl commented 3 months ago

again, no noticeable difference :-(

KrisThielemans commented 3 months ago

ok. I guess this will only show up for LAFOV and/or TOF scanners. Anyway, the code is cleaner. Thanks for checking!

UCL / STIR

Remove boost::iterator_adaptor from VectorWithOffset #1442