Closed KrisThielemans closed 3 months ago
Some timings on a beefy machine (30 threads) with a large TOF file (GE Signa PET/MR, view mashing 3, 3 segments, 33 TOF bins, total elems: 676,889,136).
old timings (sorry, not all of them):
copy_image 23.500 24.319
create_copy_proj_data_mem_to_mem 770.000 2224.292
create_copy_proj_data_mem_to_file 44740.000 9448.194
create_copy_proj_data_file_to_mem 41920.000 6138.730
create_copy_proj_data_file_to_file 39345.000 7681.845
new timings:
copy_image 25.000 25.709
copy_add_image 38.000 37.767
copy_mult_image 37.500 37.745
copy_std_vector_of_size_projdata 200.000 197.675
create_vector_of_size_projdata 110.000 762.857
create_proj_data_in_mem_no_init 350.000 1778.367
create_proj_data_in_mem_init 375.000 1766.101
copy_proj_data_mem_to_mem 225.000 224.398
create_copy_proj_data_mem_to_mem 580.000 1985.701
create_copy_proj_data_mem_to_file 2835.000 8502.535
create_copy_proj_data_file_to_mem 3320.000 5450.137
create_copy_proj_data_file_to_file 1020.000 7258.220
copy_add_proj_data_mem 950.000 2394.447
copy_mult_proj_data_mem 995.000 2389.107
Sadly, PMRT projection times are virtually identical (no CUDA on that machine).
@markus-jehl one more for you to check... Obviously building on master, so presumably will have the same problem as #1438
MacOS job fails 1 ctest with just
test fill(ProjDataInterfile)
Error. fill(ProjDataInterfile&)
MacOS job worked ok this time around, even though I didn't change the code (just the formatting of the test), so presumably this was just a borderline rounding case.
@markus-jehl I have merged your fix on here. I'm more optimistic about this one making some positive impact, even if it's only be removing code complexity 😄 (and one more dependency on boost).
@markus-jehl I have merged your fix on here. I'm more optimistic about this one making some positive impact, even if it's only be removing code complexity 😄 (and one more dependency on boost).
Perfect, I'll give this a try!
Codacy failure is because tmp
below is "unused"
void create_std_vector()
{
std::vector<float> tmp(this->v1.size());
}
again, no noticeable difference :-(
ok. I guess this will only show up for LAFOV and/or TOF scanners. Anyway, the code is cleaner. Thanks for checking!
boost::iterator_adaptor
code. Looks like we don't need it at all.VectorWithOffset<T>::iterator
is now aT*
. This allows compilers to speed-upstd::copy
et al. automatically (by usingmemmove
).stir_timings
to be able to check this.Before the change,
ProjDataInMemory::fill
was about 1.5 slower thanstd::copy
withstd::vector
, while now it is as fast. In addition,ProjDataFromStream::fill
now takes far less CPU time (although difference in wall-clock time on my test machine is very small, probably as that is dominated by IO). This is likely because theget_empty_segment_by_view()/segment.fill()/set_segment()
code is now faster.Fixes #1259