CARTAvis / carta-backend

Source code repository for the backend component of CARTA, a new visualization tool designed for the ALMA, the VLA and the SKA pathfinders.
https://cartavis.github.io/
GNU General Public License v3.0
22 stars 11 forks source link

Improve region spatial profile and PV performance #1353

Closed pford closed 7 months ago

pford commented 10 months ago

Description

Checklist

kswang1029 commented 10 months ago

@pford I did a quick (somewhat extreme) test and see a significant improvement on performance in spatial profile generation (eg, v4.1 17s-ish vs thisPR 4s-ish) πŸš€πŸš€πŸš€πŸš€ I will continue with a more detailed test after the region refactoring PR is in.

The e2e test looks good except a pv cancellation test (due to performance improvement which is good).

kswang1029 commented 10 months ago

@pford just to double check: does the code change in this PR include the changes in https://github.com/CARTAvis/carta-backend/pull/1352 as well?

pford commented 10 months ago

@pford just to double check: does the code change in this PR include the changes in #1352 as well?

@kswang1029 I thought the Region refactor should be a separate branch and PR because of the large code change. I combined the branches for both PRs into pam/combine_1347_1339_branches if you want to check it out.

veggiesaurus commented 10 months ago

@pford I did a quick (somewhat extreme) test and see a significant improvement on performance in spatial profile generation (eg, v4.1 17s-ish vs thisPR 4s-ish) πŸš€πŸš€πŸš€πŸš€ I will continue with a more detailed test after the region refactoring PR is in.

The e2e test looks good except a pv cancellation test (due to performance improvement which is good).

@kswang1029 can we benchmark these improvements against any other tools? πŸ€”

kswang1029 commented 9 months ago

@pford I found that the pv generation progress bar displayed at frontend is not that accurate compared to dev. Now the progress bar goes quickly close to the right end and then hangs there until the process is done. Is that because now the progressive update logic has been changed?

kswang1029 commented 9 months ago

@pford I did a quick (somewhat extreme) test and see a significant improvement on performance in spatial profile generation (eg, v4.1 17s-ish vs thisPR 4s-ish) πŸš€πŸš€πŸš€πŸš€ I will continue with a more detailed test after the region refactoring PR is in. The e2e test looks good except a pv cancellation test (due to performance improvement which is good).

@kswang1029 can we benchmark these improvements against any other tools? πŸ€”

Yes, I tried psrecord for spatial profile. Roughly it is 4x improvement if we request the profile from the reference image (ie no spatial matching in action). @pford are the CPU footprints expected in the new approach?

dev: 18.83 s dev_spatial_profile

pr: 4.81s pr_spatial_profile

However, if we request a spatial profile from the spatially-matched image, the improvement is much less. (dev: 124.6s vs pr: 110.6s). @pford is this also expected? Any chance to improve this as well?

pford commented 9 months ago

Fixed the PV progress bar. Also updated the combined branch pam/combine_1347_1339_branches with this change and the changes in the 1347 branch.

Spatial profiles for matched regions take each box region along the line in parallel and convert pixel control points -> world points -> WCRegion -> matched LCRegion then apply the box region to the matched image (= SubImage, with an exclusive lock due to casacore) for region data. A region in the reference image takes each box in parallel to create an LCRegion from pixel control points and apply it to the image cache stored in memory (with a read lock) for region data, which is faster. But I will look for improvements for matched regions and investigate the psrecord CPU spike.

kswang1029 commented 9 months ago

Fixed the PV progress bar. Also updated the combined branch pam/combine_1347_1339_branches with this change and the changes in the 1347 branch.

Confirmed that the pv generation progress bar now behaves normally and as expected. πŸ‘

pford commented 9 months ago

@kswang1029 it was difficult to reproduce the cpu spike mid-profile, but I suspect it was a cursor update. I moved my mouse in the image while it was calculating the profile and I finally did see a spike like yours.

kswang1029 commented 9 months ago

@kswang1029 it was difficult to reproduce the cpu spike mid-profile, but I suspect it was a cursor update. I moved my mouse in the image while it was calculating the profile and I finally did see a spike like yours.

I see. Thanks for investigating the spikes. As for the performance of the matched images, if you think there is not much we can do about it due to mutex, then I will approve the PR.

pford commented 9 months ago

@kswang1029 I found two issues with the matched region analytics implementation, when the region is a matched rotbox (rotated rectangle region for stats/spectral profiles, or box region along a line for spatial profiles/PV image):

  1. When the converted region is not distorted, the rotbox was still being approximated as a polygon with (constant) 1000 points. I changed this to convert the 4 corners of the box to the matched image and create a casacore::LCPolygon from the converted points. Much faster.
  2. When the converted region is distorted, and the rotbox is approximated as a polygon with (constant) 1000 points, if the region is very small (by definition each box along the line is small 3 x line_width pixels, default 3 max 20) each segment length between points can be as low as 0.01 pixels. This seemed like overkill to me, so I set a minimum length of 0.1 pixel so there are fewer polygon points to convert to the matched image. (Perhaps still too low? Any suggestions for the appropriate interval?)
kswang1029 commented 9 months ago

@kswang1029 I found two issues with the matched region analytics implementation, when the region is a matched rotbox (rotated rectangle region for stats/spectral profiles, or box region along a line for spatial profiles/PV image):

  1. When the converted region is not distorted, the rotbox was still being approximated as a polygon with (constant) 1000 points. I changed this to convert the 4 corners of the box to the matched image and create a casacore::LCPolygon from the converted points. Much faster.

  2. When the converted region is distorted, and the rotbox is approximated as a polygon with (constant) 1000 points, if the region is very small (by definition each box along the line is small 3 x line_width pixels, default 3 max 20) each segment length between points can be as low as 0.01 pixels. This seemed like overkill to me, so I set a minimum length of 0.1 pixel so there are fewer polygon points to convert to the matched image. (Perhaps still too low? Any suggestions for the appropriate interval?)

1/10th pixel size as interval sounds sensible. πŸ‘ I will retest the new implementation.

kswang1029 commented 8 months ago

@pford ok I have re-tested the latest commit 8d7bb1d with a flat-field image (1920x1920) and a wide-field image (16384x16384). flat S255_IR_sci spw29 cube I pbcor fits-image-2024-03-05-14-47-15 wide cluster_16384 fits-image-2024-03-05-14-47-39

I made two copies of each and loaded them with one as the reference image and the other as the matched image. The performance is indeed improved SIGNIFICANTLY. Screenshot 2024-03-05 at 14 39 13 One thing I spotted is for the flat case, with this PR, on matched image the performance is better than than on the reference image? Is this expected?

I also checked the profile values between the reference image and the matched image using this PR. In the 2nd and 3rd tab of https://docs.google.com/spreadsheets/d/1JmHmKlDOXWoTDSnW8NAbyOwUbYtwjmCyUDPzt3vG80c/edit?usp=sharing I computed the relative errors using the reference image as the reference. For the wide field case, we see in general the relative error is less than 5%. We do see errors ~ 10% or more. I suspect these errors happened at the kinks of the polyline region. Not sure if applying 1/20th or 1/100th sampling could improve this error? On the flat field image, the error is zero as it should be.

pford commented 8 months ago

@kswang1029 I improved the rotbox handling for the reference image (create the LCPolygon directly from the box corners computed from the control points and rotation), so its performance is better than for the matched image as expected. However, I cannot reproduce the errors in the matched widefield spatial profiles you described (after my code change...). Could you try again, and if you still see the large errors, can you post the widefield image and the region you used in the test? Thanks!

kswang1029 commented 8 months ago

@kswang1029 I improved the rotbox handling for the reference image (create the LCPolygon directly from the box corners computed from the control points and rotation), so its performance is better than for the matched image as expected. However, I cannot reproduce the errors in the matched widefield spatial profiles you described (after my code change...). Could you try again, and if you still see the large errors, can you post the widefield image and the region you used in the test? Thanks!

Will re-test the new commitπŸ‘

kswang1029 commented 8 months ago

@pford the e2e tests detected a backend crash when requesting pv preview. The macOS crash log is attached.

Thread 2 Crashed: 0 carta_backend 0x10468e74c casacore::ArrayLattice::getAt(casacore::IPosition const&) const + 16 (ArrayLattice.tcc:167) 1 carta_backend 0x10476e540 carta::PvPreviewCube::GetRegionProfile(casacore::Slicer const&, casacore::ArrayLattice const&, std::1::function<void (float)>, std::1::vector<float, std::1::allocator>&, double&, std::1::basic_string<char, std::1::char_traits, std::__1::allocator>&) + 468 (PvPreviewCube.cc:177) 2 carta_backend 0x1047e31c8 carta::RegionHandler::CalculatePvPreviewImage(int, int, bool, std::1::shared_ptr, std::1::shared_ptr, std::__1::function<void (float)>, CARTA::PvResponse&, carta::GeneratedImage&) + 2204 (RegionHandler.cc:1281) 3 carta_backend 0x1047e0c4c carta::RegionHandler::CalculatePvPreviewImage(int, int, int, AxisRange&, bool, std::1::shared_ptr&, CARTA::PvPreviewSettings const&, std::1::function<void (float)>, CARTA::PvResponse&, carta::GeneratedImage&) + 3028 (RegionHandler.cc:1179) 4 carta_backend 0x1047dfee4 carta::RegionHandler::CalculatePvImage(CARTA::PvRequest const&, std::__1::shared_ptr&, std::1::function<void (float)>, CARTA::PvResponse&, carta::GeneratedImage&) + 1324 (RegionHandler.cc:1019) 5 carta_backend 0x104816b60 carta::Session::OnPvRequest(CARTA::PvRequest const&, unsigned int) + 972 (Session.cc:1356) 6 carta_backend 0x10482c38c carta::GeneralMessageTask::execute() + 28 (OnMessageTask.tcc:34) 7 carta_backend 0x1048595d4 carta::ThreadManager::StartEventHandlingThreads(int)::$_0::operator()() const + 96 (ThreadingManager.cc:54) [inlined] 8 carta_backend 0x1048595d4 decltype(static_cast<carta::ThreadManager::StartEventHandlingThreads(int)::$_0>(fp)()) std::1::invoke<carta::ThreadManager::StartEventHandlingThreads(int)::$_0>(carta::ThreadManager::StartEventHandlingThreads(int)::$_0&&) + 96 (type_traits:3918) [inlined] 9 carta_backend 0x1048595d4 void std::1::thread_execute<std::1::unique_ptr<std::1::thread_struct, std::1::default_delete>, carta::ThreadManager::StartEventHandlingThreads(int)::$_0>(std::1::tuple<std::1::unique_ptr<std::1::thread_struct, std::1::default_delete>, carta::ThreadManager::StartEventHandlingThreads(int)::$_0>&, std::1::tuple_indices<>) + 96 (thread:287) [inlined] 10 carta_backend 0x1048595d4 void* std::1::thread_proxy<std::1::tuple<std::1::unique_ptr<std::1::thread_struct, std::1::default_delete>, carta::ThreadManager::StartEventHandlingThreads(int)::$_0>>(void*) + 192 (thread:298) 11 libsystem_pthread.dylib 0x19481bfa8 _pthread_start + 148 12 libsystem_pthread.dylib 0x194816da0 thread_start + 8

pford commented 8 months ago

@kswang1029 I fixed the bugs introduced by the faster rotbox implementation. Thanks for catching this!

Side note: these changes will be removed in the Region -> RegionConverter implementation (where LCRegion for the reference image is from casacore::Record for all regions, rather than directly from the control points and rotation for rotbox only). Should we combine these PRs, since the same tests (verification and performance) need to be run for both? With the original PRs, the changes in this PR were outside the Region and the changes in PR #1352 were mainly inside the Region, moving its code to some new files and fixing includes, so there was basically no overlap. However, the latest changes here are in the Region class.

kswang1029 commented 8 months ago

@pford with the latest commit (ee4ce1b) the pv preview crashing issue is fixed. πŸ‘

I tried again to see the difference of the derived spatial profiles from the reference image and the matched image. Still I see errors in the case of matched wide field image. I compared the values in the x axis and in the y axis, respectively. The x axis is identical in both flat and wide cases. The y axis is identical in the flat case. Only in the wide case we see errors in y axis.

Figure_1

I attached the test images and the region file here: https://drive.google.com/file/d/1MvKlVjysO4_iUCrPanv-Z86DnfUzp7Ce/view?usp=sharing

I think this is something to do with the polygon approximation in the wide-field (with projection distortion) case. Are we using 1/10th sampling approach now? Maybe using a finer sampling factor can improve the error with cost of performance. Could you point me where I can adjust that factor so that I can perform experiments here directly? I would like to see if we can have errors less than 5% (there is no right answer on this though).

pford commented 8 months ago

@kswang1029 I do not see a correlation between errors and the pivot points of the polyline. The polyline segments correspond to these profile indexes:

kswang1029 commented 8 months ago

@kswang1029 I do not see a correlation between errors and the pivot points of the polyline. The polyline segments correspond to these profile indexes:

  • segment 0-1 = profile 0 - 8702
  • segment 1-2 = profile 8703 - 18915
  • segment 2-3 = profile 18916 - 27377
  • segment 3-4 = profile 27378 - 36486

With a horizontal line, with 0.1 sampling factor, I see a perfect fit of the two profiles. However, once a line segment has an angle, we start seeing errors. I also tried 0.001, and 0.00001 sample factors but still errors persist and do not become smaller.

kswang1029 commented 8 months ago

@pford I also see similar in v4.1-stable. So if you like, we can file a separate issue for this error issue?

pford commented 8 months ago

@kswang1029 the issue was in the polygon approximation, which removes points on a nearly horizontal line but favors those near an integral pixel. After fixing this (ignoring nearest pixel), the maximum error I got was ~3.35% with the minimum length 0.1. Without minimum length, the error goes to 0 but time increases 6-7x for the matched region profile (5.1s to 34.1s for me). Also was ~100ms faster in both cases (reference and matched) without the parallel loop for the profiles, so I took that out.

kswang1029 commented 8 months ago

@kswang1029 the issue was in the polygon approximation, which removes points on a nearly horizontal line but favors those near an integral pixel. After fixing this (ignoring nearest pixel), the maximum error I got was ~3.35% with the minimum length 0.1. Without minimum length, the error goes to 0 but time increases 6-7x for the matched region profile (5.1s to 34.1s for me). Also was ~100ms faster in both cases (reference and matched) without the parallel loop for the profiles, so I took that out.

@pford Nice! With the minimum length 0.1, having an error of a few percent should be fine. We may consider to have a configurable parameter for users if higher precision (with cost of performance) is needed. But this is out of the current scope. πŸ‘

github-actions[bot] commented 7 months ago

Code Coverage

Package Line Rate Health
src.Cache 72% βž–
src.DataStream 44% βž–
src.FileList 67% βž–
src.Frame 36% ❌
src.HttpServer 42% βž–
src.ImageData 28% ❌
src.ImageFitter 83% βœ”
src.ImageGenerators 43% βž–
src.ImageStats 75% βœ”
src.Logger 37% ❌
src.Main 52% βž–
src.Region 69% βž–
src.Session 4% ❌
src.Table 52% βž–
src.ThreadingManager 67% βž–
src.Timer 85% βœ”
src.Util 40% βž–
Summary 46% (8605 / 18779) βž–