Closed nabobalis closed 1 year ago
In GitLab by @markcheung on Sep 1, 2020, 11:04
Thanks @cbard, Jack and Raphael. This is really great.
In GitLab by @cbard on Sep 1, 2020, 12:53
A quick comparison on my laptop using %timeit
gives:
type | time |
---|---|
original 3rd/1st | 12/11 s |
new disk check (3rd/1st) | 2.8/2.03s |
openCV (3rd/1st) | 1.53/1.53s |
The first two rows are using the skimage.warp
, not the scipy.ndimage.affine_transform
(which adds another second to runtime).
Raphael, on his desktop, did compare 1st vs. 3rd order interpolations with openCV. He got similar results for both: 625-650 ms (compare with 5 seconds for the original 3rd order register).
So, yes, I think it is a comparable speedup. OpenCV does look like the fastest implementation of affine_transform (in any order), but getting the full benefits will require changing how register
deals with map metadata and uses the native sunpy
functions.
In GitLab by @wtbarnes on Sep 3, 2020, 08:00
I think it would be interesting to first see whether the OpenCV affine transform can be used as a "backend" (in the same way we use scikit-image or scipy) for the affine transform in sunpy
. I would really not like this package to be in the business of defining its own image manipulation routines unless the problem is very specific to AIA images (which an affine warping is not).
The way I see it, this package should view Map
as a fundamental data type. Functions should ingest Map
s and return Map
s and only operate on images outside of the Map
container unless absolutely necessary. If we are spending a lot of time doing some generic (not specific to AIA) operation on the bare array, I think that in and of itself is a good indication that that functionality should be offloaded to a lower level package.
I say all of this not to discourage efforts to optimize this process, but just to make sure what we're doing is in the scope of this package.
In GitLab by @markcheung on Sep 3, 2020, 08:48
I parse Will’s comments as encouragement for the Sunpy team to optimize some routines commonly used in workflows. I guess what Jack did is a start.
I have used astropy.io.fits to load AIA images too. It probably is worth being an example in the aiapy gallery. Now, if registering can be 5 or 10 times faster without Sunpy Map, we should offer it as an alternate workflow.
I think two principles we can keep in mind if we were to offer an alternate workflow is:
In GitLab by @cbard on Sep 3, 2020, 10:01
I agree with what you both have said. Some points of clarification, and details of specifics:
I'm not sure if we've shown that register
is 5-10x faster without sunpy.Map
since all of the updated register functions technically use sunpy.Map
. Rather, we've shown that register
can be 5-10x faster if 1) we streamline the overhead in register
and sunpy.map.rotate
, and 2) we add the OpenCV version of warpAffine
upstream in sunpy.map.rotate
.
It just turns out that, since the CuPy and OpenCV transforms promise execution times of ~100-200 ms (versus 1-3 seconds for skimage
/scipy.ndimage
), the bottleneck would no longer be the affine transform, but rather the overhead prior to calling it.
From Raphael's timing breakdown, the overall overhead (before and during register
) includes a few hundred ms for loading through sunpy.Map
, ~2 seconds for update_pointing
, and 100 ms for using a temporary map to do the pixel coordinate calculations within map.rotate
. I think any source of possible improvement would start with looking at these functions and either accelerating them or keeping only what's necessary for AIA_prep.
In GitLab by @Cadair on Sep 9, 2020, 11:55
@wtbarnes brought this issue to my attention today, and I think that while I fully expect there to be thing in SunPy which can be made faster I want to raise some points about the methodology needed when benchmarking code and also the most constructive ways of moving forward.
First however, a generic point. SunPy is very open to people coming to us and saying "this is slow, can it be made faster". We are (as always) even more open to people coming to us with a code contribution saying "here I made this thing faster"!
So I think to ascertain what needs to be done a methodical bit of benchmarking needs to be done. It's very important to compare like for like. I am also coming at this strictly from a "how can sunpy be made faster" perspective rather than a "can we make a thing in python that does register faster than aiapy".
So the first question to ask is why is it slow.
I am going to start by benchmarking the following script, and then modify it a little as we need to:
import sunpy.data.sample
import sunpy.map
from aiapy.calibrate import register
aia_map = sunpy.map.Map(sunpy.data.sample.AIA_171_ROLL_IMAGE)
register(aia_map)
running this using the wonderful pyinstrument package gives the following output, which I will hide as it's very long and then come back to:
The important bits here are:
The whole script took 7s
7.275 <module> ../../bench_aia_simple.py:1
about 1s of this is imports, which while a valid target for optimisation I am going to ignore. Next up is actually loading the map:
├─ 0.360 __call__ sunpy/map/map_factory.py:297
│ └─ 0.360 _parse_args sunpy/map/map_factory.py:188
│ └─ 0.359 wrapper sunpy/util/functools.py:17
│ └─ 0.359 _parse_path sunpy/map/map_factory.py:279
│ └─ 0.359 _read_file sunpy/map/map_factory.py:151
│ └─ 0.359 read_file sunpy/io/file_tools.py:55
│ └─ 0.359 read sunpy/io/fits.py:43
│ ├─ 0.173 fitsopen astropy/io/fits/hdu/hdulist.py:35
│ │ └─ 0.173 fromfile astropy/io/fits/hdu/hdulist.py:389
│ │ └─ 0.173 _readfrom astropy/io/fits/hdu/hdulist.py:1040
│ │ └─ 0.172 _read_next_hdu astropy/io/fits/hdu/hdulist.py:1116
│ │ └─ 0.171 readfrom astropy/io/fits/hdu/base.py:296
│ │ └─ 0.171 seek astropy/io/fits/file.py:374
│ │ └─ 0.171 seek gzip.py:368
│ │ [83 frames hidden] gzip, _compression
│ └─ 0.178 __get__ astropy/utils/decorators.py:752
│ └─ 0.178 data astropy/io/fits/hdu/image.py:219
│ └─ 0.178 _get_scaled_image_data astropy/io/fits/hdu/image.py:767
│ └─ 0.178 _get_raw_data astropy/io/fits/hdu/base.py:507
│ └─ 0.178 readarray astropy/io/fits/file.py:246
│ └─ 0.177 _array_from_file astropy/io/fits/util.py:563
│ └─ 0.171 read gzip.py:287
│ [137 frames hidden] gzip, _compression
Almost all of this (other than 0.001s) is loading the FITS file into memory, and implemented in astropy, so nothing for sunpy to worry about here.
Next is register
, which is where things get interesting.
The total time is:
└─ 5.851 register aiapy/calibrate/prep.py:14
which as pointed out is 63% the submap detection in register, which Jack has proposed an optimsation for already:
├─ 3.731 contains_full_disk sunpy/map/maputils.py:127
Almost all the rest is in the call to GenericMap.rotate
├─ 1.981 wrapper astropy/units/decorators.py:178
│ └─ 1.981 rotate sunpy/map/mapbase.py:1212
with 0.12 seconds being in the submap operation at the end of register
:
└─ 0.121 inner_f sunpy/util/decorators.py:307
└─ 0.121 wrapper astropy/units/decorators.py:178
└─ 0.121 submap sunpy/map/mapbase.py:1397
So, having tackled the obvious low-hanging fruit of the contains_full_disk
function we can see the next thing is the GenericMap.rotate
method that we should look at.
To do this properly I am going to change our script to be this:
import sunpy.data.sample
import sunpy.map
from pyinstrument import Profiler
aia_map = sunpy.map.Map(sunpy.data.sample.AIA_171_ROLL_IMAGE)
profiler = Profiler()
profiler.start()
aia_map.rotate(order=3, recenter=True)
profiler.stop()
print(profiler.output_text(unicode=True, color=True))
which hopefully is a good approximation of the rotate call in register
, at least good enough for benchmarking. Running this script gives:
1.951 <module> bench_aia.py:1
└─ 1.936 wrapper astropy/units/decorators.py:178
└─ 1.936 rotate sunpy/map/mapbase.py:1212
├─ 1.767 affine_transform sunpy/image/transform.py:15
│ ├─ 1.464 warp skimage/transform/_warps.py:666
│ │ [15 frames hidden] skimage, numpy, <__array_function__ i...
│ │ 1.335 [self]
│ ├─ 0.126 <module> skimage/transform/__init__.py:1
│ │ [126 frames hidden] skimage, scipy, inspect, numpy, enum,...
│ ├─ 0.106 _showwarnmsg warnings.py:96
│ │ └─ 0.106 _showwarning astropy/logger.py:170
│ │ └─ 0.103 warning logging/__init__.py:1436
│ │ [frames hidden] logging
│ └─ 0.066 [self]
├─ 0.059 reference_coordinate sunpy/map/mapbase.py:884
│ └─ 0.056 coordinate_frame sunpy/map/mapbase.py:480
│ └─ 0.049 wcs sunpy/map/mapbase.py:419
├─ 0.054 world_to_pixel sunpy/map/mapbase.py:1039
│ ├─ 0.025 coordinate_frame sunpy/map/mapbase.py:480
│ │ └─ 0.022 wcs sunpy/map/mapbase.py:419
│ └─ 0.023 wcs sunpy/map/mapbase.py:419
└─ 0.052 pad <__array_function__ internals>:2
[6 frames hidden] <__array_function__ internals>, numpy
The actual numerical chunk of the work here is completing in 1.661 seconds, out of the total 1.936 seconds. This tells me that this method has relatively little overhead in the actual sunpy code and that mostly it's the C layer code in scikit-image which is taking the compute time.
At this point the next step is to swap this out with another implementation of an affine transform (opencv for example) and see how they compare (both in terms of performance, but also accuracy and consistency with the existing implementation). I am not about to do that (my dinner just arrived as I type this) but I can tell you where to start to do it in an informative way. To compare how much faster opencv is you need to fork sunpy and modify the affine_transform
wrapper to call opencv's warpAffine
method rather than scikit-image's affine_transform
. If you can construct a call which is equivalent, and the you repeat my benchmark you will get a directly comparible result.
A final note on implementation. I think that if the opencv or the cupy implementations are sufficently fast while maintaining the same behaviour, then we can look at adding a way to use them though Map.rotate
. I think that the best way to do this would probably be to allow people to pass a function through which get's used to do the affine transform inside sunpy's sunpy.image.transform.affine_transform
. The hardest part about this would be strictly describing and specifying the behaviour of that function so that you are likely to do it right and get comparible answers.
p.s. sorry for the essay :grinning:
In GitLab by @cbard on Sep 9, 2020, 13:00
@Cadair - Looks like you're getting similar timings as Raphael got for the original register
routine, so that's good to see. I agree that an unified benchmarking approach other than %timeit
is necessary, so thanks for pointing me to pyinstrument
! I'll take a crack at adding the openCV warpAffine
to sunpy.map.rotate
and see what falls out.
In GitLab by @cbard on Sep 16, 2020, 14:37
@wtbarnes @Cadair
Finally figured out the nuances of openCV.warpAffine
vs. skimage.warp
vs. scipy.ndimage.transform
. I've added my changes to a local fork and run some benchmarking tests.
Basically, openCV.warpAffine
is 3x faster than skimage.warp
on my laptop.
So, I think it's worth trying to add this somehow to the upstream sunpy repo. However, I'm not sure which of a hard-coded addition of openCV (which I have done) or a more general "pass function to do affine_transform" (which Stuart has suggested) is more appropriate.
In GitLab by @wtbarnes on Sep 16, 2020, 15:56
Thanks @cbard! Would you mind creating an issue on the main sunpy repo regarding this?
I think that having a more general way to pass a function to perform the affine transform would probably be preferable as it is also more future proof.
In GitLab by @wtbarnes on Sep 28, 2020, 10:17
Given that this discussion has now moved entirely to this sunpy issue, I'm going to close this. Once those performance updates are included in sunpy, we can revisit this.
In GitLab by @cbard on Sep 1, 2020, 07:57
tl;dr Easiest acceleration (3-4x vs. orig) is to wait for
contains_full_disk
improvement in upstream sunpy. Recommended to use OpenCV affine transform (8-9x acceleration vs. orig). Recommended to test registering raw data and then creating Sunpy Map (instead of currently creating map and then registering map data).As part of the Helio Hackweek 2020 AIApy project, Raphael Attie (@WaaallEEE) and I looked at
aiapy.calibrate.register
to see how it could be sped up. We wrote some basic replacements forregister
and benchmarked them. To summarize:cfd_register
: Jack Ireland changedsunpy.map.maputils.contains_full_disk
to test only the edge pixels and not every pixel. This is an ongoing pull request in sunpy, but the original version made the overallregister
3-4x faster. (This technically has nothing to do with the actual affine transform).cupy_register
: Drop in a one-to-one CuPy replacement forscipy.ndimage.affine_transform
(CAVEAT: only supports linear interpolation at the moment). This is about 6-9x faster thanregister
.cv_register
: Implement R. Attie's version of register using OpenCV (cv2
) python library and merge it with the current sunpy meta handling. This is about 6-9x faster thanregister
.So, it looks like upgrading to the openCV affine transform is the ideal path forward. This may change in a year or two, as further improvements to CuPy (like adding cubic interpolation to
affine_transform
) are made.Raphael also broke down
register
into its individual components for timing, namely the metadata overhead versus the actual affine transformation. His conclusion:(table courtesy of R. Attie)
Ultimately, we conclude that it may make more sense to do a data prep pipeline without the Sunpy maps; i.e. loading the data, preparing it, and then creating the finished Map. The added abstraction prevents vectorization of the data prep which can be leveraged for acceleration. This data load could be done by using
astropy.io.fits
, which is already included in thesunpy
dependencies.