Closed acviana closed 10 years ago
Then we dive in with some IPython:
import pstats
p = pstats.Stats('mtpipeline_profile_1.pstats')
p.sort_stats('cumulative').print_stats(10)
Which returns:
Thu May 30 16:38:36 2013 mtpipeline_profile_1.pstats
13439832 function calls (13183766 primitive calls) in 76.516 seconds
Ordered by: cumulative time
List reduced from 3822 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.022 0.022 76.519 76.519 ../Code/imaging/mtpipeline.py:8(<module>)
1 0.012 0.012 72.075 72.075 ../Code/imaging/mtpipeline.py:87(run_mtpipeline)
4 0.004 0.001 72.058 18.014 ../Code/imaging/run_trim.py:351(run_trim)
4 0.001 0.000 67.516 16.879 ../Code/imaging/run_trim.py:307(saturated_clip)
4 5.539 1.385 67.515 16.879 ../Code/imaging/run_trim.py:181(replace_by_weight)
4 0.105 0.026 31.340 7.835 ../Code/imaging/display_tools.py:22(before_after)
8 0.000 0.000 17.789 2.224 /usr/stsci/pyssgx/Python-2.7.3/lib/python2.7/site-packages/matplotlib/backends/backend_agg.py:394(draw)
9448/8 0.083 0.000 17.775 2.222 /usr/stsci/pyssgx/Python-2.7.3/lib/python2.7/site-packages/matplotlib/artist.py:53(draw_wrapper)
8 0.002 0.000 17.775 2.222 /usr/stsci/pyssgx/Python-2.7.3/lib/python2.7/site-packages/matplotlib/figure.py:815(draw)
48 0.014 0.000 17.757 0.370 /usr/stsci/pyssgx/Python-2.7.3/lib/python2.7/site-packages/matplotlib/axes.py:1866(draw)
And just for reference here are the column definitions from the Python Docs:
This picture makes sense, but sorting by cumtime includes the subfunctions, it would be better to sort by tottime to see which individual functions are taking the most time:
In [6]: p.sort_stats('time').print_stats(10)
Which gives:
Thu May 30 16:38:36 2013 mtpipeline_profile_1.pstats
13439832 function calls (13183766 primitive calls) in 76.516 seconds
Ordered by: internal time
List reduced from 3822 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
275484 7.824 0.000 7.824 0.000 {method 'mean' of 'numpy.ndarray' objects}
275364 6.709 0.000 13.660 0.000 ../Code/imaging/run_trim.py:211(subarray)
4 5.539 1.385 67.515 16.879 ../Code/imaging/run_trim.py:181(replace_by_weight)
275577 4.258 0.000 4.258 0.000 {method 'sort' of 'numpy.ndarray' objects}
275340 3.564 0.000 16.451 0.000 /usr/stsci/pyssgx/Python-2.7.3/lib/python2.7/site-packages/numpy/lib/function_base.py:2861(median)
413103 3.368 0.000 3.368 0.000 {numpy.core.multiarray.array}
1109645 3.112 0.000 3.112 0.000 {max}
1111851 3.057 0.000 3.057 0.000 {min}
28 2.009 0.072 2.009 0.072 {built-in method putdata}
280434 1.499 0.000 1.499 0.000 {method 'flatten' of 'numpy.ndarray' objects}
Based on this it seems like the replace_by_weight
function is the bottleneck.
Here is the current code:
def replace_by_weight(input_array, weight_array, output=False):
'''
Replace all the pixels that have a weight value of 0 with the local
3x3 median. A copy of the image is created so that the values of
the replaced pixels don't affect the medians of other pixels as
they are replaced. The subarray function is used to generate the
subarrays to prevent error at the array edge.
The indices are set keeping in mind that a[0:2,0:2] returns a 2x2
array while a[0:3,0:3] returns a 3x3 array. Both arrays are
centered on (1,1).
'''
assert isinstance(input_array, N.ndarray), 'array must be numpy array'
assert isinstance(weight_array, N.ndarray), 'array must be numpy array'
output_array = copy.copy(input_array)
saturated_indices = N.where(weight_array == 0)
for index in zip(saturated_indices[0], saturated_indices[1]):
output_array[index] = N.median(subarray(input_array, index[0] - 1, \
index[0] + 2, index[1] - 1, index[1] + 2))
if output != False:
before_after(before_array = input_array,
after_array = output_array,
before_array_name = 'Input Data',
after_array_name = '0-Weight Corrected',
output = output,
pause = False)
return output_array
What does the subarray
function look like?
Would you be able to do this with a scipy.ndimage.filters.generic_filter
?
@iguananaut, I'll look at scipy.ndimage.filters.generic_filter
, I haven't used that before. Here is the subarray
function:
def subarray(array, xmin, xmax, ymin, ymax):
'''
Returns a subarray.
'''
assert isinstance(array, N.ndarray), 'array must be numpy array'
assert isinstance(xmin, int), 'xmin in subarray must be an int.'
assert isinstance(xmax, int), 'xmax in subarray must be an int.'
assert isinstance(ymin, int), 'ymin in subarray must be an int.'
assert isinstance(ymax, int), 'ymax in subarray must be an int.'
assert xmin < xmax, 'xmin must be stictly less than xmax.'
assert ymin < ymax, 'ymin must be stictly less than ymax.'
output_array = array[max(xmin,0):min(xmax,array.shape[0]), \
max(ymin,0):min(ymax,array.shape[1])]
assert output_array.shape[0] == min(xmax,array.shape[0]) - max(xmin,0), \
'Output shape is unexpected: ' + str(min(xmax,array.shape[0]) - max(xmin,0))
assert output_array.shape[1] == min(ymax,array.shape[1]) - max(ymin,0), \
'Output shape is unexpected: ' + str(min(ymax,array.shape[1]) - max(ymin,0))
return output_array
One thing about all those asserts, they should really probably be raising TypeError
instead. And actually most of the type checks are a bit excessive...I would drop those and just leave the sanity checks (like xmin < ymin)
Also I would use np.clip()
to implement this.
I ran a profile like this:
This calls
mtpipeline.py
but turns off everything but the imaging module and only uses the linear mode. However it will run 4 times; one for each drizzling mode and once for each cosmic ray rejection mode.