Closed: tomr-stargazer closed this issue 10 years ago
- Is this datacube too big to reasonably compute a catalog for? (I hope not!)
No, this is actually on the small end of dataset sizes.
- Are there better ways to profile the performance in situations like this, so I can better understand the code pinch points?
One way is to run python -m cProfile -s cumtime script.py. This runs the script normally and then prints out timing information for every function, sorted by the cumulative time spent in each function. It can identify hotspots.
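If you'd rather profile from inside a script, the same information is available programmatically through the standard-library cProfile and pstats modules. A minimal sketch (main() here is a placeholder for whatever you want to time):

import cProfile
import pstats

def main():
    pass  # placeholder: build the dendrogram / catalog here

# Profile main() and dump the raw stats to a file.
cProfile.run('main()', 'profile.out')

# Sort by cumulative time and show the 20 most expensive functions.
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(20)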
Another useful tool is the line profiler. With this you can add a @profile decorator on top of specific functions, and track timing information line-by-line.
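A minimal sketch of that workflow (assumes the line_profiler package is installed; second_moment is a made-up stand-in for the function you care about):

import numpy as np

@profile  # injected into builtins by kernprof at runtime; no import needed
def second_moment(values, positions):
    # Stand-in for the kind of per-structure statistic being profiled.
    mean = np.average(positions, weights=values)
    return np.average((positions - mean) ** 2, weights=values)

if __name__ == '__main__':
    second_moment(np.random.rand(10000), np.arange(10000.0))

Running kernprof -l -v script.py then prints hit counts and timings for every line of each decorated function.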
- Is it possible or desirable for us to improve the performance of the analysis code?
Yes, something is fishy here. Using your dendrogram settings, my computer builds the catalog in 90 seconds (are you running out of memory? My Python process takes up about 300 MB during computation).
Here are the most expensive calls on my machine:
26989682 function calls (26720625 primitive calls) in 125.400 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.007 0.007 125.402 125.402 test.py:1(<module>)
1 0.000 0.000 92.546 92.546 analysis.py:628(ppv_catalog)
1 0.082 0.082 92.546 92.546 analysis.py:578(_make_catalog)
574512/461322 6.579 0.000 86.173 0.000 {getattr}
28812 0.627 0.000 53.111 0.002 analysis.py:84(mom2_along)
28812 10.195 0.000 51.202 0.002 analysis.py:66(mom2)
12348 0.013 0.000 42.839 0.003 analysis.py:593(<genexpr>)
343686 1.438 0.000 36.635 0.000 nanfunctions.py:424(nansum)
1 17.376 17.376 31.633 31.633 dendrogram.py:88(compute)
14406 0.130 0.000 28.585 0.002 analysis.py:367(_sky_paxes)
14406 0.178 0.000 28.419 0.002 analysis.py:128(projected_paxes)
6174 0.074 0.000 23.780 0.004 analysis.py:283(major_sigma)
6174 0.074 0.000 23.745 0.004 analysis.py:296(minor_sigma)
34986 4.457 0.000 19.705 0.001 analysis.py:61(mom1)
343686 8.021 0.000 19.189 0.000 nanfunctions.py:29(_replace_nan)
2058 8.431 0.004 17.791 0.009 analysis.py:444(area_exact)
2058 0.023 0.000 16.218 0.008 analysis.py:318(area_ellipse)
2058 0.025 0.000 16.193 0.008 analysis.py:309(radius)
1107952 12.070 0.000 12.070 0.000 {method 'reduce' of 'numpy.ufunc' objects}
So maybe mom2 could be further optimized. There are also libraries like Bottleneck that could speed up the NaN arithmetic by about a factor of 2 -- @keflavich has done this for the radio-cube project. But it seems like something else is going wrong with your setup right now...?
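For reference, a minimal sketch of the Bottleneck substitution (assumes the bottleneck package is installed; the array is made-up data):

import numpy as np
import bottleneck as bn

# Made-up data with NaN blanks, like a masked datacube slice.
data = np.random.randn(512, 512)
data[data < 0] = np.nan

# bn.nansum is a drop-in replacement for np.nansum, typically ~2x faster.
assert np.isclose(bn.nansum(data), np.nansum(data))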
Just eyeballing the input parameters... isn't the min_delta a bit small? Also, that cube loads funny in ds9 - is there something wrong with the cube? It looks like it has rows of blanks everywhere.
It's telling that mom2 gets called 28K times, even though the final catalog only has ~1K structures -- this suggests that memoizing would speed things up. I'll open a PR on this.
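The idea, in a minimal sketch (a generic dict-based cache for illustration, not the actual PR; memoize_stat and the keying scheme are hypothetical):

def memoize_stat(func):
    # Cache a per-structure statistic so repeated calls don't recompute it.
    cache = {}
    def wrapper(structure, *args):
        key = (id(structure), args)
        if key not in cache:
            cache[key] = func(structure, *args)
        return cache[key]
    return wrapper

@memoize_stat
def mom2_for(structure, axis):
    ...  # expensive second-moment computation would go here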
@keflavich the dimensions of the cube seem to be in an unusual order, and the datacube isn't sampled uniformly, which might be what you're seeing. From the FITS header:
CTYPE1 = 'VELO-LSR' / axis 1 coord type
CTYPE2 = 'GLON-CAR' / axis 2 coord type
CTYPE3 = 'GLAT-CAR' / axis 3 coord type
COMMENT Axis units: v(km/s) x LII(deg) x BII(deg)
In all my analysis code I transpose the cube and header before doing much else.
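The data half of that is a one-liner (a sketch with astropy.io.fits; updating the header's WCS keywords to match is omitted):

from astropy.io import fits

data, header = fits.getdata('DHT21_Taurus_mom.fits', header=True)
# numpy axis order is the reverse of FITS axis order, so this cube loads
# as (GLAT, GLON, VELO); reversing the axes puts velocity first.
data = data.transpose()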
@tomr-stargazer - actually, everything is fine, just the first frame (fits.getdata('DHT21_Taurus_mom.fits')[0,:,:]) is missing every other line. It's probably necessary to mask out the regions that are chunked up like that, though, no?
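One way to check for those blanked rows (a sketch; assumes the blanks are stored as NaNs):

import numpy as np
from astropy.io import fits

frame = fits.getdata('DHT21_Taurus_mom.fits')[0, :, :]
blank_rows = np.all(np.isnan(frame), axis=1)  # rows that are entirely NaN
print(blank_rows.sum(), 'of', frame.shape[0], 'rows are blank')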
and python -m cProfile -s cumtime test.py takes 68 sec and prints:
18296810 function calls (18032900 primitive calls) in 68.825 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.003 0.003 68.826 68.826 test.py:1(<module>)
1 0.000 0.000 36.020 36.020 analysis.py:667(ppv_catalog)
1 0.078 0.078 36.020 36.020 analysis.py:617(_make_catalog)
1 17.172 17.172 31.268 31.268 dendrogram.py:89(compute)
561170/461357 6.429 0.000 29.670 0.000 {getattr}
2058 8.370 0.004 17.697 0.009 analysis.py:482(area_exact)
12348 0.010 0.000 13.380 0.001 analysis.py:632(<genexpr>)
39764 9.223 0.000 9.223 0.000 {zip}
307278 5.397 0.000 5.397 0.000 dendrogram.py:325(neighbours)
1073 0.111 0.000 5.017 0.005 table.py:1253(_rebuild_table_column_views)
1029 0.027 0.000 4.961 0.005 table.py:2297(add_row)
26328 0.039 0.000 3.988 0.000 table.py:64(wrapper)
26328 0.170 0.000 3.946 0.000 table.py:598(__new__)
209136/104816 0.632 0.000 3.687 0.000 copy.py:145(deepcopy)
6174 0.061 0.000 2.720 0.000 analysis.py:318(major_sigma)
13146 0.087 0.000 2.555 0.000 table.py:654(copy)
45276/37044 0.118 0.000 2.513 0.000 analysis.py:29(wrapper)
14406 0.107 0.000 2.384 0.000 analysis.py:404(_sky_paxes)
82993/42011 0.150 0.000 2.293 0.000 structure.py:161(values)
4116 0.079 0.000 2.242 0.001 analysis.py:112(mom2_along)
Thanks for the performance upgrade @ChrisBeaumont!
As I mentioned, this is a different machine than my usual one, and I think I've traced most of the problem to a pathological interaction between RAM and hard drive space that was slowing everything down. (Namely, even though the machine has 8 GB of RAM, the hard disk has less than 10 GB of free space, and something strange was eating that up as a paging area, making everything slow. There's a longer story behind why I'm using this machine, but my usual one had a hardware failure early last week, so this is a backup.)
I ran the profiled test code python -m cProfile -s cumtime dendro_test.py > test_output.txt (after pulling the performance upgrade in #115) and got this output (link to full output; abridged version below):
18296411 function calls (18032534 primitive calls) in 213.280 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.032 0.032 213.283 213.283 dendro_test.py:1(<module>)
1 0.000 0.000 114.646 114.646 analysis.py:667(ppv_catalog)
1 0.204 0.204 114.646 114.646 analysis.py:617(_make_catalog)
1 50.426 50.426 94.392 94.392 dendrogram.py:89(compute)
561173/461360 18.197 0.000 93.848 0.000 {getattr}
2058 26.986 0.013 56.756 0.028 analysis.py:482(area_exact)
12348 0.040 0.000 40.533 0.003 analysis.py:632(<genexpr>)
39759 29.727 0.001 29.727 0.001 {zip}
1029 0.323 0.000 15.498 0.015 table.py:2297(add_row)
1073 0.397 0.000 15.267 0.014 table.py:1253(_rebuild_table_column_views)
307278 14.880 0.000 14.880 0.000 dendrogram.py:325(neighbours)
26328 0.114 0.000 11.867 0.000 table.py:64(wrapper)
26328 0.389 0.000 11.746 0.000 table.py:598(__new__)
209136/104816 2.057 0.000 11.519 0.000 copy.py:145(deepcopy)
6174 0.206 0.000 11.341 0.002 analysis.py:318(major_sigma)
45276/37044 0.399 0.000 10.788 0.000 analysis.py:29(wrapper)
14406 0.319 0.000 10.413 0.001 analysis.py:404(_sky_paxes)
1029 0.080 0.000 9.945 0.010 analysis.py:158(projected_paxes)
4116 0.173 0.000 9.784 0.002 analysis.py:112(mom2_along)
1029 2.331 0.002 8.816 0.009 analysis.py:93(mom2)
82993/42011 0.628 0.000 7.922 0.000 structure.py:161(values)
13146 0.198 0.000 7.162 0.001 table.py:654(copy)
52744 0.264 0.000 6.233 0.000 metadata.py:126(__set__)
43390 0.507 0.000 5.632 0.000 pruning.py:69(result)
35999 1.216 0.000 5.488 0.000 structure.py:280(_fill_footprint)
52160 0.691 0.000 5.390 0.000 copy.py:306(_reconstruct)
44569 0.094 0.000 5.105 0.000 {all}
127762 0.308 0.000 5.034 0.000 pruning.py:70(<genexpr>)
1029 4.896 0.005 5.004 0.005 dendrogram.py:674(values)
...
# many more lines which took less than 5 seconds
also:
➜ 81d6c2b64f403a207e1b git:(master) ✗ wc -l test_output.txt
5772 test_output.txt
but the first 3075 lines of that should be ignored.
OK, this looks sensible for a slower computer; I'm going to close this.
If anyone has a chance to help me understand the following performance issues I've been running into, I'd be grateful.
On one particular machine, I've been running into issues when calculating ppv_catalog on dendrogrammed data of modest to large size. (In particular, the file I'm running into trouble on is hosted here: http://www.cfa.harvard.edu/rtdc/CO/NumberedRegions/DHT21/DHT21_Taurus_mom.fits and is around 10 MB in size.) The dendrogram has 1029 structures, and for some of those structures, struct.indices has over 100,000 items. When I hit Ctrl-C after about five or so minutes, the interrupt typically hits one of the following lines of code (examples below chosen as illustrative):
The versions of my python packages:
Some questions:
- Is this datacube too big to reasonably compute a catalog for? (I hope not!)
- Are there better ways to profile the performance in situations like this, so I can better understand the code pinch points?
- Is it possible or desirable for us to improve the performance of the analysis code?
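(For context, a minimal sketch of the kind of script involved, using the astrodendro API; the parameter values below are placeholders, not my actual settings:)

from astropy.io import fits
from astrodendro import Dendrogram, ppv_catalog

data = fits.getdata('DHT21_Taurus_mom.fits')
d = Dendrogram.compute(data, min_value=0.3, min_delta=0.3, min_npix=10)
metadata = {}  # units/beam/velocity-scale info would go here in a real run
cat = ppv_catalog(d, metadata)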