Closed ElDeveloper closed 9 years ago
Any chance you could do a profile? (`%prun` I believe)
nm, I'll run it
Damn, need to profile within cython to get at it
1752274 function calls (1573488 primitive calls) in 18.642 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 17.508 17.508 17.595 17.595 {biom._subsample._subsample}
164408/2 0.308 0.000 0.781 0.390 copy.py:145(deepcopy)
27854 0.157 0.000 0.157 0.000 {method 'reduce' of 'numpy.ufunc' objects}
164408 0.095 0.000 0.124 0.000 copy.py:267(_keep_alive)
14382/2 0.062 0.000 0.781 0.390 copy.py:234(_deepcopy_tuple)
13474 0.060 0.000 0.346 0.000 copy.py:226(_deepcopy_list)
14380 0.059 0.000 0.661 0.000 copy.py:306(_reconstruct)
343206 0.056 0.000 0.056 0.000 {method 'get' of 'dict' objects}
2 0.051 0.026 0.184 0.092 {biom._filter._filter}
2 0.051 0.025 0.060 0.030 table.py:398(cast_metadata)
301864 0.033 0.000 0.033 0.000 {method 'append' of 'list' objects}
371054 0.028 0.000 0.028 0.000 {id}
13474 0.025 0.000 0.119 0.000 table.py:2297(<lambda>)
27854 0.024 0.000 0.181 0.000 _methods.py:23(_sum)
14380 0.023 0.000 0.023 0.000 {method '__reduce_ex__' of 'object' objects}
14382 0.016 0.000 0.016 0.000 {range}
28762 0.015 0.000 0.015 0.000 {getattr}
122172 0.013 0.000 0.013 0.000 copy.py:198(_deepcopy_atomic)
1 0.009 0.009 18.637 18.637 table.py:2212(subsample)
43150 0.009 0.000 0.009 0.000 {isinstance}
6 0.006 0.001 0.006 0.001 util.py:201(index_list)
14380 0.006 0.000 0.100 0.000 {method 'sum' of 'numpy.ndarray' objects}
1 0.005 0.005 18.642 18.642 <string>:1(<module>)
@ElDeveloper, btw, your `%%timeit` includes the time to load the table
No, I don't think the `%%timeit` includes the time to load the table: with `%%timeit`, the first line is a setUp-like statement that gets executed with every loop but is not taken into account in the time measurements. Cool thing I learned today.
what? crazy, okay.
I can't get the profiler to actually show the Cython code, unfortunately. It seems like adding the directive and following the tutorial here is not working for whatever reason, so I can't make any suggestions about what is slow. I think it's the array expansion, and I've posted on SO about it.
Note, this is still like an order of magnitude faster than QIIME's rarefaction code at least. I had it even faster in pure C, but agree, it would be nice if this were sped up.
Yeah, super handy; found it in this IPython issue: https://github.com/ipython/ipython/issues/5767#issuecomment-41920887
In that case, this is not really an issue then?
Okay, I think I found one way to optimize (using Python/NumPy, not Cython, below). I kicked the inner expansion loop around and got this:
>>> def expand(a):
... b = np.empty(a.sum(), dtype=np.int32)
... b_idx = 0
... for a_idx, i in enumerate(a):
... for j in range(i):
... b[b_idx] = a_idx
... b_idx += 1
... return b
>>> a = randint(0, 1000, 1000)
>>> %timeit expand(a)
10 loops, best of 3: 83.9 ms per loop
>>> r = arange(1000)
>>> %timeit repeat(r, a)
100 loops, best of 3: 2 ms per loop
Turns out we can use `repeat`, so I think this will be a quick fix. Will issue a PR momentarily.
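For completeness, a minimal sketch (plain NumPy, using the names from the snippet above) confirming that `np.repeat` produces the same expansion as the loop:

```python
import numpy as np

def expand(a):
    # Loop version from above: write index a_idx into the output a[a_idx] times
    b = np.empty(a.sum(), dtype=np.int32)
    b_idx = 0
    for a_idx, i in enumerate(a):
        for j in range(i):
            b[b_idx] = a_idx
            b_idx += 1
    return b

a = np.random.RandomState(0).randint(0, 1000, 1000)
r = np.arange(len(a))

# np.repeat(r, a) repeats r[i] exactly a[i] times, in order --
# the same result as the loop, but vectorized in C
assert np.array_equal(expand(a), np.repeat(r, a))
```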
It's not possible to "see" inside a function using `cProfile`, so maybe that's what you were seeing when trying to profile `_subsample`, Daniel?
Thanks for the `%%timeit` trick @ElDeveloper!
No problem!
Yes, but they're just the line where the function definition starts. E.g., I get
ncalls tottime percall cumtime percall filename:lineno(function)
1 10.788 10.788 11.106 11.106 _subsample.pyx:18(_subsample)
because `def _subsample(...` is on line 18 of `_subsample.pyx` for me (added a few imports etc. on top).
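For anyone trying to reproduce this, the mechanism Cython documents for letting `cProfile`/`%prun` attribute time to compiled functions is the `profile` directive at the top of the `.pyx` file, followed by a rebuild. A sketch of what the top of the file might look like (the filename is from this thread; the rest of the module is elided):

```cython
# cython: profile=True
# _subsample.pyx -- the directive above makes Cython emit profiling hooks,
# so cProfile can see functions defined in this module (as in the output
# above). For line-by-line data, Cython also supports linetrace=True plus
# the CYTHON_TRACE C macro, at a larger runtime cost.
```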
Ah, okay, I see. My bad. Agreed, the next target is `permutation`... any ideas?
Fooled around with this further, in particular with doing smarter things to aggregate counts (see here); however, I was not able to net any additional gains. Also attempted replacing `permutation(...)[:n]` with `choice(..., n, replace=False)`, but that change did not result in a difference either. Last, I tweaked types and disabled `boundscheck` and `wraparound`, and also did not notice a significant difference. So, I'm closing this issue for now, as there isn't a clear path forward here for serial code. I think the next approach would be to thread this.
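For reference, the `boundscheck`/`wraparound` tweaks mentioned above are typically applied as `cython` decorators (or module-level directives); a hypothetical sketch only, since the actual function signature and body are not shown in this thread:

```cython
cimport cython

@cython.boundscheck(False)  # skip index range checks in hot loops
@cython.wraparound(False)   # disallow negative indexing
def _subsample(data, n):
    ...
```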
To add, I looked at the NumPy source. `permutation` and `choice` were comparable, as `choice` is doing a permutation behind the scenes. On more focused investigation, it is `permutation` that is the slow step. The NumPy implementation uses Fisher-Yates and is concise, and I believe it is effectively bound by the performance of the PRNG. The next step to improve performance here, I believe, is to target a multi-threaded implementation. Cython has built-in support for OpenMP, but that'll take more investigation than is feasible for the pending release.
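To summarize the scheme discussed in this thread, here is a rough pure-NumPy sketch of subsampling without replacement (expand with `repeat`, shuffle with `permutation`, keep the first `n`, re-aggregate). The function name and signature are illustrative, not the actual biom `_subsample` implementation:

```python
import numpy as np

def subsample_counts(counts, n, rng):
    # Expand counts into one entry per observation: index i appears counts[i] times
    expanded = np.repeat(np.arange(counts.size), counts)
    # permutation is the slow step noted above; choice(..., replace=False)
    # permutes behind the scenes, so the two perform comparably
    picked = rng.permutation(expanded)[:n]
    # Re-aggregate the n sampled observations back into a counts vector
    return np.bincount(picked, minlength=counts.size)

counts = np.array([10, 0, 5, 85])
out = subsample_counts(counts, 20, np.random.RandomState(42))
assert out.sum() == 20          # exactly n observations kept
assert np.all(out <= counts)    # without replacement: can't exceed input
```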
Subsample seems to be under-performing; this is the table that brought this to @adamrp's and my attention: