BlueBrain / NeuroM

Neuronal Morphology Analysis Tool
https://neurom.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Thoughts on design #301

Closed eleftherioszisis closed 8 years ago

eleftherioszisis commented 8 years ago

I would like to discuss some things that I have noticed in NeuroM. The suggestion I present would require a lot of changes to apply, so I would like to make clear that this post is more about thinking into the future than about changing the current structure of NeuroM. Let's begin:

We have adopted a tree representation of our data, which is used throughout the modules via iterators, maps, etc.

I understand that in languages where trees are expressed with pointers, creating iterators and generating things recursively is very fast. In Python, however, generators are not fast, and a functional style will always lag behind because of the not-so-cheap function calls (although as long as we do not put them in loops, it's not that bad).

Iterators in Python are memory efficient because they yield values lazily instead of storing everything in memory at once, but in our case, where we first create the whole tree in memory and then return the parent node, we are in a lose-lose situation. Plus, we always need an O(n) traversal for operations where a lookup table structure could avoid it.

Our loaders and processing are slow when the number of cells is in the hundreds and up, taking many minutes to load or process a population. When I tried to profile and optimize the loading I stumbled upon the scary truth: our bottleneck is mainly the make_tree function, which iteratively builds a tree from the data. Unfortunately, the recursive alternative is even worse because of Python's unoptimized recursion.

I also noticed that in a lot of functions we convert the tree to a list of elements (as_elements), and I wonder what the ratio is between the times we need the tree as a tree and the times we need it as an array of points. Maybe we would need to write less code if we had the points to begin with.

Being stuck at the Python level, we will always run into these overheads, which kills the performance of NeuroM and causes serious problems when populations need to be analyzed.

You will ask: what alternatives do we have? Well, we can always go full numpy and exploit the speed of light (c). Instead of creating trees, we can load the data through numpy and generate a connectivity matrix for the "parenting" (I think Pneumatk uses this design, but I am not sure). I suspect this would eliminate the majority of the overheads.
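
To make this a bit more concrete, here is a rough sketch of the idea (illustrative only, not working NeuroM code; only the standard SWC column layout is assumed, the file name is made up):

import numpy as np

# standard SWC columns: id, type, x, y, z, radius, parent_id (-1 for the root)
data = np.loadtxt('some_cell.swc')          # hypothetical file name
ids = data[:, 0].astype(int)
parents = data[:, 6].astype(int)
xyz = data[:, 2:5]

# O(1) "parenting" lookup: row index of every point's parent (-1 for roots)
row_of = dict((node_id, row) for row, node_id in enumerate(ids))
parent_row = np.array([row_of.get(p, -1) for p in parents])

# all segment lengths in one vectorized pass, without any tree traversal
has_parent = parent_row >= 0
seg_len = np.zeros(len(data))
seg_len[has_parent] = np.linalg.norm(xyz[has_parent] - xyz[parent_row[has_parent]], axis=1)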

What do you think?

berthelm commented 8 years ago

I remember having a discussion a long time ago about pneumatk where the way the data was stored was important to the efficiency of the analysis. I guess this is a general problem. As I don’t know python, I cannot contribute solutions, but it seems clear that a discussion of the points is important.

From my perspective, and if I imagine myself as a user, I would want this discussion to emphasize the following:

Who should be at this meeting and will someone make a set of concrete items for the agenda?

Julian


juanchopanza commented 8 years ago

These are some valid points, especially since we're using NeuroM beyond the scope for which it was designed. It was never intended to read hundreds of cells simultaneously, so time spent loading small numbers of cells seemed like a reasonable trade-off for the flexibility given by the iterators. Also remember that one of the roles of the iterator system is to provide different views of a neuron with a small memory footprint (points, segments, sections, bifurcations, triplets...).

If the overhead of this system is too high, we could certainly consider different approaches. It is not clear to me how to maintain the navigation over the different types of view, but I haven't spent too much time thinking about it.

@eleftherioszisis Could you provide a profiling script with (large) sample data so that anyone interested in having a look at this can try it out? If we change anything, however small, we need to be sure it is making things better.

juanchopanza commented 8 years ago

@berthelm My recollection is that loading cells with pneumatk would take much longer than with neurom. But maybe it was optimized for some types of complicated analysis that we don't perform with neurom. In any case, we are in a good situation in that we have something that "works", and we have many, many tests which allow us to change implementations without breaking anything.

Also note that most of the bullet points you mention, particularly the last one, were not design considerations of neurom. It was written to read small numbers of SWC files and get basic morphometrics out. Reliably.

juanchopanza commented 8 years ago

BTW, all the readers produce a 2D numpy array, which is then used to build the trees. This could be re-used in an alternative, less functional implementation. Here's how to get a raw data block and print its contents:

from neurom.io.utils import load_data
d = load_data('some_filename.swc')  # or .h5 or .asc
# the data block is in d.data_block
for i, p in enumerate(d.data_block):
    print i, p
eleftherioszisis commented 8 years ago

Yes indeed. I should have phrased the last paragraph differently. I meant to say that we can use the numpy data directly, along with a connectivity matrix, instead of creating the trees, as you point out, Juan.

eleftherioszisis commented 8 years ago

You can find below the profiling of loading a population of 1000 cells. These are h5v1 morphologies, the loader of which is probably the fastest of the three (if I am not mistaken). I removed the entries with less than one second of internal time.

My system monitor indicates that the ipython kernel which I used to load the population needs approximately 13 GB to keep the objects in memory.

312535464 function calls (312534453 primitive calls) in 1071.480 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     6348  153.294    0.024  554.905    0.087 utils.py:59(make_tree)
 12403176   62.399    0.000  145.097    0.000 hdf5.py:147(find_group)
 12508524   60.310    0.000   77.890    0.000 readers.py:156(get_row)
24810352/24809352   55.922    0.000  238.386    0.000 utils.py:44(memoizer)
     1000   53.491    0.053  302.808    0.303 hdf5.py:143(unpack_data)
 12309524   52.045    0.000   83.842    0.000 tree.py:79(ipreorder)
 37007190   47.229    0.000   47.229    0.000 :0(append)
 12303176   45.912    0.000   79.921    0.000 tree.py:217(as_elements)
     1000   43.418    0.043   59.524    0.060 readers.py:122(__init__)
 12403176   41.922    0.000   60.053    0.000 hdf5.py:158(find_parent_id)
     6348   35.943    0.006  201.740    0.032 morphtree.py:120(find_tree_type)
 12403176   34.688    0.000   61.446    0.000 fromnumeric.py:1031(searchsorted)
 25099892   32.600    0.000   32.600    0.000 :0(pop)
 12296828   31.470    0.000   46.019    0.000 tree.py:49(add_child)
 12403176   26.759    0.000   26.759    0.000 :0(searchsorted)
 12403176   22.515    0.000   22.515    0.000 readers.py:133(get_children)
 12403176   21.252    0.000   21.252    0.000 :0(max)
 12303176   20.849    0.000   20.849    0.000 tree.py:40(__init__)
 12310524   18.910    0.000   18.910    0.000 :0(hasattr)
 12403176   18.377    0.000   18.377    0.000 utils.py:46(<lambda>)
   101000   17.862    0.000   36.239    0.000 readers.py:179(<genexpr>)
 12508524   17.580    0.000   17.580    0.000 readers.py:140(_get_idx)
 12303176   16.625    0.000   16.625    0.000 :0(extend)
 12819420   15.954    0.000   15.954    0.000 :0(isinstance)
  3565172   13.454    0.000   14.112    0.000 :0(array)
   489540    8.479    0.000   62.559    0.000 numeric.py:2281(isclose)
   489540    7.919    0.000   32.606    0.000 numeric.py:2340(within_tol)
     1000    7.350    0.007  102.055    0.102 hdf5.py:176(remove_duplicate_points)
   489540    7.177    0.000   19.238    0.000 hdf5.py:186(_find_last_point)
   979080    6.849    0.000   15.095    0.000 numeric.py:2478(seterr)
  1468620    6.494    0.000   25.987    0.000 fromnumeric.py:1968(all)
  1987552    5.566    0.000    9.050    0.000 numeric.py:476(asanyarray)
  1480968    5.385    0.000    5.385    0.000 :0(reduce)
  1468620    4.357    0.000   13.539    0.000 :0(all)
  1468620    3.929    0.000    9.182    0.000 _methods.py:40(_all)
     1000    3.545    0.004    3.548    0.004 check.py:60(is_single_tree)
   979080    3.530    0.000    5.000    0.000 numeric.py:2578(geterr)
   489540    3.145    0.000   74.603    0.000 numeric.py:2216(allclose)
  1958160    3.023    0.000    3.023    0.000 :0(geterrobj)
   489540    2.842    0.000    8.459    0.000 fromnumeric.py:712(sort)
   491540    2.406    0.000    2.406    0.000 :0(sort)
   979080    1.926    0.000    1.926    0.000 :0(abs)
     1000    1.919    0.002  408.254    0.408 hdf5.py:114(read)
   979080    1.693    0.000    1.693    0.000 :0(seterrobj)
   489540    1.693    0.000    9.519    0.000 numeric.py:2869(__enter__)
   489540    1.643    0.000    8.911    0.000 numeric.py:2874(__exit__)
   489540    1.530    0.000    2.320    0.000 numeric.py:1970(isscalar)
     3000    1.455    0.000    1.455    0.000 :0(zip)
  1003080    1.399    0.000    1.399    0.000 :0(len)
   489540    1.390    0.000    1.390    0.000 :0(copy)
   489540    1.374    0.000    2.010    0.000 numeric.py:2865(__init__)
   489540    1.313    0.000    1.988    0.000 fromnumeric.py:501(transpose)
   489540    1.222    0.000    1.222    0.000 :0(result_type)
   489540    1.014    0.000    1.014    0.000 :0(where)
juanchopanza commented 8 years ago

Could you provide the script you used to get this, and maybe point me to some data? BTW, I would suspect the SWC loader is the fastest, because it doesn't have to do much work.

eleftherioszisis commented 8 years ago

The script I used for the profiling is the following: test.zip

from neurom.ezy import load_population
import profile

# path: directory containing the 1000 h5 morphologies
profile.run("load_population(path)", sort=1)

For the data I used the cells I've attached, copied multiple times until they reached 1000. :P

juanchopanza commented 8 years ago

I think we need to profile some analysis, because if you don't do anything, you don't need the trees anyway. But when you start analysing the morphologies, you need some structures set up that let you access the data, and these won't come for free. So we need to know whether the tree making dominates, and then whether the iterations add too much overhead.

eleftherioszisis commented 8 years ago

I can add some profiling of the analysis. Probably not today though, because I unloaded the population and it will take some time to load again. :P

Any suggestions? Should I randomly pick features, or would you like to see specific parts of the code in action?

lidakanari commented 8 years ago

I have something in mind (that shows a x200 performance improvement) but I'm too busy at the moment to look into this. If it is not urgent, I can create an example next week.

juanchopanza commented 8 years ago

I'm thinking maybe we can do it separately. Profile the analysis with a smaller number of cells, see if the iteration is particularly slow. We can do this next week.

@lidakanari x200 sounds pretty good. Is this in forming the trees, or unpacking the data, or something else?

eleftherioszisis commented 8 years ago

Something slightly off-topic, but I am curious: does anyone have experience with pypy and its JIT compiler, which is supposed to give crazy speeds and memory consumption comparable to Cython's?

juanchopanza commented 8 years ago

@eleftherioszisis No, I haven't. I have played around with cython though, which is cool (but adds a bit of complexity on the packaging and installation side of things). I was thinking cython would be one of the obvious things to try if there are bottlenecks that can't be fixed with pure python.

lidakanari commented 8 years ago

The "feature" I have in mind concerns the formation of sections. The comparison I made in the past is the following: Take a relatively large dataset of trees. Compute sections (beginning and ending) with two different ways (neurom tree structure, adjacency matrix used in Pneumatk) and measure the relative times of the two methods.

eleftherioszisis commented 8 years ago

Although I would love to learn cython, I think it should be left as a last resort because, as you said, NeuroM would be bound to quite a few dependencies and the code would become even more complicated if we start mixing languages (sounds cool though).

Even if pypy proves worth using, it needs quite a few libraries as well. It does have the advantage that we would not need to change our code, but, as with cython, it should be kept in reserve for when there is no other way to gain performance.

eleftherioszisis commented 8 years ago

An interesting profile. I profiled Armando's analysis on a synthesized cell (from morphsyn) with a lot of segments (~170000). It took almost an hour for a single cell. It's interesting how as_elements behaves.

profile.run("get_feature('section_path_distances', n)", sort=1)
         887489819 function calls (810521357 primitive calls) in 2109.302 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
115452693/38484231  428.257    0.000  809.514    0.000 tree.py:217(as_elements)
 38484231  285.807    0.000  589.131    0.000 linalg.py:1924(norm)
 38487682  237.314    0.000 2057.491    0.000 morphtree.py:47(<genexpr>)
115452693  184.065    0.000  553.540    0.000 tree.py:222(<genexpr>)
 38484231  133.907    0.000  823.157    0.000 morphmath.py:171(point_dist)
115452694  130.626    0.000  130.626    0.000 :0(hasattr)
 38484231  129.593    0.000  129.593    0.000 :0(reduce)
 38484231  100.120    0.000  100.120    0.000 morphmath.py:37(vector)
 38484231   76.013    0.000  125.137    0.000 numeric.py:392(asarray)
 38484231   75.886    0.000  899.043    0.000 morphmath.py:223(segment_length)
 76971914   66.571    0.000   66.571    0.000 :0(isinstance)
 38484231   49.124    0.000   49.124    0.000 :0(array)
 38484231   48.593    0.000   48.593    0.000 :0(conj)
     3451   47.499    0.014 2104.989    0.610 :0(sum)
 38662372   38.091    0.000   38.091    0.000 tree.py:102(iupstream)
 38487682   37.841    0.000   37.841    0.000 tree.py:146(<lambda>)
 38484231   35.844    0.000   35.844    0.000 tree.py:145(<lambda>)
   164338    0.611    0.000    1.631    0.000 tree.py:169(seed_node)
     3451    0.609    0.000    1.456    0.000 tree.py:158(get_section)
   326948    0.594    0.000    0.868    0.000 tree.py:59(is_forking_point)
   164339    0.489    0.000    0.795    0.000 tree.py:79(ipreorder)
   491285    0.399    0.000    0.399    0.000 :0(len)
        1    0.348    0.348 2109.302 2109.302 :0(fromiter)
   325225    0.294    0.000    0.294    0.000 tree.py:74(is_root)
   164337    0.284    0.000    0.409    0.000 tree.py:69(is_leaf)
   164338    0.157    0.000    0.157    0.000 :0(extend)
   164338    0.149    0.000    0.149    0.000 :0(pop)
   164337    0.138    0.000    0.138    0.000 :0(append)
     3451    0.027    0.000 2105.051    0.610 morphtree.py:45(path_length)
     3451    0.014    0.000 2105.008    0.610 fromnumeric.py:1621(sum)
     3451    0.011    0.000 2105.062    0.610 sections.py:109(end_point_path_length)
     3451    0.010    0.000    0.012    0.000 tree.py:232(imap_val)
     3451    0.006    0.000 2105.068    0.610 sections.py:52(_wrapper)
     3451    0.005    0.000    0.005    0.000 :0(reverse)
     3451    0.004    0.000    0.004    0.000 tree.py:135(isegment)
     3451    0.002    0.000    0.002    0.000 tree.py:227(val_iter)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.000    0.000    0.000    0.000 types.py:45(tree_type_checker)
        1    0.000    0.000 2109.302 2109.302 __init__.py:73(get_feature)
        1    0.000    0.000    0.000    0.000 __init__.py:35(iter_neurites)
        1    0.000    0.000 2109.302 2109.302 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 tree.py:149(isection)
        1    0.000    0.000    0.000    0.000 :0(filter)
        1    0.000    0.000    0.000    0.000 types.py:59(check_tree_type)
        1    0.000    0.000    0.000    0.000 :0(from_iterable)
        1    0.000    0.000    0.000    0.000 __init__.py:359(__getattr__)
        1    0.000    0.000    0.000    0.000 neurite_features.py:114(section_path_distances)
        1    0.000    0.000 2109.302 2109.302 profile:0(get_feature('section_path_distances', pop.neurons[0]))
        1    0.000    0.000    0.000    0.000 __init__.py:70(_is_dunder)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 tree.py:257(i_chain2)
juanchopanza commented 8 years ago

I think path_distance is a great candidate for memoization. It is worth memoizing the relevant function and checking again. Also, have you tried using runsnakerun to visualize the profiling info? It makes it easier (at least for me) to see how the cumulative time spent in a function is shared among the things called by that function.
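
To illustrate the kind of memoization I mean, here is a toy sketch on a made-up node class (not NeuroM's tree API; all names here are illustrative). Shared path prefixes get computed once instead of once per leaf:

import math

def memoize_by_id(f):
    """Cache results keyed by node identity (illustrative only)."""
    cache = {}
    def wrapper(node):
        key = id(node)
        if key not in cache:
            cache[key] = f(node)
        return cache[key]
    return wrapper

class Node(object):
    """Toy stand-in for a tree node carrying a 3D point."""
    def __init__(self, xyz, parent=None):
        self.xyz = xyz
        self.parent = parent

@memoize_by_id
def path_distance(node):
    """Path length from the root to node, reusing the parent's cached value."""
    if node.parent is None:
        return 0.0
    step = math.sqrt(sum((a - b) ** 2 for a, b in zip(node.xyz, node.parent.xyz)))
    return path_distance(node.parent) + step

root = Node((0.0, 0.0, 0.0))
a = Node((1.0, 0.0, 0.0), parent=root)
b = Node((1.0, 1.0, 0.0), parent=a)
print(path_distance(b))  # 2.0; path_distance(a) is now cached as well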

eleftherioszisis commented 8 years ago

Memoization might indeed make up for the overhead of the looping and the as_elements calls. I will try runsnakerun next time. I have never tried anything fancy for profiling because I'm lazy. :P

juanchopanza commented 8 years ago

BTW, I tried this

from neurom import ezy
nrn = ezy.load_neuron('C060109A3.h5')
# run only once to avoid any memoization / caching effects
%timeit -r 1 -n 1 ezy.get('section_path_distances', nrn)

and got 321 ms. This neuron has 9187 segments. Where could I find a "slow" neuron (like the one you tested, but maybe not quite that slow)?

eleftherioszisis commented 8 years ago

As for the slow cells, can't you generate multiple cells from Morphsyn with gradually increasing segment counts, in order to see how these functions explode in computational time? Maybe @liesbethvanherpe has synthesized cells with fewer segments, so you don't have to wait an hour for a profile. :P

juanchopanza commented 8 years ago

@eleftherioszisis I won't do that at the moment. If we think there is a problem, then it should be demonstrable/reproducible already, no?

eleftherioszisis commented 8 years ago

But it is a problem when analysis on synthesized cells is practically impossible, isn't it?

If you use one of the synthesized cells with hundreds of thousands of segments, then you'll get the same result. I don't have the data because I ran the profile on Armando's PC. This is why I am suggesting that @liesbethvanherpe provide you with one of the cells.

juanchopanza commented 8 years ago

OK, if it is a standard cell coming out of morphsyn, then that is fine.

juanchopanza commented 8 years ago

Further info: for an "Armando" cell with ~165K segments, load_neuron is completely dominated by remove_duplicate_points.

lidakanari commented 8 years ago

Profiling for this?

juanchopanza commented 8 years ago

@lidakanari Note that I am talking about loading a single neuron, after the changes already made in pull request #302. Using this:

from neurom.ezy import load_neuron
import cProfile
cProfile.run("load_neuron('big_neuron.h5')", sort=1)

I get (edit: after the changes to find_tree_type):

         3480632 function calls (3480631 primitive calls) in 7.189 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.533    2.533    2.876    2.876 hdf5.py:176(remove_duplicate_points)
   164338    0.695    0.000    0.695    0.000 tree.py:40(__init__)
        1    0.431    0.431    1.861    1.861 utils.py:59(make_tree)
        1    0.399    0.399    0.431    0.431 readers.py:122(__init__)
        1    0.383    0.383    1.510    1.510 hdf5.py:143(unpack_data)
   164343    0.266    0.000    0.626    0.000 hdf5.py:147(find_group)
328690/328689    0.183    0.000    0.977    0.000 utils.py:44(memoizer)
    24177    0.182    0.000    0.187    0.000 {numpy.core.multiarray.array}
   164343    0.179    0.000    0.179    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
   164343    0.166    0.000    0.220    0.000 hdf5.py:158(find_parent_id)
   164343    0.145    0.000    0.145    0.000 readers.py:133(get_children)
   164348    0.143    0.000    0.185    0.000 readers.py:156(get_row)
   164339    0.139    0.000    0.174    0.000 tree.py:79(ipreorder)
   164343    0.113    0.000    0.113    0.000 {max}
        3    0.107    0.036    0.107    0.036 {zip}
        1    0.091    0.091    4.486    4.486 hdf5.py:114(read)
        1    0.085    0.085    0.085    0.085 check.py:60(is_single_tree)
     3453    0.083    0.000    0.083    0.000 {method 'sort' of 'numpy.ndarray' objects}
lidakanari commented 8 years ago

Yes, I can see that remove_duplicate_points is costing a lot of computational time. It is obviously not optimized. Could it be?

However, this function depends on the number of sections and not the number of segments, since it only checks the first point of each section. If I can provide any information to help you optimize this, please let me know.

lidakanari commented 8 years ago

@juanchopanza Also, I don't seem to get any notifications for your pull requests, so I didn't see the recent changes in #302 at all...

eleftherioszisis commented 8 years ago

I am wondering where most of the time is spent in remove_duplicate_points. Is it the loop? In that case, could it be fully vectorized?
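
Just to make the question concrete, something along these lines is what I have in mind (a toy numpy sketch of dropping consecutive duplicate points, not the actual remove_duplicate_points, which only looks at the first point of each section):

import numpy as np

def drop_consecutive_duplicates(points, atol=1e-8):
    """Return points with rows that repeat the previous row removed."""
    points = np.asarray(points)
    if len(points) < 2:
        return points
    # compare every row with the one before it, all in one vectorized pass
    same_as_prev = np.all(np.isclose(points[1:], points[:-1], atol=atol), axis=1)
    keep = np.concatenate(([True], ~same_as_prev))
    return points[keep]

pts = np.array([[0., 0., 0.],
                [1., 0., 0.],
                [1., 0., 0.],   # duplicate of the previous point
                [2., 0., 0.]])
print(drop_consecutive_duplicates(pts))  # the duplicate row is gone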

juanchopanza commented 8 years ago

I'll have another look at remove_duplicates tomorrow. I need to see in more detail where exactly the time is being spent. I can't figure it out from the profiling table!

juanchopanza commented 8 years ago

With the changes from #303 remove_duplicates is no longer the bottleneck. It is 2nd, close behind the Tree constructor.

juanchopanza commented 8 years ago

FYI, with the latest changes, including #311, a quick profiling of neurom.io.load_data('big_neuron.h5') shows an improvement from ~5s to ~0.7s wrt neurom 0.0.16. The data unpacking part no longer dominates the neuron building, but I want to optimize it further because we're likely to keep it. 0.7s for a single neuron, no matter how large, still seems slow to me.

juanchopanza commented 8 years ago

A similar profiling of the section_path_distances feature extraction with the same large neuron showed a reduction in time from ~450s to ~95s. Still unacceptably slow, but this has not been optimized as much as the data unpacking and neuron building.

lidakanari commented 8 years ago

That's amazingly better! Thanks a lot Juan :)

juanchopanza commented 8 years ago

@eleftherioszisis @lidakanari Do you have an example of efficiently creating this connectivity matrix à la pneumatk, and some example that shows it would be faster to calculate, e.g., the section path distances? Bear in mind that it needs to work for SWC, so it should be something that can be done from the raw data block.

Also an indication of how long it takes pneumatk to load the attached big_neuron.h5.zip and get the section path distances out would be good.

lidakanari commented 8 years ago

@juanchopanza Here is my example about the adjacency matrix optimization:

In [29]: %timeit -r 1 -n 1 tr.get_sections()
1 loops, best of 1: 5.17 s per loop

In [30]: %timeit -r 1 -n 1 tr.get_sections_2()
1 loops, best of 1: 2.19 ms per loop

with the following functions:

def get_sections(self):
    # tree-based version: walk from each multifurcation point down to the
    # end of the section that starts at each of its children
    beg = [0]
    end = [self.get_way_to_section_end(0)[-1]]

    for b in self.get_multifurcations():

        children = self.get_children(b)
        for ch in children:
            beg = beg + [b]
            end = end + [self.get_way_to_section_end(ch)[-1]]

    return beg, end


def get_sections_2(self):
    # adjacency-matrix version: self.p presumably holds the parent index of
    # each point (_np is numpy), so dA gets a 1 at (child, parent) for every point
    import scipy.sparse as sp

    ts = self.size()
    dA = sp.csr_matrix((_np.ones(ts - 1), (range(1, ts), self.p[1:])), shape=(ts, ts))

    # section ends are the points whose child count is not exactly one
    # (leaves and forking points); section beginnings are derived from their parents
    end = _np.array(sp.csr_matrix.sum(dA, 0) != 1)[0].nonzero()[0]
    beg = _np.append([0], self.p[_np.delete(_np.hstack([0, 1 + end]), len(end))][1:])

    return beg, end
lidakanari commented 8 years ago

@juanchopanza Comparing Pneumatk - NeuroM times:

Pneumatk:

%timeit -r 1 -n 1 pn.io.load('./big_neuron.h5')
1 loops, best of 3: 176 s per loop

%timeit -r 1 -n 1 n.get_section_path_distances()
1 loops, best of 1: 1.33e+03 s per loop

NeuroM:

%timeit -r 1 -n 1 ezy.load_neuron('./big_neuron.h5')
1 loops, best of 1: 2.14 s per loop

%timeit -r 1 -n 1 ezy.get('section_path_distances', n)
1 loops, best of 1: 96.4 s per loop
juanchopanza commented 8 years ago

OK, I take it that is on a laptop? ezy.load_neuron takes ~600ms on my desktop.

juanchopanza commented 8 years ago

Unfortunately, it is difficult to make any sense of the adjacency matrix stuff with incomplete code.

lidakanari commented 8 years ago

On a laptop yes (4 years old).

juanchopanza commented 8 years ago

OK, well if anyone has a real example of, for example, efficiently getting the section path lengths starting from the raw data block that would be great.

lidakanari commented 8 years ago

Sorry but I will not have time to produce this any time soon.

juanchopanza commented 8 years ago

@eleftherioszisis Are you in a position to provide an example? I think you had a few good ideas.

eleftherioszisis commented 8 years ago

I was mainly referring to Leda's implementation of the adjacency matrix (which is faster than Pneumatk's, whose code is a wild party), so I do not have an example or code to provide apart from what Leda posted.

juanchopanza commented 8 years ago

@eleftherioszisis Unfortunately the code posted is insufficient to do anything with, because it is missing context. I also have no idea how it could be used to calculate the path distances. In the absence of any more evidence, I may play with making a tree of sections instead of a tree of points. This should help with memory and should make the iteration over sections more efficient. Whether that has a significant impact on timing is another matter.
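
Roughly what I have in mind with a tree of sections (a toy sketch, not actual NeuroM code): each node holds the whole block of points of one unbranched section, so traversals touch thousands of nodes instead of hundreds of thousands:

import numpy as np

class SectionNode(object):
    """Toy section-tree node: one node per unbranched section."""
    def __init__(self, points, parent=None):
        self.points = np.asarray(points)   # (n, 3) block of points of this section
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ipreorder(self):
        """Pre-order iteration over sections rather than over single points."""
        yield self
        for child in self.children:
            for node in child.ipreorder():
                yield node

# a trunk section that forks into two child sections
trunk = SectionNode([[0, 0, 0], [1, 0, 0]])
SectionNode([[1, 0, 0], [2, 1, 0]], parent=trunk)
SectionNode([[1, 0, 0], [2, -1, 0]], parent=trunk)
print(sum(1 for _ in trunk.ipreorder()))  # 3 sections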

eleftherioszisis commented 8 years ago

@juanchopanza I think this is a good idea, which will probably make make_tree faster. If you don't want to spend time on something whose efficiency you are unsure of, we can postpone this discussion until Leda can provide a more detailed example or algorithm.

juanchopanza commented 8 years ago

I think #316 looks quite promising. Over 10x speed up WRT current version. There are still a few things that don't seem quite right, and it only works for h5 data, but it is a start.

eleftherioszisis commented 8 years ago

Loading of 1000 cells in 30 secs (previously 1000 secs). Cool!

import os
import profile
from neurom.core.section_neuron import load_neuron

path = '/home/zisis/data/profiling/'  # directory containing the 1000 cells
filenames = os.listdir(path)
profile.run("tuple(load_neuron(path + f) for f in filenames)", sort=1)
         7358684 function calls in 29.538 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     6348    4.415    0.001   19.999    0.003 section_neuron.py:87(make_tree)
   489540    4.090    0.000    4.090    0.000 tree.py:40(__init__)
   991776    3.908    0.000    6.283    0.000 tree.py:79(ipreorder)
     1000    3.250    0.003    4.338    0.004 hdf5.py:97(unpack_data)
  1467272    1.966    0.000    1.966    0.000 :0(append)
  1472620    1.906    0.000    1.906    0.000 :0(pop)
   483192    1.257    0.000    1.854    0.000 tree.py:49(add_child)
   979080    1.207    0.000    1.207    0.000 :0(extend)
   491540    1.031    0.000    1.031    0.000 :0(arange)
     6348    0.914    0.000    5.065    0.001 section_neuron.py:80(set_neurite_type)
     1000    0.809    0.001    1.451    0.001 section_neuron.py:57(__init__)
     1000    0.459    0.000    0.459    0.000 section_neuron.py:67(neurite_trunks)
   100000    0.401    0.000    0.958    0.000 morphmath.py:157(point_dist2)
   100000    0.344    0.000    0.344    0.000 morphmath.py:36(vector)
   100000    0.344    0.000    1.302    0.000 morphmath.py:170(point_dist)
   101000    0.289    0.000    1.591    0.000 morphmath.py:212(<genexpr>)
   100000    0.212    0.000    0.212    0.000 :0(dot)
    37740    0.211    0.000    0.796    0.000 :0(array)
     3000    0.156    0.000    0.156    0.000 :0(open)
     6348    0.151    0.000    0.703    0.000 function_base.py:3090(_median)
     1000    0.143    0.000    1.789    0.002 morphmath.py:207(average_points_dist)
     8348    0.118    0.000    0.306    0.000 _methods.py:53(_mean)
     2000    0.117    0.000    0.117    0.000 :0(read)
    12348    0.088    0.000    0.088    0.000 :0(reduce)
    27392    0.082    0.000    0.126    0.000 numeric.py:476(asanyarray)
     1000    0.081    0.000    0.081    0.000 :0(close)
    44436    0.071    0.000    0.071    0.000 __init__.py:393(<genexpr>)
     4000    0.067    0.000    0.135    0.000 selections.py:269(broadcast)
     6348    0.065    0.000    0.809    0.000 function_base.py:2941(_ureduce)
     1000    0.063    0.000    0.063    0.000 section_neuron.py:74(soma_points)
     6000    0.061    0.000    0.069    0.000 dataset.py:153(shape)
    12696    0.061    0.000    0.133    0.000 numerictypes.py:728(issubdtype)
        1    0.059    0.059   29.538   29.538 <string>:1(<module>)
     3000    0.053    0.000    0.053    0.000 :0(zeros)
    33740    0.052    0.000    0.052    0.000 :0(issubclass)
    26696    0.046    0.000    0.046    0.000 :0(isinstance)
     4000    0.041    0.000    0.120    0.000 selections.py:241(__init__)
     6348    0.039    0.000    0.116    0.000 fromnumeric.py:554(partition)
     2000    0.039    0.000    0.438    0.000 dataset.py:553(read_direct)
     8348    0.038    0.000    0.343    0.000 fromnumeric.py:2789(mean)
     1000    0.035    0.000   29.474    0.029 section_neuron.py:110(load_neuron)
     6348    0.035    0.000    0.035    0.000 :0(partition)
     2000    0.035    0.000    0.091    0.000 dataset.py:235(__init__)
     2000    0.035    0.000    0.586    0.000 dataset.py:598(__array__)
    12696    0.034    0.000    0.053    0.000 numerictypes.py:660(issubclass_)
    23348    0.034    0.000    0.034    0.000 :0(len)
     8348    0.034    0.000    0.050    0.000 _methods.py:43(_count_reduce_items)
     2000    0.031    0.000    0.193    0.000 group.py:145(__getitem__)
     6348    0.030    0.000    0.839    0.000 function_base.py:3001(median)
     1000    0.029    0.000    0.914    0.001 hdf5.py:167(_unpack_v1)
     2000    0.027    0.000    0.029    0.000 dataset.py:166(dtype)
     4000    0.026    0.000    0.058    0.000 selections.py:144(__init__)
     1000    0.026    0.000    7.036    0.007 hdf5.py:70(read)
     2000    0.025    0.000    0.038    0.000 group.py:286(__contains__)
     1000    0.023    0.000    0.023    0.000 :0(create)
    17000    0.022    0.000    0.022    0.000 base.py:211(id)
     2000    0.021    0.000    0.021    0.000 :0(get_create_plist)
     4000    0.019    0.000    0.019    0.000 :0(create_simple)
     6348    0.018    0.000    0.026    0.000 __init__.py:398(__len__)
    14000    0.017    0.000    0.017    0.000 selections.py:160(shape)
     6348    0.017    0.000    0.017    0.000 numeric.py:1396(rollaxis)
     2000    0.017    0.000    0.027    0.000 filters.py:200(get_filters)
     6348    0.016    0.000    0.016    0.000 __init__.py:392(__iter__)
     1000    0.015    0.000    0.177    0.000 files.py:174(__init__)
     6348    0.015    0.000    0.015    0.000 :0(flatten)
     4000    0.015    0.000    0.052    0.000 fromnumeric.py:1838(product)
     4000    0.014    0.000    0.020    0.000 base.py:91(_e)
     7348    0.013    0.000    0.013    0.000 :0(get)
     7348    0.013    0.000    0.013    0.000 :0(hasattr)
     1000    0.012    0.000    1.861    0.002 neuron.py:120(__init__)
     1000    0.011    0.000    0.055    0.000 neuron.py:125(center)
     6000    0.009    0.000    0.009    0.000 selections.py:300(<genexpr>)
     2000    0.008    0.000    0.008    0.000 :0(empty)
berthelm commented 8 years ago

Yay! Around 30 cells per second, that's awesome!

And a whole circuit of 30,000 cells in 15 minutes (memory permitting!)

