Node labeling affects persistence diagram outcome

Hey there,

I've been working on a morphology package in Python 3 that includes your computation of persistence diagrams. While trying to match your results with mine, I realized that there is a "bug" in your code.

I created a simple test SWC file, test_neuron.swc. Then I switched nodes 3 and 10 to get a tree that looks the same but has a slightly different node labeling (test_neuron_v2.swc). Both files are listed below.

When I compute the persistence diagram (using the projection), I get different outputs for the two trees: the death time of the 3rd entry (index 2) differs.

Looking into your code, I saw that you make implicit assumptions about the order of nodes and their labels. In my experience, this can cause weird bugs when the data does not adhere to these assumptions (as in the case above). Maybe you could rewrite your load_neuron function to enforce the node labeling that the rest of your code needs? Another option would be to document the assumptions on the input data explicitly, so that every user is aware of them.
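
To make the suggestion concrete, here is a minimal sketch of what enforcing the labeling could look like. It renumbers SWC rows by a depth-first traversal from the root, so that every parent gets a smaller ID than its children. The function name relabel_parents_first and the tuple layout are hypothetical and not part of TMD. The rows are those of test_neuron_v2.swc, listed further down. The resulting labeling is valid but not necessarily identical to the one in test_neuron.swc.

```python
from collections import defaultdict


def relabel_parents_first(rows):
    """Renumber SWC rows so that every parent precedes its children.

    rows: list of (id, type, x, y, z, radius, parent_id) tuples,
          where parent_id == -1 marks a root node.
    Hypothetical helper for illustration; not part of TMD.
    """
    by_id = {}
    children = defaultdict(list)
    for row in rows:
        by_id[row[0]] = row
        children[row[6]].append(row[0])

    new_id = {-1: -1}  # the root marker maps to itself
    out = []
    # Depth-first traversal; a child is pushed only after its parent
    # has been popped and given a new ID.
    stack = sorted(children[-1], reverse=True)
    while stack:
        old = stack.pop()
        new_id[old] = len(out) + 1
        _, ntype, x, y, z, radius, parent = by_id[old]
        out.append((new_id[old], ntype, x, y, z, radius, new_id[parent]))
        stack.extend(sorted(children[old], reverse=True))
    return out


# Rows of test_neuron_v2.swc
rows_v2 = [
    (1, 1, 0.0, 0.0, 0.0, 0.5, -1),
    (2, 3, 0.0, 1.0, 0.0, 0.5, 1),
    (3, 3, 0.0, 1.5, 2.0, 0.5, 8),
    (4, 3, 0.0, 3.0, 0.0, 0.5, 10),
    (5, 3, 0.0, 4.0, 0.0, 0.5, 4),
    (6, 3, 0.0, 3.2, 0.5, 0.5, 4),
    (7, 3, 0.0, 2.5, 0.5, 0.5, 10),
    (8, 3, 0.0, 1.5, 1.0, 0.5, 2),
    (9, 3, 0.0, 2.5, 1.3, 0.5, 8),
    (10, 3, 0.0, 2.0, 0.0, 0.5, 2),
]

relabeled = relabel_parents_first(rows_v2)
for row in relabeled:
    print(*row)
```

Whether a parents-first labeling alone is sufficient for the rest of the code is an assumption that would need to be checked against the loader.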

The two swc files I used are:

#test_neuron.swc
1 1 0.0 0.0 0.0 0.5 -1
2 3 0.0 1.0 0.0 0.5 1
3 3 0.0 2.0 0.0 0.5 2
4 3 0.0 3.0 0.0 0.5 3
5 3 0.0 4.0 0.0 0.5 4
6 3 0.0 3.2 0.5 0.5 4
7 3 0.0 2.5 0.5 0.5 3
8 3 0.0 1.5 1.0 0.5 2
9 3 0.0 2.5 1.3 0.5 8
10 3 0.0 1.5 2.0 0.5 8

and

#test_neuron_v2.swc
1 1 0.0 0.0 0.0 0.5 -1
2 3 0.0 1.0 0.0 0.5 1
3 3 0.0 1.5 2.0 0.5 8
4 3 0.0 3.0 0.0 0.5 10
5 3 0.0 4.0 0.0 0.5 4
6 3 0.0 3.2 0.5 0.5 4
7 3 0.0 2.5 0.5 0.5 10
8 3 0.0 1.5 1.0 0.5 2
9 3 0.0 2.5 1.3 0.5 8
10 3 0.0 2.0 0.0 0.5 2

The code for computing the diagrams was:

import tmd

files = ['test_neuron.swc', 'test_neuron_v2.swc']
# Filter functions for which a persistence diagram is computed
features = ['radial_distances', 'path_distances', 'projection', 'section_branch_orders']

for file in files:
    # Load the morphology from the current directory
    n = tmd.io.load_neuron("./" + file)

    for f in features:
        # Compute the persistence diagram and write it to <name>_<feature>.txt
        ph = tmd.methods.get_ph_neuron(n, feature=f)
        tmd.methods.write_ph(ph, "./%s_%s.txt" % (file.split(".")[0], f))

Hello, thanks a lot for your message. Indeed, you are correct: the code relies on an assumption taken from the "standard SWC" specification (http://www.neuronland.org/NLMorphologyConverter/MorphologyFormats/SWC/Spec.html):

Parent samples should appear before any child samples.

According to this assumption, the second file is not "standard" (for example, node 3 has node 8 as its parent, but node 8 appears later in the file), and therefore it is not loaded as expected.
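
The ordering assumption can be checked with a few lines of Python. The following is a minimal sketch, not code from TMD; the function name find_order_violations is hypothetical, and only the ID and parent columns of the two files from this thread are used.

```python
def find_order_violations(pairs):
    """Return (node_id, parent_id) pairs whose parent has not appeared yet.

    pairs: (node_id, parent_id) tuples in file order; -1 marks a root.
    Hypothetical helper for illustration; not part of TMD.
    """
    seen = set()
    violations = []
    for node_id, parent_id in pairs:
        if parent_id != -1 and parent_id not in seen:
            violations.append((node_id, parent_id))
        seen.add(node_id)
    return violations


# ID and parent columns of test_neuron.swc
pairs_v1 = [(1, -1), (2, 1), (3, 2), (4, 3), (5, 4),
            (6, 4), (7, 3), (8, 2), (9, 8), (10, 8)]
# ID and parent columns of test_neuron_v2.swc
pairs_v2 = [(1, -1), (2, 1), (3, 8), (4, 10), (5, 4),
            (6, 4), (7, 10), (8, 2), (9, 8), (10, 2)]

print(find_order_violations(pairs_v1))  # []
print(find_order_violations(pairs_v2))  # [(3, 8), (4, 10), (7, 10)]
```

In the second file, nodes 3, 4 and 7 each refer to a parent that is defined later in the file.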

I am aware, however, that data often do not follow the specification. I would be glad to modify the code to add a pre-processing step that fixes this issue; this step should run when the neuron is loaded. However, we are planning to integrate a proper IO library that will resolve these kinds of issues (https://github.com/BlueBrain/MorphIO). If you have data in a non-standard format, I would suggest, as a quick fix, using MorphIO to save the data in the "expected" format.

Example code:

import morphio
m = morphio.mut.Morphology('test2.swc')
m.write('test3.swc')

Result:

1           1     0.000000     0.000000     0.000000     0.500000          -1
2           3     0.000000     1.000000     0.000000     0.500000           1
3           3     0.000000     2.000000     0.000000     0.500000           2
4           3     0.000000     3.000000     0.000000     0.500000           3
5           3     0.000000     4.000000     0.000000     0.500000           4
6           3     0.000000     3.200000     0.500000     0.500000           4
7           3     0.000000     2.500000     0.500000     0.500000           3
8           3     0.000000     1.000000     0.000000     0.500000           1
9           3     0.000000     1.500000     1.000000     0.500000           8
10           3     0.000000     2.500000     1.300000     0.500000           9
11           3     0.000000     1.500000     2.000000     0.500000           9
