NaN in feature section_tortuosity

chalimou commented 8 years ago

the following little script extracts the values for the feature 'section_tortuosity'. The first value in the resulting array is a nan. The problem exists since yesterday. I include the cell (Armando's reconstructed reference biological cell) as an attachment.

import neurom as nm

ref_population = nm.load_neurons('./hippo')
ref_data = nm.get('section_tortuosity', ref_population, neurite_type=nm.AXON)
print ref_data[0]

Output:

nan

hippo.zip

juanchopanza commented 8 years ago

@chalimou Thanks for the working example. I could reproduce the issue immediately. I also checked with older code and it is present there, so the problem exists since well before yesterday (good to know no recent change caused this.) I will look into it.

Note: I took the liberty of further reducing the example.

juanchopanza commented 8 years ago

@chalimou The first section has length 0:

import neurom as nm

pop = nm.load_neurons('./hippo')
seclen = nm.get('section_lengths', pop, neurite_type=nm.AXON)
print seclen[0]

I can see this too by running morph_check on the file, getting the output

ERROR: Has all nonzero section lengths FAIL

In this case, the issue is that point 2, a starting neurite point, bifurcates. NeuroM sections cannot be zero length by definition. Is there any possibility to fix this file?

chalimou commented 8 years ago

thanks for the investigation. I have no idea about files and formats, so l inform armando and lida and let’s talk offline the synthesis meeting tomorrow. Just keep the issue open until I get more information. You are right, the problem existed before, I’ve checked old validation reports and it shows always when this cell is involved. Since tortuocity = radial distance / path distance, that’s where the nan comes from.

nancy

On 28 Sep 2016, at 10:15, Juan Palacios notifications@github.com wrote:

@chalimou https://github.com/chalimou The first section has length 0:

import neurom as nm

pop = nm.load_neurons('./hippo') seclen = nm.get('section_lengths', pop, neurite_type=nm.AXON) print seclen[0] I can see this too by running morph_check on the file, getting the output

ERROR: Has all nonzero section lengths FAIL

In this case, the issue is that point 2, a starting neurite point, bifurcates. NeuroM sections cannot be zero length by definition. Is there any possibility to fix this file?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BlueBrain/NeuroM/issues/503#issuecomment-250101635, or mute the thread https://github.com/notifications/unsubscribe-auth/APxZk3WOq7mj86cM0KPdL2h2mAX9Wl23ks5quiI-gaJpZM4KIgjA.

lidakanari commented 8 years ago

According to this issue: https://github.com/BlueBrain/NeuroM/issues/166 we modified other parts of NeuroM (in this case the plotting) to include cells that have this type of trees (directly bifurcates after the first point). There was a long discussion after which it was decided that we will keep those neurons as accepted. Also, if I am not mistaken Dan made some arrangements with the circuit building team to allow those cells to be integrated into the circuit.

Of course that requires a solution for all the undefined behaviors that those trees generate, but we have to be consistent through the code and our decisions: we shouldn't fix some of the problems this behavior causes and ignore the rest of the code that breaks because of them.

juanchopanza commented 8 years ago

@lidakanari thanks for finding that link. That issue is specifically about plotting trees. I don't recall the conclusion of the discussions about zero length sections for neurons are accepted. Do you have a reference to is somewhere?

In any case, these neurons are not currently rejected by NeuroM. I think a NaN is a suitable result for the tortuosity of a zero length section. The user can then decide how to handle it (e.g. remove or ignore those values, or even setting them to a magic number such as 0.)

lidakanari commented 8 years ago

@juanchopanza I don't recall any conclusion concerning the zero length problem. This proposal makes sense (since I cannot think of a more reasonable value to replace the nan in this case). I think it will be useful to document it in the function description, so that the user can handle the nan values accordingly.

chalimou commented 8 years ago

I see your points and understand that from circuit building, NeuroM and plotting point of view this is acceptable. NaNs and zeros completely ruin the statistics though and I do not have any possibility to manually edit the arrays afterwards. We all love automatisation, since it reduces manually caused errors. Proposal: let me think about it and discuss it some time tomorrow, in person, I might have some more questions on it. We can then decide how to handle it in a sensible way that include the statistics into the considerations.

On 28 Sep 2016, at 11:32, lidakanari notifications@github.com wrote:

@juanchopanza https://github.com/juanchopanza I don't recall any conclusion concerning the zero length problem. This proposal makes sense (since I cannot think of a more reasonable value to replace the nan in this case). I think it will be useful to document it in the function description, so that the user can handle the nan values accordingly.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BlueBrain/NeuroM/issues/503#issuecomment-250118642, or mute the thread https://github.com/notifications/unsubscribe-auth/APxZk89QEab7ykJUEeRAFScNxTyGLUH1ks5qujRFgaJpZM4KIgjA.

juanchopanza commented 8 years ago

@chalimou The feature getter gives you an array of floating point numbers. You can filter or transform these to suit your needs.

eleftherioszisis commented 8 years ago

@chalimou

Nancy, can you add something like this in morphval's code?

import numpy as np

p = np.array([0, 1, 2, 3, 4, 5], dtype=np.float)

x = p / p   # this is the result from the get function. 0 / 0 is nan in this case

filter_mask = (x != np.nan) & (x != np.inf)
nice_vals = x[filter_mask]

will result to:

array([ 1. , 1., 1.,  1. ,  1., 1.]) # without the nan in the start

This will remove inf or nan values from the results that you get from the getter. By the way I agree that it should be left to the user to filter the values, because if for example two features need to be compared against each other then one needs to create a mask for both features based on the invalid values of one of them.

mgeplf commented 8 years ago

Note: the above changes the length of the array. To keep the same length, I recommend either using: http://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html

or

foo[np.isnan(foo)] = 0

chalimou commented 8 years ago

thank you all very much for the suggestions. I do not want to blindly remove NaNs or Infs since in other circumstances it can be a bug and not a feature like it seems to me it is in this case. But of course one can use sth like this in the case of the tortuosity. I just want to think a bit over the effects on radial distances, angles, etc. And I was thinking of things like the one MIke just mentioned. That is why I was not very eager to close the subject fast.

On 28 Sep 2016, at 13:43, Eleftherios Zisis notifications@github.com wrote:

@chalimou https://github.com/chalimou Nancy, can you add something like this in morphval's code?

import numpy as np

p = np.array([0, 1, 2, 3, 4, 5], dtype=np.float)

x = p / p # this is the result from the get function. 0 / 0 is nan in this case

filter_mask = (x != np.nan) & (x != np.inf) nice_vals = x[filter_mask] will result to:

array([ 1. , 0.5, 0.33333333, 0.25 , 0.2, 0.16666667]) # without the nan in the start This will remove inf or nan values from the results that you get from the getter. By the way I agree that it should be left to the user to filter the values, because if for example two features need to be compared against each other then one needs to create a mask for both features based on the invalid values of one of them.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BlueBrain/NeuroM/issues/503#issuecomment-250143774, or mute the thread https://github.com/notifications/unsubscribe-auth/APxZk19nKUq12KkLBHe3KOhihVqUUmFsks5qulLUgaJpZM4KIgjA.

eleftherioszisis commented 8 years ago

@mgeplf

It indeed changes the length of the array. The reason I suggested it is because morphval compares distributions and not pairwise values. By setting nans to zero or other value will lead to artificial peaks in the distribution which will affect the validation.

However, a lovely warning for morphval (or neurom if you think that it would be more suitable) in case of nans could be useful in order to make sure that you are not filtering out values that are generated from bugs.

chalimou commented 8 years ago

Keeping sections with zero length coming out of the soma into the analysis has additional problems: section_length = section_radial_distances = section_path_distances = 0, number_of_sections is incorrect and of course tortuosity = nan. The clean solution would be to remove those sections altogether. The dirty solution is to set section_tortuosity = 1 for these cases. Since those sections are not many (I hope) the error in the statistics would not be fatal. Can one of the two solutions be implemented in NeuroM?

On 28 Sep 2016, at 17:47, Eleftherios Zisis notifications@github.com wrote:

@mgeplf https://github.com/mgeplf It indeed changes the length of the array. The reason I suggested it is because morphval compares distributions and not pairwise values. By setting nans to zero or other value will lead to artificial peaks in the distribution which will affect the validation.

However, a lovely warning for morphval (or neurom if you think that it would be more suitable) in case of nans could be useful in order to make sure that you are not filtering out values that are triggered from bugs.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BlueBrain/NeuroM/issues/503#issuecomment-250208209, or mute the thread https://github.com/notifications/unsubscribe-auth/APxZkxY7niPYEPJsuRxEgzPVlK0HRh8Fks5quowDgaJpZM4KIgjA.

juanchopanza commented 8 years ago

@chalimou I could get NeuroM to reject these files, but apparently we decided it would be a good idea to allow them. And setting the value to 1 is the kind of domain specific fix that should be performed at the caller side. Alternatively, the producers of the files could look at them and fix them so this kind of issue doesn't arise.

chalimou commented 8 years ago

From what I understood we keep the files with good reason and we can stick to that. Rejecting them will leave Armando with no cell at all and we do not want that :) For the naive and even semi-naive user operating directly on the files is impossible. Even more difficult is sometimes to get the file producers to do the fix. Setting section_tortuosity=1 for the cases we are talking about is consistent with the numbers associated to zero length sections just off the soma. The smallest value the section tortuosity can have is 1 and it happens when path distance = radial distance (here both 0 which causes the problems).
I can only assume what a domain specific fix is, since I’m not a computer scientist. I think for these sections there should be a sensible value assigned to the tortuosity by NeuroM centrally and not only by me for the validation reporting. Of course I can do it for the reporting but the next person analysing tortuosity will again wonder where the nan comes from and where the bug is. And maybe they will not be as lucky as I am to have a computer scientist on the side to search for the cause. If there is no time to do either (remove the sections themselves) or (set toruosity = 1) just leave the issue open for your successor, so that we do not forget it is there, it is fine with me.

On 29 Sep 2016, at 14:47, Juan Palacios notifications@github.com wrote:

@chalimou https://github.com/chalimou I could get NeuroM to reject these files, but apparently we decided it would be a good idea to allow them. And setting the value to 1 is the kind of domain specific fix that should be performed at the caller side. Alternatively, the producers of the files could look at them and fix them so this kind of issue doesn't arise.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BlueBrain/NeuroM/issues/503#issuecomment-250455957, or mute the thread https://github.com/notifications/unsubscribe-auth/APxZk6c-GPxSj8D7SMAhjWdEx8KZ1llCks5qu7N2gaJpZM4KIgjA.

juanchopanza commented 8 years ago

@chalimou Maybe you, @lidakanari and other morphology experts can discuss whether this value of 1 makes sense for a zero length section in all cases. If you all agree what it makes sense, we can add it as a special case value to NeuroM. I don't recall seeing anything about this in the references Lida circulated about the tortuosity. And for me, as a non-morphology expert, it is difficult to understand why the minimum value (1) makes sense here, as opposed to, say, infinity. To me it seems we're asking a question that can't be really be answered!

chalimou commented 8 years ago

I can not generalise for all cases, with zero length sections so it’s better to check, you are right.

On 29 Sep 2016, at 17:18, Juan Palacios notifications@github.com wrote:

@chalimou https://github.com/chalimou Maybe you, @lidakanari https://github.com/lidakanari and other morphology experts can discuss whether this value of 1 makes sense for a zero length section in all cases. If you all agree what it makes sense, we can add it as a special case value to NeuroM. I don't recall seeing anything about this in the references Lida circulated about the tortuosity. And for me, as a non-morphology expert, it is difficult to understand why the minimum value (1) makes sense here, as opposed to, say, infinity. To me it seems we're asking a question that can't be really be answered!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BlueBrain/NeuroM/issues/503#issuecomment-250497586, or mute the thread https://github.com/notifications/unsubscribe-auth/APxZk9mv4Tc-Z7sQ-plg2eIfEdJCjtdiks5qu9begaJpZM4KIgjA.

juanchopanza commented 8 years ago

OK, let's leave this open until we figure this out.

chalimou commented 8 years ago

After discussion with @lidakanari, we agree that as @mgeplf said it doesn't make sense to remove the value if it is nan as this will result in inconsistencies of the length of the arrays. We thought about it also for zero length sections in other positions and we see two viable solutions:

1) The correct approach is to remove the section that only contains one point. This will resolve all the issues (and propagate across all measurements) without changing anything else in the code. 2) Replace the nan value in tortuosity by 1, treating it as a "straight" section. It does not have biological relevance but is consistent with other measurements for this section. Also this means that we might have to think how to deal with the rest of the features that are affected.

juanchopanza commented 8 years ago

@chalimou Solution 1) is not easy to implement, and it could change things such as the number of neurites in the cell. Also, it would have to be done to the date, because one of the design principles of neurom is to not change the input data. Solution 2 can be easily implemented for tortuosity. And yes, it is a good idea to think of the consequences of allowing zero length sections!

juanchopanza commented 8 years ago

Actually, even the fix for 2) isn't ideal. The section end-to-end distance being zero (the denominator in the tortuosity calculation) does not mean the section length is zero. A long section could loop back on itself. Does assigning the value of 1 still make sense, and are we sure that won't confuse people in the future?

Or we could restrict this special case to sections with less than two points. This does not guarantee that there won't be NaNs in the future, but it covers this particular case, which I guess is the most common one.

chalimou commented 8 years ago

@lidakanari and me agreed, let's restrict tortuosity = 1 only for this special case, that is, sections with less that two points. You are right, there is no guarantee for no NaNs in the future. We'll see what we do then, depending on the case. The example you give might indeed happen (@lidakanari is not aware of sth like this yet) and tortuosity should not be 1 for this, that is for sure.

juanchopanza commented 8 years ago

OK, thanks. Then I will merge #519 and this particular case will be solved.

chalimou commented 8 years ago

@juanchopanza just for the case you described above, the one with the section end-to-end distance being zero (the denominator in the tortuosity calculation) but non zero section length. It is a circle then. From mathematical, common sense and wikipedia point (I've just looked tortuosity up :) of view we can set this case to section_tortuosity = INF if @lidakanari agrees and it is not too much trouble for you. Then we would have covered all cases we thought of at the moment.

juanchopanza commented 8 years ago

@chalimou I will make a new issue for that.

BlueBrain / NeuroM

NaN in feature section_tortuosity #503