BlueBrain / MorphIO

A python and C++ library for reading and writing neuronal morphologies
https://morphio.readthedocs.io
Apache License 2.0
27 stars 22 forks source link

Feature Request: Option to treat subsequent points with different SWC tags as fork points #469

Open Helveg opened 1 year ago

Helveg commented 1 year ago

Consider a morphology with an axon consisting of a single branch, but with 3 different SWC tags: 2 for axon, 6 for axon hillock, and 7 for axon initial segment. With the current MorphIO parser this gets reduced into 1 Section with 1 type, depending on the SWC tag of the first point of the branch.

I'd like to request a new option to split this branch into 3 Sections, wherever there is an SWC tag transition (taking into account the no_duplicates option)

mgeplf commented 1 year ago

I think I noticed something similar just recently; let me create an example file, and we can check that we're on the same page.

Helveg commented 1 year ago

Conceptually a series of SWC samples with the following tags:

1-1-1-6-6-6-7-7-7

gets converted in MorphIO to a single Section with type 1:

|--------1-------|

more equivalent to this SWC:

1-1-1-1-1-1-1-1-1

I'd like an option to parse this as 3 Sections with their respective types:


|--1--|--6--|--7--|
mgeplf commented 1 year ago

Yup, we're on the same page:

contents =('''1 1 0 4 0 3.0 -1
              2 6 0 0 2 0.5 1
              3 7 0 0 3 0.5 2
              4 8 0 0 4 0.5 3
              5 9 0 0 5 0.5 4''')
n = Morphology(contents, "swc")

should have len(n.sections) == 4

I will take a look when I have moment. I have a rewrite of the SWC parser in progress, that covers this case, afaik. I need to finish it though.

mgeplf commented 1 year ago

@Helveg I have implemented a change that should fix this. However, it's in a completely new SWC reader - I'm passing all the tests that I have in NeuroM and nearly all of them in MorphIO (the ones that I'm not, I think we should re-evaluate what is right.

I'd really appreciate it if you could try with your code, because I think: a) it would fix this issue b) you may turn up other issues.

The branch w/ my new reader is: https://github.com/BlueBrain/MorphIO/compare/efficient-swc-build Let me know if you have any luck.

Helveg commented 1 year ago

Hmm, my version is MorphIO-3.3.6.dev54+g57e4e98.d20230811 from the latest commit on the efficient-swc-build branch and I get this error:

>       morpho = morphio.Morphology(os.fspath(file_like))
E       morphio._morphio.RawDataError: 
E       /mnt/c/Users/pwd06/git/arborize/tests/data/morphologies/multitagged.swc:1:error
E       Unable to parse this line

for this SWC file, which looks good:

1 1 0.0 0.0 0.0 2.9000000953674316 -1
2 1 0.0 5.622320175170898 0.0 2.9000000953674316 1
3 6 0.0 5.622320175170898 0.0 0.75 2
4 6 0.0 6.622320175170898 0.0 0.75 3
5 7 0.0 6.622320175170898 0.0 0.3499999940395355 4
6 7 0.0 16.6223201751709 0.0 0.3499999940395355 5
7 8 0.0 16.6223201751709 0.0 0.15000000596046448 6
8 8 0.0 29.222320556640625 0.0 0.15000000596046448 7
9 8 0.0 41.82231903076172 0.0 0.15000000596046448 8
10 8 0.0 54.42232131958008 0.0 0.15000000596046448 9
11 8 0.0 67.02232360839844 0.0 0.15000000596046448 10
12 8 0.0 79.62232208251953 0.0 0.15000000596046448 11
13 8 0.0 92.22232055664062 0.0 0.15000000596046448 12
14 8 0.0 104.82231903076172 0.0 0.15000000596046448 13
15 8 0.0 117.42231750488281 0.0 0.15000000596046448 14
16 8 0.0 130.02232360839844 0.0 0.15000000596046448 15
17 8 0.0 142.622314453125 0.0 0.15000000596046448 16

I'm on WSL2, my files might have weird line endings and other fun cross-platform freakiness

mgeplf commented 12 months ago

Cool, thanks for the example; I will have a look.

mgeplf commented 12 months ago

I tried loading the above snippet, and it seemed to work. I will try some variations on it though. I don't have a WSL2 machine at my disposal, though.

Helveg commented 11 months ago

It's the line endings, I believe. After running dos2unix on the file the parser works. I think Windows line endings are \r\n, maybe you determine the line ending in a platform specific manner?

>>> import morphio
>>> morphio.Morphology("git/arborize/tests/data/morphologies/multitagged.swc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
morphio._morphio.RawDataError:
git/arborize/tests/data/morphologies/multitagged.swc:1:error
Unable to parse this line
(b4py9) robin@LAPTOP-JF6T3PSU:~$ dos2unix git/arborize/tests/data/morphologies/multitagged.swc
dos2unix: converting file git/arborize/tests/data/morphologies/multitagged.swc to Unix format...
>>> import morphio
>>> morphio.Morphology("git/arborize/tests/data/morphologies/multitagged.swc")
<morphio._morphio.Morphology object at 0x7f6ff26c2a70>
mgeplf commented 11 months ago

It's the line endings, I believe. After running dos2unix on the file the parser works. I think Windows line endings are \r\n, maybe you determine the line ending in a platform specific manner?

Yup, thanks for catching that.

Helveg commented 11 months ago

Any status updates? :)

mgeplf commented 11 months ago

Sorry, I was away, and haven't had a single stitch of time to look since I got back. I think it's a quick fix, though, so I can maybe find time today.

mgeplf commented 11 months ago

I have added handling of windows end of line here: https://github.com/BlueBrain/MorphIO/commit/cad7bdd24b33c306078d9a20fb9467566d086332

Could you try it, and see if it fixes this problem?

Helveg commented 11 months ago

Parses into 1 soma and 3 sections! 🎉

mgeplf commented 11 months ago

Parses into 1 soma and 3 sections!

I hope that's what you expected!

Helveg commented 11 months ago

It is, thanks :)

Helveg commented 10 months ago

When would this land in release? It's a blocker for me. If you can sneak it in in the next few days I would not have to fix a whole series of pesky bugs with my own SWC parser ;)

mgeplf commented 10 months ago

I'm super busy with other things at the moment.

There are a couple tests that fail, and I think it might be a semantic decision that needs to be made. For that, I will need to discuss with others what the right decision is.

mgeplf commented 10 months ago

@Helveg you're input would be helpful: do you think the following is a valid (but small) SWC file:

1 1 0 0 0 1 -1
1 3 1 0 0 1  1
Helveg commented 10 months ago

Nitpick: the sample identifiers (first column) need to be unique. I'd say since SWC points from the child to the parent, the inferred segment should have the child's properties all the way up to the parent. The resulting output of:

1 1 0 0 0 1 -1
2 3 1 0 0 1  1

should then be a disc shaped soma segment of length 0, and a dendritic segment of length 1, just like:

1 1 0  0 0 1 -1
2 1 1  0 0 1  1
3 3 2  1 0 1  2
4 3 2 -1 0 1  2

would intuitively create a soma and 2 dendrites, not a Y shaped soma.

Helveg commented 10 months ago

I'd say it's a design choice of MorphIO to allow for 0 length segments or not, but I don't see the harm in allowing it. I'd find it needlessly restrictive for valid edge cases, and would leave these kinds of quality checks to dedicated morphology auditing/repairing tools that one can build on top of MorphIO.

mgeplf commented 10 months ago

Nitpick: the sample identifiers (first column) need to be unique.

Yeah, that was a typo; sorry.

I'd say it's a design choice of MorphIO to allow for 0 length segments or not, but I don't see the harm in allowing it.

Okay, makes sense.

Currently, we're also transforming:

1 1 0 0 0 1 -1
2 3 1 0 0 1  1
3 3 1 1 0 1  2
4 3 1 0 1 1  2

Into:

1 1  0  0  0  1 -1
2 3  1  0  0  1  1
3 3  1  1  0  1  2
4 3  1  0  0  1  1
5 3  1  0  1  1  4

Which seems wrong: ie now the soma has two children instead of one. Thoughts?

Helveg commented 10 months ago

I don't understand? Why add a point?

mgeplf commented 10 months ago

I don't understand? Why add a point?

Yeah, that's my confusion too; I don't in the "new" parser, and that's what I have to work out. Thanks for your input.