BlueBrain / MorphIO

A python and C++ library for reading and writing neuronal morphologies
https://morphio.readthedocs.io
Apache License 2.0
26 stars 22 forks source link

Should we support less restrictive SWC format? #432

Open adrien-berchet opened 1 year ago

adrien-berchet commented 1 year ago

Some softwares (e.g. https://alleninstitute.org/what-we-do/brain-science/research/products-tools/vaa3d/) might create SWC files with less restrictive specs (e.g. a root ID that is not equal to -1, see https://github.com/Vaa3D/vaa3d_tools/issues/38). I guess this is also true for other file formats (ASC and H5 files). This raises again the question: should MorphIO be only able to load 'realistic' morphologies or should it be able to load any file as is, even if the morphology does not make any sense? As pointed out by @hanchuan in https://github.com/Vaa3D/vaa3d_tools/issues/38, it might make sense to store morphologies in an intermediate but not realistic state. WDYT @lidakanari @eleftherioszisis @mgeplf ?

adrien-berchet commented 1 year ago

Note that if we want to be able to read any morphology, we should be able to load somata with tree structures (i.e. with bifurcations, as described here: https://github.com/BlueBrain/MorphIO/issues/213), which would be a major change.

Helveg commented 1 year ago

Another restriction that I'd like to see removed is the ban on custom SWC tags :)

adrien-berchet commented 1 year ago

Hi @Helveg I am not super familiar with SWC file format, what do you mean by custom SWC tags?

Helveg commented 1 year ago

I currently don't have a reproducer at hand! In the SWC spec it is called the "Structure Identifier", but the error raised by MorphIO goes something like "custom tags not supported", and it blocks you from entering any value above ... 10, 15? While the SWC spec just says that any 5+ value is considered custom, so you already support 10 custom tags, but no higher

adrien-berchet commented 1 year ago

Ah yeah ok I see, it is indeed limited to 10. It seems this limitation comes from the NeuroMorpho specs (as stated here: https://github.com/BlueBrain/MorphIO/blob/master/include/morphio/enums.h#L82). I don't see any reason for this limitation but we want to keep compatibility with NeuroMorpho, so I guess we want to keep this limitation. Nevertheless, it would be possible to add an option to automatically set these tags to 10 (or any other value between 5 and 10, I don't know what's the best, though I think we should avoid 6 and 7 values since they can have special meaning (branch point and end point respectively)), so it would still be possible to load the morphology (with warnings issued of course), though the section types would be messed up. WDYT?

Helveg commented 1 year ago

As long as I can still determine the original number that I gave in the SWC, that should be fine :)

adrien-berchet commented 1 year ago

Unfortunately it would not be possible, all values >10 would be set to 10, so it would not be possible to know if the original value was 11, 12, 13, etc.

mgeplf commented 1 year ago

Wow, this fell off my radar; sorry about that.

e.g. a root ID that is not equal to -1,

It appears there has been a further attempt to standardise SWC better: https://swc-specification.readthedocs.io/en/latest/

Which includes a governance/steering committee (with people from AIBS, INCF, neuromorpho, etc: https://swc-specification.readthedocs.io/en/latest/governance.html).

Currently, they are saying that -1 is the starting point (from https://swc-specification.readthedocs.io/en/latest/swc.html):

Parent defines how points are connected to each other. In a tree, multiple points can have the same ParentID. The first point in the file must have a ParentID equal to -1, which represents the root point.

So I'd prefer to keep this, as I think it makes sense to have at least some amount of uniformity and standard conformance.

WRT custom tags, I don't really have a problem bumping the number to 20 or something, but that feels like a stopgap. The above specification only has 0-7, w/ 5 as custom, and 6 as unspecified neurite.

mgeplf commented 1 year ago

I've added more custom types; up until 19; hopefully that's enough for a while.

Helveg commented 1 year ago

Thanks for the addition! That should probably cover our needs for a while, but is more restrictive than I believe the format specifies, which depends on how you read the document. The table says for column 2:

 Type identifier. A positive integer.

So that's anything [0, +inf[, it then continues to talk about the "The basic set of types used in NeuroMorpho.org":

The basic set of types used in NeuroMorpho.org SWC files are: TypeID | Description — | — 0 | undefined 1 | soma 2 | axon 3 | (basal) dendrite 4 | apical dendrite 5 | custom 6 | unspecified neurite 7 | glia processes

I'm not sure if they mean to say that that should be the specification, or if they literally meant it as just an example of what NeuroMorpho uses.

The document feels kind of rushed and unfinished, for example, the grammar of Nanda et al 2018 says:

integer = [+|-] digit{digit} ;
Type = integer ;

which means type could be negative just as well: ]-inf, +inf[

(PS: my tool encodes all regions with unique parameter sets on the morphology as a separate tag, I now use a homemade SWC parser, but I'd prefer to drop it, but this issue and access to the header would be required for that)

mgeplf commented 1 year ago

Thanks for the addition! That should probably cover our needs for a while, but is more restrictive than I believe the format specifies, which depends on how you read the document.

It is more restrictive, but I also think that supporting many more would isn't in the spirit of having a "type" identifier, when the other examples are soma, axon, dendrite, and basal-dendrite. For morphologies, the universe of neurites won't probably suddenly expand, and people are taking liberties with the format to encode other information, which is fragmenting the "standard" - I'd prefer not to participate in that fragmentation if possible.

The document feels kind of rushed and unfinished

Perhaps, but I think it's more clear than other "standardizations" I've read thus far.

access to the header would be required for that

I will give that some thought; since swc+ is also adding metadata to the header.

mgeplf commented 1 year ago

I have commented on the INCF proposal here: https://www.incf.org/commentaries/swc

Which should hopefully have integrated the feedback from you, @Helveg, about integer types - I suggested that they have a defined type.

Hopefully the other points in that proposal come true; that non-negative ids are not allowed, everything starts at root nodes, the ids are dense.

If it's ok, I'm going to close this ticket for now, but open one about the access to the headers.