dlce-eva / python-nexus

python-nexus - Generic nexus (.nex, .trees) reader/writer for python
BSD 2-Clause "Simplified" License
11 stars 3 forks source link

Round tripping a nexus with a characters block leads to two character blocks #26

Closed SimonGreenhill closed 2 years ago

SimonGreenhill commented 2 years ago

...which means the resulting nexus can't be read. Example here.

...which means:

>>> from nexus import NexusReader
>>> NexusReader("walker_and_riberio2015/cldf/data.nex")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/simon/.pyenv/versions/3.9.6/lib/python3.9/site-packages/nexus/reader.py", line 42, in __init__
    self._set_blocks(NexusReader._blocks_from_file(filename))
  File "/Users/simon/.pyenv/versions/3.9.6/lib/python3.9/site-packages/nexus/reader.py", line 77, in _set_blocks
    raise NexusFormatException("Duplicate Block %s" % block)
nexus.exceptions.NexusFormatException: Duplicate Block characters

This is because of this:

https://github.com/dlce-eva/python-nexus/blob/071aaca665cc9bfde15201a31c836c6587d04483/src/nexus/reader.py#L80-L81

(which, I admit, I implemented as cheap way to always have a data block rather than looking for data or characters).

What's the best solution here @xrotwang ?

  1. remove these two lines so this change doesn't happen (simplest solution, but means we need to keep a closer eye in e.g. phlorest, and add standardisation there (nex.blocks['data'] = nex.blocks['characters'], del(nex.blocks['characters'])).
  2. delete blocks['characters'] once it's been copied to blocks['data']. This keeps the status quo but adds a magic invisible change so seems like a bad idea.
  3. fix the handlers to create a data and a characters block. Round-tripping will duplicate data and characters but they'll be named differently at least.
  4. revise NexusWriter to check for duplicate blocks, and perhaps allow writing of specific blocks only (nex.write(filename, blocks=['data'])

The fact that the moved block still thinks of itself as a characters block rather than a data block is a bug.

SimonGreenhill commented 2 years ago

So, I think a combination of 1 and 3 and 4 is probably the best?

xrotwang commented 2 years ago

Hm. I think, I'll start with a failing test

def test_roundtrip(examples):
    nex = NexusReader(examples.joinpath('example2.nex'))
    _ = NexusReader.from_string(nex.write())

:)

xrotwang commented 2 years ago

@SimonGreenhill please have a look at #27. This seems to be the smallest change to make this work. It doesn't touch the bug, where the 'data' block is a CharacterHandler. But I don't think we should fix this. We don't demand (or enforce) much of anything regarding whether NexusReaders can be used to modify data - e.g. whether it's ok to assign to NexusReader.blocks after object initialization. So there is no real principle we could claim to follow for such a change.

xrotwang commented 2 years ago

It's not the worst thing that python-nexus gets some exercise now with phlorest :)