cldf / pycldf

python package to read and write CLDF datasets
https://cldf.clld.org
Apache License 2.0

return concatenated newick string instead of raising KeyError #166

Closed: fmatter closed this 1 year ago.

fmatter commented 1 year ago

The dict self.trees._parsed_files[self.file.id] has keys "1", "2", etc. The return statement, on the other hand, assumes a key that contains the tree ID, which results in a KeyError. I've attached a minimal working example (MWE) dataset for which validation fails because of this.
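A minimal sketch of the mismatch described above. It assumes, for illustration only, that the parsed trees of one Newick file are held in a plain dict keyed by 1-based index strings; `parsed_file`, `get_tree` and the tree ID `my_tree` are hypothetical and are not pycldf internals.

```python
# Hypothetical stand-in for the parsed trees of a single Newick file:
# keys are 1-based index strings, not tree IDs.
parsed_file = {"1": "((a,b),c);", "2": "(a,(b,c));"}


def get_tree(parsed, key):
    """Return the Newick string stored under `key`; raises KeyError if absent."""
    return parsed[key]


# Looking up by index works ...
print(get_tree(parsed_file, "1"))  # ((a,b),c);

# ... but looking up by a tree ID (hypothetical "my_tree") fails.
try:
    get_tree(parsed_file, "my_tree")
except KeyError as err:
    print("KeyError:", err)
```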

I assumed that, if there are multiple trees, the Newick string should contain them separated by ;.
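The concatenation the reporter assumed (and the issue title proposes) can be sketched as below. `concat_newick` is a hypothetical helper, not part of pycldf; it only illustrates joining several Newick trees into one ;-separated string.

```python
def concat_newick(trees):
    """Join Newick trees into one string, each terminated by exactly one ';'."""
    # Strip whitespace and any trailing ';' so that no tree ends up with ';;'.
    cleaned = [t.strip().rstrip(";") for t in trees]
    return "".join(t + ";" for t in cleaned if t)


print(concat_newick(["((a,b),c);", "(a,(b,c))"]))  # ((a,b),c);(a,(b,c));
```

Normalizing the trailing semicolon first means the helper accepts trees with or without their terminator.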

xrotwang commented 1 year ago

While it's possible to put multiple trees into one Newick file, there's no way to supply names for them. That's why pycldf uses the 1-based index in the list of trees as the tree name. The second, somewhat confusing point is that Newick trees are looked up in Nexus or Newick files using TreeTable.Name. We don't use TreeTable.ID, because that ID must conform to CLDF ID conventions, whereas a tree name in Nexus must conform to Nexus conventions.
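The naming scheme described above (trees in a Newick file are named by their 1-based position) can be sketched as follows. `index_newick_trees` is a hypothetical helper, not pycldf code, and it naively splits on `;`, so it assumes no quoted labels or comments that themselves contain `;`.

```python
def index_newick_trees(text):
    """Map 1-based index strings ("1", "2", ...) to the trees in a Newick file.

    Naive: splits on ';', so it assumes no quoted labels or comments
    that themselves contain ';'.
    """
    trees = [t.strip() + ";" for t in text.split(";") if t.strip()]
    return {str(i): tree for i, tree in enumerate(trees, start=1)}


content = "((a,b),c);\n(a,(b,c));\n"
print(index_newick_trees(content))
# {'1': '((a,b),c);', '2': '(a,(b,c));'}
```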

So, changing Name in your TreeTable to 1 will make validation pass.
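To illustrate why the fix works, here is a sketch that resolves a tree through the row's Name rather than its ID. The rows are hypothetical dicts carrying only `ID` and `Name`, and `resolve_tree` is an illustrative helper, not the pycldf API.

```python
def resolve_tree(row, parsed):
    """Look up a row's tree by its Name column (not its ID); None if missing."""
    return parsed.get(row["Name"])


# Trees from a single-tree Newick file, keyed by 1-based index.
parsed = {"1": "((a,b),c);"}
good_row = {"ID": "my_tree", "Name": "1"}
bad_row = {"ID": "my_tree", "Name": "my_tree"}

print(resolve_tree(good_row, parsed))  # ((a,b),c);
print(resolve_tree(bad_row, parsed))   # None
```

Returning None instead of raising KeyError is a design choice of this sketch, so that a caller such as a validator can report the missing tree itself.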

fmatter commented 1 year ago

So a tree must always have the name 1 (or more, if there are ;-delimited trees)?

xrotwang commented 1 year ago

> So a tree must always have the name 1 (or more, if there are ;-delimited trees)?

Yes, if the trees are stored in a Newick file. In a NEXUS file they can be given a more meaningful name.
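For comparison, a NEXUS TREES block pairs each tree with an explicit name. The regex-based `nexus_tree_names` below is a hypothetical, deliberately minimal sketch: it handles one `TREE name = ...;` statement per line plus an optional leading bracket comment such as `[&R]`, but not TRANSLATE tables, quoted names, or trees spanning several lines. The tree names `consensus` and `sample_1` are made up for the example.

```python
import re

# One "TREE <name> = [optional comment] <newick>;" statement per line.
TREE_LINE = re.compile(
    r"^\s*tree\s+(\S+)\s*=\s*(?:\[[^\]]*\]\s*)?(.*;)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def nexus_tree_names(nexus_text):
    """Map tree names to Newick strings for simple single-line TREE statements."""
    return dict(TREE_LINE.findall(nexus_text))


NEXUS = """#NEXUS
BEGIN TREES;
    TREE consensus = [&R] ((a,b),c);
    TREE sample_1 = (a,(b,c));
END;
"""

print(nexus_tree_names(NEXUS))
# {'consensus': '((a,b),c);', 'sample_1': '(a,(b,c));'}
```

With names like these, TreeTable.Name can carry something more meaningful than an index, as long as it follows NEXUS naming conventions.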