dighl / spreadsheet

Various functions for interaction between LingPy and STARLING
GNU General Public License v2.0

Some xls-cells or columns are not converted #4

Open Alexei-Kassian opened 9 years ago

Alexei-Kassian commented 9 years ago

Here is tsz.xlsx converted to tsz.qlc: https://yadi.sk/d/HVgPZ211c6k3R

The column 'Outgroup Chechen' is missing from the *.qlc file for unclear reasons.

LinguList commented 9 years ago

Thanks for pointing this out. The reason lies in the XLSX file, not in the algorithm itself, since the header cell named "Outgroup Chechen" actually contains an extra space! You can see this by selecting everything in the table. Once I removed this space, everything worked normally.

It is important that the files from STARLING output are not altered when they are converted, since the spreadsheet app needs a few minimal things to stay as clean and "machine-readable" as possible. Removing spaces automatically is a simple solution that could be added quickly. However, I am a bit reluctant to do this right away, because I think we also need to point out that users should be careful with the header (the first line in the XLSX) when using the app, since the app relies heavily on it.

So if this file was taken directly from STARLING (I mean the data uploaded to the GLD website), then it's another question, because if STARLING exports are inconsistent here, the app needs to handle this. However, if it was something introduced later, I would prefer to add documentation in the future (some "manual" or "troubleshooting", but I don't know when I will find time to do it), pointing out that the header line for languages and the like needs to follow the format of "Language nameA" and "Language NameA #" really strictly.
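For the troubleshooting route, a check along the following lines could flag header cells that look correct on screen but carry hidden characters. This is only a minimal sketch, not code from the spreadsheet package: the function name `find_suspicious_headers` and the sample header row are hypothetical, and the header is assumed to have already been read into a list of strings.

```python
import unicodedata


def find_suspicious_headers(header):
    """Return (index, repr(cell)) for header cells that contain
    leading/trailing whitespace or control/format characters.

    Hypothetical helper for illustration only.
    """
    problems = []
    for index, cell in enumerate(header):
        # Leading or trailing whitespace (spaces, tabs, line breaks).
        has_edge_space = cell != cell.strip()
        # Control ("Cc") or format ("Cf") characters anywhere in the cell.
        has_control = any(
            unicodedata.category(ch) in ("Cc", "Cf") for ch in cell
        )
        if has_edge_space or has_control:
            # repr() makes the invisible character visible in the report.
            problems.append((index, repr(cell)))
    return problems


# Assumed example header row; the third cell ends in a line break.
header = ["Number", "Word", "Outgroup Chechen\n", "Outgroup Chechen #"]
for index, shown in find_suspicious_headers(header):
    print("column", index, "has a suspicious header:", shown)
```

Reporting the cell with `repr()` leaves the file untouched and only tells the user which header cell to fix by hand.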

What do you think?

Alexei-Kassian commented 9 years ago

Dear Mattis, yes, you're right. It is due to a stray paragraph character at the very end of the cell name 'Outgroup Chechen'.

Formally, it is indeed a problem with this specific Starling file. On the other hand, Starling itself is very tolerant of such inaccuracies, so they occur frequently in our data (unfortunately).

The best way would be to teach LingPy Spreadsheet to ignore this problem character. Nevertheless, the current problem is solved. Thanks and sorry -- it was an error on my side.
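Ignoring the problem character could be as small as normalizing each header cell before the column names are matched. The following is a sketch under assumptions, not the converter's actual code: `clean_header_cell` is a hypothetical name, and the header is assumed to be available as a list of strings.

```python
import re


def clean_header_cell(cell):
    """Collapse every run of whitespace (spaces, tabs, line breaks,
    non-breaking spaces) into a single space and strip both ends.

    Hypothetical helper for illustration only.
    """
    return re.sub(r"\s+", " ", cell).strip()


# Assumed example header row with a trailing line break in one cell.
raw_header = ["Number", "Word", "Outgroup Chechen\n", "Outgroup Chechen #"]
header = [clean_header_cell(cell) for cell in raw_header]
print(header)
```

Only the header row is cleaned here, so the data cells exported from Starling stay exactly as they were.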