CopticScriptorium / corpora

Public repository for Coptic SCRIPTORIUM Corpora Releases

PATHS metadata field names not appearing properly in ANNIS #39

Closed ctschroeder closed 3 years ago

ctschroeder commented 4 years ago

The PATHS metadata fields in ANNIS don't appear properly. The period appears as the character code %2E, so paths.works is paths%2Eworks. Is it possible to fix this? We discussed these field names prior to releasing the new data, and I thought we had concluded that the period in the field name was not prohibited. Can we still use these field names?

I apologize for missing this during the review of the documents in ANNIS before the official release.
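For illustration, `%2E` is the percent-encoded form of the period (ASCII 0x2E), so the displayed name decodes back to the intended one. The snippet below only demonstrates that decoding with Python's standard library; it does not reproduce how ANNIS itself handles the field name.

```python
from urllib.parse import unquote

# The field name as it is displayed, with the period percent-encoded
displayed_name = "paths%2Eworks"

# Decoding the percent-escape recovers the intended field name
intended_name = unquote(displayed_name)
print(intended_name)
```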

amir-zeldes commented 4 years ago

OK, I absolutely knew we would have a problem with these somewhere, but as the XML standard doesn't actually prohibit them, I figured we'd just tolerate this. Now I see I should have been more paranoid: in fact, ANNIS does not allow periods in annotation names (regular or meta).

We could theoretically look into changing ANNIS to support this, but it would definitely take a while even if we get Thomas on board, and honestly, that's just inviting the next bug with some software that tacitly assumes key names don't contain periods. Can we reconsider this decision and replace them with underscores?
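A minimal sketch of the proposed rename, assuming the metadata is available as a simple key-value mapping. The function name `sanitize_meta_keys` and the sample values are hypothetical and not part of the actual corpus tooling.

```python
def sanitize_meta_keys(meta: dict[str, str]) -> dict[str, str]:
    """Return a copy of meta with periods in key names replaced by underscores."""
    renamed: dict[str, str] = {}
    for key, value in meta.items():
        new_key = key.replace(".", "_")
        # Guard against two distinct keys collapsing into the same name
        if new_key in renamed:
            raise ValueError(f"key collision after renaming: {new_key!r}")
        renamed[new_key] = value
    return renamed


# Hypothetical sample metadata; only the key "paths.works" comes from the issue
example = {"paths.works": "example work", "title": "example title"}
print(sanitize_meta_keys(example))
```

The collision check matters because a corpus that already had both `paths.works` and `paths_works` would otherwise silently lose one of the two values.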

ctschroeder commented 3 years ago

This is fixed by changing the field names.