acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Schema tightening #420

Open akoehn opened 5 years ago

akoehn commented 5 years ago

The RELAX NG schema is our documentation for the data sources, and it also checks their validity.

Based on previous discussion, we should define valid values for months (see #94 and the second one where we had days in the month field). While we are at it, pages and doi seem to be low-hanging fruit.

What is the regexp we want to accept for month?
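As a starting point for that discussion, here is a minimal Python sketch of one possible month pattern. It assumes, hypothetically, that the field should hold a full English month name, optionally followed by a second month to form a range such as "June-July". The names `MONTH_RE` and `is_valid_month` are illustrative only and not part of the Anthology code; the actual set of accepted values is exactly what this issue still has to decide.

```python
import re

# Assumption: only full English month names are accepted (no abbreviations,
# no day numbers), optionally as a hyphen-separated range of two months.
_MONTH = (
    r"(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)"
)
MONTH_RE = re.compile(rf"{_MONTH}(?:-{_MONTH})?")


def is_valid_month(value: str) -> bool:
    """Return True if the whole string is a month name or a month range."""
    return MONTH_RE.fullmatch(value) is not None
```

A value such as "June 21" is rejected under this pattern, which is the kind of day-in-month-field entry mentioned above.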

mjpost commented 5 years ago

I think we should have a fully-specified <date> field with an ISO 8601 date (e.g., 2019-06 or 2019-06-21) or date range. I would think RELAX NG has built-in support for this.
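To make the proposal concrete, the following Python sketch checks whether a string is an ISO 8601 date of the form YYYY-MM or YYYY-MM-DD, or a range of two such dates. It is an illustration under stated assumptions, not the schema itself: the "/" range separator follows the ISO 8601 interval notation, and the function names `is_iso_date` and `is_iso_date_or_range` are hypothetical.

```python
import re
from datetime import date

# YYYY-MM with an optional -DD part.
_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})(?:-([0-9]{2}))?")


def is_iso_date(value: str) -> bool:
    """Return True for a valid calendar date written as YYYY-MM or YYYY-MM-DD."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Let datetime reject impossible values such as month 13 or February 30.
        # When no day is given, the first of the month is used as a stand-in.
        date(int(year), int(month), int(day) if day else 1)
    except ValueError:
        return False
    return True


def is_iso_date_or_range(value: str) -> bool:
    """Accept a single date, or two dates separated by "/" (assumed range syntax)."""
    parts = value.split("/")
    return len(parts) in (1, 2) and all(is_iso_date(part) for part in parts)
```

This sketch does not check that the start of a range precedes its end; whether the schema should enforce that is left open here.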

akoehn commented 5 years ago

It supports everything you could want: YYYY-MM, YYYY, MM, YYYY-MM-DD, ...

I hadn't thought of changing the XML format, but that is certainly doable as well and would clarify the formats.