comphist / cora

A web-based, token-level annotation tool for non-standard language data
http://www.linguistics.rub.de/comphist/resources/cora/
MIT License
10 stars 6 forks source link

CorA database checking/installation #28

Closed mbollmann closed 7 years ago

mbollmann commented 9 years ago

Originally reported by: Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann)


There should be a way to set up the CorA database automatically, aside from manually having to do mysql -uroot -p <coradb.sql.

Additionally, we need a way to handle changes to the database schema between updates.

At the very least, CorA should be able to check if the database schema is correct for the currently running version. Unfortunately, this is not so trivial. Some possibilities:

  1. Call mysqldump --no-data on the database and compare it with the coradb.sql file of the current commit. This is problematic due to a lot of minor differences accumulating; e.g., mysqldump includes auto_increment counters, "DEFAULT NULL" might have been inserted by the DB, the DB could have converted "boolean" columns to "tinyint(1)" internally, etc. pp. ... in practice, this seems unusable.

  2. Extract the database schema by querying the database, then write comparison logic to compare it to the latest schema. This requires the latest schema to be parsed in some way, either by

    • discarding the current coradb.sql file in favor of a JSON-/YAML-/XML-based database description format, which is used for comparison and to automatically generate the coradb.sql file; or

    • using some SQL parser to parse coradb.sql, e.g. the TYPO3 SQL parser (alternative link).

  3. Use an external tool to perform the comparison; e.g. mysqldiff, mysql-diff, SchemaSync, etc.


mbollmann commented 9 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Implemented possibility 4 by commit e7c64a4.

mbollmann commented 9 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Possibility 4:

Database could have a special internal "status" table containing a version string. This could be an MD5 of the coradb.sql file, or a git tag of the commit when coradb.sql last changed, etc.; so we can easily check if the database matches the current version.

Additionally, this would enable "update scripts" that specify how to get from a specific database version to the next.

mbollmann commented 9 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Re 3.: