Closed marctang closed 4 years ago
I just recently added a bit more support for feeding clld apps from CLDF data. There's now a command clld create
$ clld create -h
usage: clld create [-h] [-f] [--quiet] outdir [variables [variables ...]]
Create the skeleton for a clld app project.
Variables:
- directory_name: The name of the project directory. This will also be used as name
of the python package.
- cldf_module: If the app data is initialized from a CLDF dataset, specify the CLDF
module this dataset conforms to (Wordlist|StructureDataset|Dictionary|Generic).
Leave empty otherwise.
Note that this requires passing an `--cldf` option to `clld initdb`.
- mpg: Specify "y" if the app is served from MPG servers, and thus needs to fulfill
certain legal obligations (n|y).
positional arguments:
outdir Output directory. The last path segment will be used as default
value for the 'directory_name' variable.
variables If run non-interactively, defaults for the template variables
can be passed in as 'key=value'-formatted arguments (default:
None)
optional arguments:
-h, --help show this help message and exit
-f, --force Overwrite an existing project directory (default: False)
--quiet Run non-interactively, i.e. do not prompt for template variable
input. (default: False)
which will create the skeleton for a clld app. To load the data into the db you'd run clld initdb
$ clld initdb -h
usage: clld initdb [-h] [--prime-cache-only] [--cldf CLDF]
[--concepticon CONCEPTICON]
[--concepticon-version CONCEPTICON_VERSION]
[--glottolog GLOTTOLOG]
[--glottolog-version GLOTTOLOG_VERSION]
config-uri
positional arguments:
config-uri ini file providing app config
optional arguments:
-h, --help show this help message and exit
--prime-cache-only
--cldf CLDF
--concepticon CONCEPTICON
Path to repository clone of Concepticon data (default:
None)
--concepticon-version CONCEPTICON_VERSION
Version of Concepticon data to checkout (default:
None)
--glottolog GLOTTOLOG
Path to repository clone of Glottolog data (default:
None)
--glottolog-version GLOTTOLOG_VERSION
Version of Glottolog data to checkout (default: None)
For simpler datasets - perhaps your classifier data - an online app could also be deployed via datasette-cldf - which would make hosting somewhat simpler because there's no need for a database server.
Thanks for the suggestions! I had a look at datasette. Since the database will have quite a lot of features to be added (WALS style), we would prefer to use the clld structure from the start, which will make the expansion easier later on.
A few quick confirmation-questions :-P, if I take tangclassifiers as an example:
Another comment on the side, which I have already made to quite a few of your colleagues: the structure of CLDF and clld is really nice and clean! Super thanks for making it available and nicely structured, plus supporting the deployment of data on it :-)
Nice to hear that. We need more people to spread the word, so we can teach each other how to apply it. That's why it is also important for us to help those interested in these initial conversions: so they can then teach others.
Well noted :-)! I will do my best to spread the word and the method.
@marctang regarding the `clld initdb` call: the `--cldf` option should point to the metadata file, i.e. `cldf/StructureDataset-metadata.json`.
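A minimal sketch of how such a call could be assembled from Python, assuming only the options shown in the `clld initdb -h` output above. The helper name `initdb_command` is hypothetical and not part of clld; it only builds the argument list and checks that the `--cldf` value looks like a metadata JSON file rather than a directory.

```python
import shlex
from pathlib import Path


def initdb_command(config_uri, metadata_path, glottolog=None):
    """Build the argument list for a `clld initdb` call (hypothetical helper)."""
    metadata = Path(metadata_path).expanduser()
    # --cldf should point to the metadata JSON file, not to the cldf/ directory.
    if metadata.suffix != ".json":
        raise ValueError(f"expected a metadata JSON file, got {metadata}")
    cmd = ["clld", "initdb", str(config_uri), "--cldf", str(metadata)]
    if glottolog is not None:
        # Optional path to a clone of the Glottolog data repository.
        cmd += ["--glottolog", str(Path(glottolog).expanduser())]
    return cmd


cmd = initdb_command("development.ini", "cldf/StructureDataset-metadata.json")
# Print the command as it could be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` inside the app's virtual environment.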
@xrotwang Thanks for the reply :-)
So, I opened a virtual environment and used 'clld create tangclassifiers' to create the skeleton, then from clld/tangclassifiers I tried the following:
clld initdb development.ini --cldf ~/Desktop/GitHub/tangclassifiers/cldf/StructureDataset-metadata.json
which gave the following, telling me that the path to glottolog is required.
INFO dropping sqlite:/db.sqlite
INFO creating sqlite:/db.sqlite
Traceback (most recent call last):
File "/home/marctang/.venv/bin/clld", line 11, in <module>
load_entry_point('clld', 'console_scripts', 'clld')()
File "/home/marctang/clld/src/clld/__main__.py", line 26, in main
return args.main(args) or 0
File "/home/marctang/clld/src/clld/commands/initdb.py", line 66, in run
args.initializedb.main(args)
File "/home/marctang/clld/tangclassifiers/tangclassifiers/scripts/initializedb.py", line 29, in main
assert args.glottolog, 'The --glottolog option is required!'
AssertionError: The --glottolog option is required!
So I cloned Glottolog's GitHub repository https://github.com/glottolog/glottolog and tried the following:
clld initdb development.ini --cldf ~/Desktop/GitHub/tangclassifiers/cldf/StructureDataset-metadata.json --glottolog ~/Desktop/GitHub/glottolog/
This gives the following error. I guess I did something wrong in the settings or forgot to do something, e.g., should I clone the repository from Zenodo instead of GitHub, or change something in the py files? Could you point me in the right direction? Thanks!
INFO dropping sqlite:/db.sqlite
INFO creating sqlite:/db.sqlite
Traceback (most recent call last):
File "/home/marctang/.venv/bin/clld", line 11, in <module>
load_entry_point('clld', 'console_scripts', 'clld')()
File "/home/marctang/clld/src/clld/__main__.py", line 26, in main
return args.main(args) or 0
File "/home/marctang/clld/src/clld/commands/initdb.py", line 66, in run
args.initializedb.main(args)
File "/home/marctang/clld/tangclassifiers/tangclassifiers/scripts/initializedb.py", line 84, in main
key=lambda v: (v['parameterReference'], v['id'])),
File "/home/marctang/clld/tangclassifiers/tangclassifiers/scripts/initializedb.py", line 20, in iteritems
cmap = {cldf[t, col].name: col for col in cols}
File "/home/marctang/clld/tangclassifiers/tangclassifiers/scripts/initializedb.py", line 20, in <dictcomp>
cmap = {cldf[t, col].name: col for col in cols}
File "/home/marctang/.venv/lib/python3.6/site-packages/pycldf/dataset.py", line 565, in __getitem__
raise KeyError(table)
KeyError: 'CodeTable'
Ah, ok. The problem is the lines in `scripts/initializedb.py`, created from this template code:
https://github.com/clld/clld/blob/03e465c00bfddf1ac3e363d9db2e44609debc116/src/clld/project_template/%7B%7Bcookiecutter.directory_name%7D%7D/%7B%7Bcookiecutter.directory_name%7D%7D/scripts/initializedb.py#L111-L128
It assumes a StructureDataset with a Codes component. There are two ways around this:
1) Add a `CodeTable` to the CLDF dataset - in this case a yes and a no row for each parameter.
2) Adapt `initializedb.py`.
I think - while seemingly overkill - the first option makes more sense - and is more transparent. It also allows WALS-like display of feature values on a map with distinctly colored dots, etc.
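To illustrate the first option, here is a rough sketch of how the rows of such a codes table could be generated, one yes and one no code per parameter. The column names `ID`, `Parameter_ID` and `Name` follow the CLDF CodeTable defaults. The ID scheme (`<parameter>-<name>`), the function names, and the parameter IDs (taken from the feature names used in this thread) are assumptions for illustration. The table would also have to be declared in the metadata file, which is not shown here.

```python
import csv
import io


def yes_no_codes(parameter_ids):
    """Return one 'yes' and one 'no' code row for each parameter."""
    rows = []
    for pid in parameter_ids:
        for name in ("yes", "no"):
            rows.append({
                "ID": f"{pid}-{name}",  # hypothetical ID scheme
                "Parameter_ID": pid,
                "Name": name,
            })
    return rows


def write_codes(rows, fileobj):
    """Write code rows as CSV with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=["ID", "Parameter_ID", "Name"])
    writer.writeheader()
    writer.writerows(rows)


# Parameter IDs as they are referred to in this thread; the real IDs may differ.
rows = yes_no_codes(["sortalclassifiers", "morphosyntacticplural"])
buffer = io.StringIO()
write_codes(rows, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch free of side effects; in a real conversion the rows would go into the dataset's codes file instead.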
Of course, the function in `initializedb.py` could figure out if there is a `CodeTable` before trying to read it - but that's Python code created from a template - and such code is a bit difficult to write, test and debug, so it should better be simple. Since there typically is no way around customizing `initializedb.py` at some point, I thought leaving the default simple - but sometimes, as here, dysfunctional - was an acceptable decision. What do you think? Too much of a turn-off?
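For reference, a check of this kind could look roughly like the sketch below. It does not use pycldf; it only inspects the parsed metadata JSON, relying on the CLDF convention that each table declares its component via `dc:conformsTo`. The function name `has_component` and the trimmed metadata are made up for illustration.

```python
import json

CLDF_TERMS = "http://cldf.clld.org/v1.0/terms.rdf#"


def has_component(metadata, component):
    """Check whether parsed CLDF metadata declares the given component."""
    return any(
        table.get("dc:conformsTo") == CLDF_TERMS + component
        for table in metadata.get("tables", [])
    )


# Trimmed, hypothetical metadata for a StructureDataset without a CodeTable.
metadata = json.loads("""
{
  "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#StructureDataset",
  "tables": [
    {"url": "values.csv",
     "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ValueTable"},
    {"url": "languages.csv",
     "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#LanguageTable"}
  ]
}
""")

if has_component(metadata, "CodeTable"):
    print("CodeTable found: codes can be loaded")
else:
    print("no CodeTable: skip loading codes")
```

Inside `initializedb.py` a similar effect could presumably be had by catching the `KeyError` seen in the traceback above, at the price of making the generated code a bit less simple.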
@marctang I'll put together a PR tomorrow, adding a `CodeTable` here. Then import into the app should work.
@xrotwang Thanks again for the explanation :-) I agree that adding the CodeTable would be more transparent and better for future development too. Let me know if there is anything I can help with. I could potentially add that table with R, but it is probably better if everything is done with the same code based on your PR tomorrow. Thanks again for your help!
@marctang just added a `CodeTable` (see https://github.com/cldf-datasets/tangclassifiers/commit/de6e87bda980ee77423d87723f9fba376c6f01ec#diff-d9858e8d7e38dec098e24d087bd3c536). With this output, `clld initdb` works (on my end). Will put together a recipe in the CLDF cookbook on how to do that.
@marctang regarding CLDF and R: @SimonGreenhill has an R library for reading CLDF. Fleshing this out to have the full scope of functionalities of `pycldf` might be cool. But to be honest, I'd prefer even R users learn the bit of Python necessary to read and write files, since for these things Python seems to be the more mature language (e.g. regarding Unicode, paths on different OSs, etc.).
@xrotwang Super thanks! Well noted for the R library. I will do what you suggest and get familiar with both Python and R for this. Thanks also for adding the CodeTable, like this I'll be able to add it for other datasets in the future too :-) I just tested `clld initdb` with `pserve --reload development.ini` and it works! Yay!
Two more questions though @@. Sorry again for being annoying, I'll pay back by doing my best to teach other people how to do this so that you don't get swarmed by the same questions.
1) for CLDF: the feature values seem to be duplicated. For both features (sortal classifier and morphosyntactic plural), the value became the one of sortal classifiers, e.g., in the original data, Ainu and Abun are sortalclassifiers = yes and morphosyntacticplural = no, but in values.csv they are sortalclassifiers = yes and morphosyntacticplural = yes. When I checked the details in values.csv, it seems that the morphosyntactic plural feature has exactly the same values as the sortal classifier feature.
2) for CLLD: in the HTML deployed locally, everything works except the source part, in which the references do not show up. Do you have any suggestions as to where I should look? For now, what I did is clone the repository with `git clone https://github.com/clld/clld.git`, and follow the tutorial steps to create the skeleton and fill it with `clld initdb`, then serve it with `pserve --reload development.ini`. I did not modify the other files in the repository.
and when I click on an individual one, I get:
@marctang re CLDF: Good catch! Will check and add some consistency checking (that's what `test.py` is for, which is particularly important for larger datasets where eye-balling isn't an option anymore).
@marctang re sources in clld app: See https://github.com/clld/clld/issues/212
@marctang At https://github.com/cldf-datasets/tangclassifiers/commit/5a7387860bb17a803bc372bc236d1d7cdbf41729#diff-b284a28710cce90d9d9be3a7f4cabc8e you can see an example of how you'd do some consistency checking for the CLDF data. This is mainly to prevent regressions introduced by fiddling with the code in `cmd_makecldf`.
https://github.com/cldf-datasets/tangclassifiers/commit/5a7387860bb17a803bc372bc236d1d7cdbf41729#diff-354f30a63fb0907d4ad57269548329e3 also hooks these tests up with Travis, i.e. whenever a change is pushed to the repository, the committer will get an email notification if tests no longer pass.
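As a rough illustration of the kind of check meant here (not the code from the linked commit), the sketch below compares converted value rows against the raw per-language data and reports every disagreement. The function name `find_mismatches`, the language IDs and the parameter IDs are hypothetical; `Language_ID`, `Parameter_ID` and `Value` are the usual CLDF ValueTable column names. The sample data reproduces the duplication reported above, where the plural feature ended up with the classifier values.

```python
def find_mismatches(raw, values):
    """Compare converted value rows against the raw data.

    raw: {language_id: {parameter_id: value}}
    values: iterable of dicts with Language_ID, Parameter_ID and Value keys
    Returns a list of (language_id, parameter_id, expected, found) tuples.
    """
    mismatches = []
    for row in values:
        expected = raw[row["Language_ID"]][row["Parameter_ID"]]
        if row["Value"] != expected:
            mismatches.append(
                (row["Language_ID"], row["Parameter_ID"], expected, row["Value"]))
    return mismatches


# Raw data as described in the thread: classifiers yes, plural no.
raw = {
    "ainu": {"sortalclassifiers": "yes", "morphosyntacticplural": "no"},
    "abun": {"sortalclassifiers": "yes", "morphosyntacticplural": "no"},
}
# Converted rows showing the bug: the plural value copies the classifier value.
values = [
    {"Language_ID": "ainu", "Parameter_ID": "sortalclassifiers", "Value": "yes"},
    {"Language_ID": "ainu", "Parameter_ID": "morphosyntacticplural", "Value": "yes"},
    {"Language_ID": "abun", "Parameter_ID": "sortalclassifiers", "Value": "yes"},
    {"Language_ID": "abun", "Parameter_ID": "morphosyntacticplural", "Value": "yes"},
]

mismatches = find_mismatches(raw, values)
for language, parameter, expected, found in mismatches:
    print(f"{language}/{parameter}: expected {expected}, found {found}")
```

In a test file, asserting that the list of mismatches is empty would make such a regression fail the build instead of slipping through.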
Thanks! The output is correct now! Well noted also for the examples of checking the data. For the sources in clld, I replied in the other issue you opened, so I'll close the current issue.
@marctang If you find the time, you could review https://github.com/cldf/cookbook/blob/master/recipes/clld/README.md - which should be rather similar to what you did to get started with clld.
Awesome! That's indeed quite similar and even faster in a way since it is directly from the CLDF online data! Thanks!
If I may use the opportunity to ask for more advice :-P, here it is: I am also planning to deploy some other databases with the clld framework. Now, thanks to you guys, I will be able to first format these databases myself according to the CLDF format and add them to cldf-datasets.
For using clld, I followed the explanation online and I can get the default example to work. However, I am less sure about how I could, for example, feed the tangclassifiers cldf dataset to it and deploy it online with clld? https://clld.readthedocs.io/en/latest/tutorial.html#populating-the-database Sorry in advance if it is a stupid question @@