Closed FredericBlum closed 1 year ago
The following languages have a restrictive license:
I think this would be the job of pydoreco. So what I'm imagining is a cldfbench-enabled repository containing a list of relevant corpora, e.g. in etc/corpora.csv, and for your study you could edit etc/corpora.csv, adding the restricted corpora, then run cldfbench makecldf locally and do your analyses with the resulting CLDF.
Do I understand the code from #10 correctly that the data is never uploaded to GitHub, so we could just remove the if-clause about the ND license? As you suggested in the previous answer, cldfbench makecldf will have to be run locally anyway (also due to the file size of the raw and CLDF tables). But we could still have the metadata and the code ready, without having to include two different setups for the different licenses.
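A minimal sketch of the corpora list idea, assuming etc/corpora.csv has the columns ID and License (hypothetical column names; the actual layout is not specified in this thread) and that a restrictive license can be recognized by an ND component in the license string. The function name and the example rows are made up for illustration.

```python
import csv
import io


def select_corpora(rows, include_restricted=False):
    """Return the corpus IDs to convert, skipping ND-licensed ones by default."""
    selected = []
    for row in rows:
        # Treat a license as restricted if one of its dash-separated parts is "ND".
        restricted = 'ND' in row['License'].upper().split('-')
        if restricted and not include_restricted:
            continue
        selected.append(row['ID'])
    return selected


# Hypothetical content; real corpus IDs and licenses will differ.
EXAMPLE_CSV = """ID,License
corpus_a,CC-BY-4.0
corpus_b,CC-BY-NC-ND-4.0
corpus_c,CC-BY-NC-4.0
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
public_ids = select_corpora(rows)
all_ids = select_corpora(rows, include_restricted=True)
```

With a single list and a flag like this, the same metadata and code could serve both the published dataset and a local run that includes the restricted corpora.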
No, I would want to upload the CLDF data to GitHub, but this will likely require zipping a couple of files. Once this has been implemented and documented, we should only add the unzipped paths to .gitignore. So, no, we couldn't include the ND annotations in this scenario. But as you say, there could be some sort of switch that allows running the CLDF creation locally with all annotations included.
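The zipping step could look roughly like the following sketch, which uses only the Python standard library. The helper name zip_table is hypothetical, and which files would actually need zipping is not settled in this thread.

```python
import pathlib
import zipfile


def zip_table(path):
    """Write <name>.zip next to the given file and return the archive path."""
    path = pathlib.Path(path)
    archive = path.with_name(path.name + '.zip')
    with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file under its bare name so it unpacks in place.
        zf.write(path, arcname=path.name)
    return archive
```

The unzipped path would then be the one listed in .gitignore, while the archive is committed.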
Could this just be an if-clause within the lexibank script that can be switched on with some variable?
If there is a concrete way I can support this, I'll happily do that.
The problem is that the call interface for the makecldf command is controlled by cldfbench. So, the next best thing may be an environment setting, i.e. calling
export DORECO_FULL=1
cldfbench makecldf ...
and checking in cldfbench_doreco.py
import os
...
if os.environ.get('DORECO_FULL') == '1':
    ...
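Spelled out a bit further, the environment switch might be used as in the sketch below. Only the variable name DORECO_FULL and the os.environ.get check come from the snippet above; the function names and the (corpus_id, is_nd_licensed) pairs are assumptions for illustration.

```python
import os


def full_conversion_requested(environ=None):
    """Return True if DORECO_FULL is set to '1' in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return environ.get('DORECO_FULL') == '1'


def corpora_to_convert(corpora, environ=None):
    """Filter (corpus_id, is_nd_licensed) pairs according to the switch."""
    full = full_conversion_requested(environ)
    # ND-licensed corpora are only kept when the full conversion is requested.
    return [cid for cid, is_nd in corpora if full or not is_nd]
```

Passing the environment as an argument keeps the check testable without changing the real process environment.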
We could also set the variable directly in the cldfbench script, right? We did that for the dictionary and wordlist conversions in other repositories. It would work exactly the same, but we wouldn't have to set the switch in the environment every time we run cldfbench.
You mean interactively, i.e. prompting the user for input? Yes, that's an option, too. May be a case for https://github.com/clld/clldutils/blob/c7293255d516995d06fae07124f7d81731ace815/src/clldutils/clilib.py#L177
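For illustration, an interactive switch could be sketched with the built-in input function as below. This is a stand-in, not the clldutils helper linked above, whose exact signature is not shown in this thread; ask_yes_no and its parameters are hypothetical.

```python
def ask_yes_no(question, default=False, input_func=input):
    """Prompt for a yes/no answer; an empty reply selects the default."""
    suffix = ' [Y/n] ' if default else ' [y/N] '
    answer = input_func(question + suffix).strip().lower()
    if not answer:
        return default
    return answer in ('y', 'yes')
```

The dataset script could then call something like ask_yes_no('Include ND-licensed annotations?') before deciding which corpora to convert.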
I am thinking what might be the best way to add the annotations that are restricted by the ND license. @xrotwang Is there an easy way to create a mk-file that downloads the respective files and converts them into CLDF, once the cldfbench-workflow is done? This will probably be more relevant for my study than for this CLDF dataset, as we cannot publish this data as CLDF.