We recommend using your terminal to carry out the following instructions. We (strongly) advise you to install a virtual environment. In addition, you need GIT and Python (version 3) on your system.
The NoRaRe database uses a Python package called pynorare
for data curation (as a dependency pyconcepticon
will be installed automatically).
The Python package can be installed from PyPi.
$ pip install pynorare
To access the NoRaRe data sets, you need to clone the norare-data
GIT repository into a folder of your choice. Navigate or create a folder where you want to store the norare-data
repository.
$ mkdir PATH/TO/NAME-NEW-FOLDER
or
$ cd PATH/TO/FOLDER
Clone the repository by typing:
$ git clone https://github.com/concepticon/norare-data.git
The NoRaRe database is linked to Concepticon. To access all data sets and perform the mapping of your own data, you need to download the concepticon-data
GIT repository.
$ git clone https://github.com/concepticon/concepticon-data.git
To check if the installation worked and see the available commands of the pynorare
package, type:
$ norare
For example, you can get statistic of the distribution of the Concepticon identifiers by typing the following command. Note that you should add the path to a specific clone of concepticon-data, because you are not within the root of the concepticon-data. The same is possible for the norare-data repository if you are working with multiple clones. You can also use the catalog.ini
file to specify the path (see 'Define repository paths' section).
$ norare --repos=/PATH/TO/concepticon-data --norarepo=/PATH/TO/norare-data stats
To see all available data sets, navigate into the norare-data
folder and type:
$ norare --repos=/PATH/TO/concepticon-data ls
By typing the following command, the original data set will be downloaded into the raw
folder. It is only locally stored.
$ norare --repos=/PATH/TO/concepticon-data download Abdaoui-2017-EmoLex
To make your live easier, you can define the default path to concepticon-data in a catalog.ini
file (Linux users can follow the desciption in this blog post).
For Mac users: Open the catalog.ini
file in a text editor of your choice and add the following lines.
[clones]
concepticon = /PATH/TO/concepticon-data
If you can't find the catalog.ini
file, create a directory mkdir /Users/YOURNAME/Library/Application\ Support/cldf/
and add it to the cldf
folder.
"Mapping a dataset" refers to the process of mapping the words or concepts described in a dataset to Concepticon's concept sets and extracting associated norms, ratings or relations from a dataset's "raw" data.
This is facilitated with small Python modules (one per dataset), located at datasets/<dataset-ID>/norare.py
. These
python modules may provide two functions, download
and map
with signatures as shown below, which will be called
when the respective command is run with norare
.
def download(dataset: pynorare.api.Dataset):
"""use methods of `dataset` to download or manipulate data"""
def map(dataset: pynorare.api.Dataset, concepticon: pyconcepticon.Concepticon, mappings: dict):
"""eventually call `dataset.extract_data` to write the extracted data to files."""
The metadata file is meant to provide additional information to your data set. It is also used to define the column names.
For a comprehensive description of the possible namespace terms see the CSVW documentation.
You can also follow the standards of the metadata.json files for other data sets. Note that you can specify the old
"titles"
and new column names with "name"
as follows:
{
"name": "ENGLISH",
"datatype": "string",
"titles": "word"
}
Indicate the data "datatype"
of each column: integer
, float
, string
.
Don't forget to add the Concepticon columns:
{
"name": "CONCEPTICON_ID",
"datatype": "integer"
},
{
"name": "CONCEPTICON_GLOSS",
"datatype": "string"
}
Make sure that you have stored the norare.py
and <YOUR-DATASET-ID>.tsv-metadata.json
files and created a raw
folder in your data set folder. If you are ready, navigate to the
norare-data
folder and type the following commands into your terminal:
$ norare download YOUR-DATASET-ID
$ norare map YOUR-DATASET-ID
The raw file should be stored in the raw
folder and a new <YOUR-DATASET-ID>.tsv
should occur in your data set folder.
You can validate your data set by typing:
$ norare validate YOUR-DATASET-ID
Tjuka, Annika, Robert Forkel, and Johann-Mattis List. 2022. Linking Norms, Ratings, and Relations of Words and Concepts Across Multiple Language Varieties. Behavior Research Methods 54. 864–884. https://doi.org/10.3758/s13428-021-01650-1.
Tjuka, Annika. “Adding Data Sets to NoRaRe: A Guide for Beginners,” in Computer-Assisted Language Comparison in Practice, 11/08/2021, https://calc.hypotheses.org/2890.
Tjuka, Annika. “Comparing NoRaRe Data Sets: Calculation of Correlations and Creation of Plots in R,” in Computer-Assisted Language Comparison in Practice, 24/11/2021, https://calc.hypotheses.org/3109.