Lina - run the categorisation script

As explained in the main "interface" script, the categorisation algorithm that defines the search categories can be run separately from the other steps.

Clone the data-curation repository in your local area with

git clone git@github.com:cernopendata/data-curation.git

then go to the directory where the scripts for metadata processing are located:

cd data-curation/cms-YYYY-simulated-datasets

and run the script. To run it on the list of datasets that we are currently preparing for the release, do

python3 ./code/interface.py --print-categorisation ./inputs/CMS-2015-mc-datasets.txt > categorisation-2015.md

This produces a list in Markdown format, the same format that we use for pages in the getting-started guide. You can read more about it at https://www.markdownguide.org/basic-syntax/

You can view how it renders using VS Code. Start VS Code with

code .

Open the newly created categorisation-2015.md file, then open the preview by (1) right-clicking on the editor tab and (2) selecting "Open Preview" from the drop-down menu.
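If you want a quick sanity check of the output without opening an editor, a small script can count the headings and list entries in the generated file. This is only a minimal sketch: the helper name `summarise_markdown` is hypothetical, and it assumes the output uses standard Markdown headings (`#`) and list markers (`-`, `*`, `+` or `1.`), which the issue does not spell out.

```python
import re
from pathlib import Path


def summarise_markdown(text):
    """Return (heading titles, number of list items) found in a Markdown string.

    Hypothetical helper: assumes '#'-style headings and '-', '*', '+' or
    numbered list markers. Adapt it if the generated file looks different.
    """
    headings = []
    items = 0
    for line in text.splitlines():
        stripped = line.strip()
        heading = re.match(r"^(#{1,6})\s+(.*)$", stripped)
        if heading:
            headings.append(heading.group(2).strip())
        elif re.match(r"^([-*+]|\d+\.)\s+\S", stripped):
            items += 1
    return headings, items


if __name__ == "__main__":
    # File name taken from the command above; run this from the same directory.
    path = Path("categorisation-2015.md")
    if path.exists():
        headings, items = summarise_markdown(path.read_text(encoding="utf-8"))
        print(f"{len(headings)} headings, {items} list items")
        for title in headings:
            print("  ", title)
    else:
        print(f"{path} not found; run the categorisation step first")
```

An empty result (no headings and no list items) would suggest that the categorisation step did not write what you expected, so it is worth looking at the file directly in that case.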

Note that these are the search categories displayed in the search menu of the open data portal. Open data users need them to find the datasets they are interested in.
