EticaAI / lexicographi-sine-finibus

Lexicographī sine fīnibus
The Unlicense

New exported format: frictionlessdata Tabular Data Package + Data Package Catalogs #35

Open fititnt opened 2 years ago

fititnt commented 2 years ago

As the title says, let's do a minimum viable product.

fititnt commented 2 years ago

Hmm... Okay. We can create at least one global-level "profile": "data-package-catalog" (maybe also another for the first sub-level) and put as little information in it as possible, so the file does not become overly large. Each individual "profile": "tabular-data-package" (the ones that actually describe the fields) is huge, with over 200 columns of data.

The simplest case of the top level can look like this:

[Screenshot from 2022-04-27 22-06-30]

./999999999/0/1603_1.py --codex-de 1603_63_101 --status-quo --ex-librario="cdn" --status-in-datapackage | jq

{
  "profile": "data-package-catalog",
  "name": "1603",
  "resources": [
    {
      "format": "json",
      "name": "1603_1_1",
      "path": "1603/1/1/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_1_6",
      "path": "1603/1/6/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_1_7",
      "path": "1603/1/7/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_1_51",
      "path": "1603/1/51/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_1_99",
      "path": "1603/1/99/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_1_101",
      "path": "1603/1/101/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_1_2020",
      "path": "1603/1/2020/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_1_8000",
      "path": "1603/1/8000/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_25_1",
      "path": "1603/25/1/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_44_86",
      "path": "1603/44/86/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_44_101",
      "path": "1603/44/101/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_44_111",
      "path": "1603/44/111/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_45_1",
      "path": "1603/45/1/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_45_19",
      "path": "1603/45/19/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_45_31",
      "path": "1603/45/31/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_63_101",
      "path": "1603/63/101/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_64_41",
      "path": "1603/64/41/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_64_604",
      "path": "1603/64/604/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_84_1",
      "path": "1603/84/1/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_99_876",
      "path": "1603/99/876/datapackage.json",
      "profile": "tabular-data-package"
    },
    {
      "format": "json",
      "name": "1603_99_987",
      "path": "1603/99/987/datapackage.json",
      "profile": "tabular-data-package"
    }
  ]
}
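
The catalog above follows a regular pattern: each resource name (for example 1603_63_101) maps to a path with the underscores replaced by slashes, plus datapackage.json. As a minimal sketch of that mapping, and not the actual code inside 1603_1.py, something like the following would produce the same structure. The function name `build_catalog` and its arguments are hypothetical, chosen only for illustration.

```python
import json


def build_catalog(name, codes):
    """Build a minimal data-package-catalog dict from dictionary codes.

    Hypothetical helper: only the fields shown in the output above are
    emitted, to keep the top-level file small.
    """
    resources = []
    for code in codes:
        resources.append({
            "format": "json",
            "name": code,
            # e.g. 1603_63_101 -> 1603/63/101/datapackage.json
            "path": code.replace("_", "/") + "/datapackage.json",
            "profile": "tabular-data-package",
        })
    return {
        "profile": "data-package-catalog",
        "name": name,
        "resources": resources,
    }


catalog = build_catalog("1603", ["1603_1_1", "1603_63_101"])
print(json.dumps(catalog, indent=2))
```

Keeping each entry to these four keys leaves the field descriptions in the per-dictionary datapackage.json files, where the 200+ columns are documented.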
fititnt commented 2 years ago

Perfect!

Now we have an MVP that generates both a datapackage for a specific, focused group of dictionaries and a global "profile": "data-package-catalog".

It's very rudimentary, but the focused ones do validate, for example:

fititnt@bravo:/workspace/git/EticaAI/multilingual-lexicography-automation/officinam$ frictionless validate 1603/63/101/datapackage.json 
# -----
# valid: 1603_63_101.no1.tm.hxl.csv
# -----
# -----
# valid: 1603_63_101.no11.tm.hxl.csv
# -----
# -----
# valid: 1603_63_101.wikiq.tm.hxl.csv
# -----

However, not surprisingly, the global one fails: it complains that we are referring to datapackages that do not exist on disk (which is true; we need to rebuild the entire library).

fititnt@bravo:/workspace/git/EticaAI/multilingual-lexicography-automation/officinam$ frictionless validate datapackage.json 
# -------
# invalid: datapackage.json
# -------
============  ==================================================================================================================
code          message                                                                                                           
============  ==================================================================================================================
scheme-error  The data source could not be successfully loaded: [Errno 2] No such file or directory: '1603/1/1/datapackage.json'
============  ==================================================================================================================
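
Since the validator stops at the first path it cannot load, it may help to list every missing datapackage before rebuilding the library. The following is a rough sketch under the assumption that resource paths are relative to the directory holding the catalog file. The function name `missing_resources` is hypothetical and not part of frictionless or of this repository.

```python
import json
from pathlib import Path


def missing_resources(catalog_file):
    """Return the resource paths listed in a catalog that are not on disk.

    Assumption: each "path" is relative to the catalog's own directory.
    """
    catalog_file = Path(catalog_file)
    base_dir = catalog_file.parent
    catalog = json.loads(catalog_file.read_text(encoding="utf-8"))
    return [
        resource["path"]
        for resource in catalog.get("resources", [])
        if not (base_dir / resource["path"]).is_file()
    ]
```

Calling `missing_resources("datapackage.json")` from the officinam directory would then return the full list of paths still to be generated, rather than only the first one reported above.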