Closed by fititnt 2 years ago
The EticaAI-Data_HXL-Data-Science-file-formats_Tab already has a draft of a table that could be used to build an expert system without the need for full machine learning models.
But for this implementation, I think we can simply implement both the more specific prefixes, like +vt_orange_, and maybe some special, more generic attributes to be used with #3, like one to mark the "class" (both Orange and Weka use class).
OK, interesting. Here is the Orange 'Simplified header' specification
While not ideal, the HXLated output without text headers is actually pretty similar to what Orange expects. The biggest difference is that Orange treats everything after the # as the textual header, but a few short extra prefix characters can be added before it.
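To make that split concrete, here is a minimal Python sketch that separates the short prefix from the name at the first #. It assumes the one-letter codes from Orange's simplified header (c, m, i, w as flags; C, D, S, T as types). The function name split_orange_header is hypothetical and is not part of hxl2tab or Orange.

```python
# Assumed one-letter codes from Orange's simplified header format.
FLAGS = set("cmiw")  # class, meta, ignore, instance weight
TYPES = set("CDST")  # continuous, discrete, string, time


def split_orange_header(cell):
    """Return (flags, type_code, name) for one header cell.

    Hypothetical helper for illustration only.
    """
    prefix, sep, name = cell.partition("#")
    if not sep or any(ch not in FLAGS | TYPES for ch in prefix):
        # No '#', or the text before it is not a valid prefix:
        # treat the whole cell as a plain column name.
        return "", "", cell
    flags = "".join(ch for ch in prefix if ch in FLAGS)
    type_code = "".join(ch for ch in prefix if ch in TYPES)
    return flags, type_code, name


if __name__ == "__main__":
    print(split_orange_header("cD#status+vt_categorical+vt_class"))
    print(split_orange_header("#loc+type+vt_meta"))
```

Because an HXL hashtag already starts with #, the same character doubles as Orange's separator, which is why an HXLated header row needs so little change.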
hxl2tab https://docs.google.com/spreadsheets/d/1Vqv6-EAdSHMSZvZtE426aXkDiwP8Mdrpft3tiGQ1RH0/edit#gid=0 temp/example-ebola-dataset-1_HXLated+tab.csv
fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ head temp/example-ebola-dataset-1_HXLated+tab.csv
#status #country #adm1 #adm1+code #loc #loc #org #loc+type #affected+dead #affected+confirmed #affected+suspected
Pending Liberia Margibi LR09 Kakata 1 Kakata 2 AFL AFL ETC 0 0 0
Functional Guinea Nzerekore GN008 Nzerekore Ailema (?) ETC 45 56 3
Pending Liberia River Gee LR13 Fishtown Fishtown ETC American Red Cross ETC 0 0 0
Functional Sierra Leone Western SL04 Jui Sierra Leone-China Friendship Hospital (Jui Hospital) Chinese CDC ETC 47 65 17
Pending Guinea Nzerekore GN008 Croix-Rouge française ETC 0 0 0
Pending Sierra Leone Western SL04 Freetown Goderich EMERGENCY ETC 0 0 0
Functional Sierra Leone Western SL04 Lakka Lakka Hospital ETU EMERGENCY Italian NGO ETC 3 17 11
Functional Liberia Margibi LR09 Firestone Firestone Medical Center Firestone Company ETC 14 29 19
Functional Liberia Montserrado LR11 Monrovia Monrovia, Congo Town - Old Ministry of Defence ETU 1 FMT ETC 1 30 6
hxl2tab https://docs.google.com/spreadsheets/d/1Vqv6-EAdSHMSZvZtE426aXkDiwP8Mdrpft3tiGQ1RH0/edit#gid=0 temp/example-ebola-dataset-1_HXLated+tab_hxltabv15.tab
fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ head temp/example-ebola-dataset-1_HXLated+tab_hxltabv15.tab
cD#status+vt_categorical+vt_class D#country+vt_categorical D#adm1+vt_categorical D#adm1+code+vt_categorical D#loc+vt_categorical D#loc+vt_categorical D#org+vt_categorical #loc+type+vt_meta C#affected+dead+number C#affected+confirmed+number C#affected+suspected+number
Pending Liberia Margibi LR09 Kakata 1 Kakata 2 AFL AFL ETC 0 0 0
Functional Guinea Nzerekore GN008 Nzerekore Ailema (?) ETC 45 56 3
Pending Liberia River Gee LR13 Fishtown Fishtown ETC American Red Cross ETC 0 0 0
Functional Sierra Leone Western SL04 Jui Sierra Leone-China Friendship Hospital (Jui Hospital) Chinese CDC ETC 47 65 17
Pending Guinea Nzerekore GN008 Croix-Rouge française ETC 0 0 0
Pending Sierra Leone Western SL04 Freetown Goderich EMERGENCY ETC 0 0 0
Functional Sierra Leone Western SL04 Lakka Lakka Hospital ETU EMERGENCY Italian NGO ETC 3 17 11
Functional Liberia Margibi LR09 Firestone Firestone Medical Center Firestone Company ETC 14 29 19
Functional Liberia Montserrado LR11 Monrovia Monrovia, Congo Town - Old Ministry of Defence ETU 1 FMT ETC 1 30 6
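The prefixed header in the second output can be approximated with a few lines of Python. This is only a sketch inferred from the sample header above, not the actual logic in bin/hxl2tab. The mapping it assumes is +vt_class to the c flag, +vt_categorical to D, and +number to C. Anything else is left without a prefix, as happens with #loc+type+vt_meta in the sample. The function names orange_header and orange_header_row are hypothetical.

```python
def orange_header(hxl_hashtag):
    """Prefix one HXL hashtag with Orange flag/type codes.

    The rules below are assumptions read off the sample output;
    the real hxl2tab rules may differ.
    """
    attributes = hxl_hashtag.split("+")[1:]
    flag = "c" if "vt_class" in attributes else ""
    if "vt_categorical" in attributes:
        type_code = "D"  # discrete
    elif "number" in attributes:
        type_code = "C"  # continuous
    else:
        type_code = ""   # no hint: leave the hashtag untouched
    return flag + type_code + hxl_hashtag


def orange_header_row(hxl_hashtags):
    """Join the prefixed hashtags into one tab-separated header line."""
    return "\t".join(orange_header(tag) for tag in hxl_hashtags)


if __name__ == "__main__":
    print(orange_header_row([
        "#status+vt_categorical+vt_class",
        "#loc+type+vt_meta",
        "#affected+dead+number",
    ]))
```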
Hmm, from this semi-random Reddit thread I found https://github.com/hugapi/hug. So, in theory, it is possible to expose the CLI as a web app in a hackish way. At a bare minimum, this would make it possible to pass Orange a URL (even a local one) instead of manually saving the file with the CLI app.
The post cites other alternatives, but this one requires fewer dependencies and only a small number of changes. Also, for quick tests, if the URL needs to be exposed without setting up a remote server, it would be possible to use ngrok (https://ngrok.com/). This may be useful if someone else needs something for a short period: anyone from the community could just send a private URL from their own computer and solve the issue until something better comes along.
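As a rough illustration of the "pass Orange a URL" idea, the sketch below serves a small tab-separated payload over HTTP on localhost. To stay dependency-free it uses Python's standard http.server rather than hug, so it is a stand-in for the approach and not the hxl2tab implementation. SAMPLE_TAB, TabHandler and make_server are hypothetical names, and the payload is made-up sample data. A real service would run the conversion instead of returning a constant.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical placeholder for converted output; a real service
# would generate this from an HXLated dataset.
SAMPLE_TAB = (
    "cD#status+vt_class\tC#affected+dead+number\n"
    "Pending\t0\n"
    "Functional\t45\n"
)


class TabHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the tab-separated payload."""

    def do_GET(self):
        body = SAMPLE_TAB.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/tab-separated-values; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; remove this override to see request logs.
        pass


def make_server(port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), TabHandler)
```

Calling make_server(8000).serve_forever() would then make the data available at a local URL such as http://127.0.0.1:8000/data.tab. A tunnel like ngrok could expose that port temporarily.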
A proof of concept has existed since at least v0.8.7.1 and is documented in the main README.md.
This can be used standalone, but it still requires the original dataset to already be valid HXL and to have some tags, like +vt_orange_flag_class, that work as hints for the export to Orange.
Trivia: hxlquickmeta is one way to automate how a dataset could be tagged for use with hxl2tab (which could be useful for very large datasets with many columns). But the inner parts of bin/hxl2tab still require editing Python code (unlike most other new tools here, which have fully configurable ontologies in YAML).
From the README:
hxl2tab: tab format, focused on compatibility with Orange Data Mining
What it does: hxl2tab uses an already HXLated dataset and then, based on #hashtag+attributes, generates an Orange Data Mining .tab format with extra hints.
hxl2tab v2.0 has some usable functionality for generating the file through a web interface instead of the CLI. It uses hug 🐨 🤗. If you want to quickly expose it outside localhost, try ngrok.
Installation
This package can be installed by copying bin/hxl2tab to a place on your executable path and installing the dependencies manually.
The automated way is to install it as part of the Python PyPI package hdp-toolchain, which already includes the extra dependencies:
python3 -m pip install hdp-toolchain[hxl2tab]
# python3 -m pip install hdp-toolchain[full]
bin/hxl2tab
https://github.com/EticaAI/HXL-Data-Science-file-formats/blob/main/bin/hxl2tab
TODO: add more information.