(to be tested) HXL widget to pre-process referential training data (already tabular, not a compiled model): adapt field names and remove excess detail to increase reusability with fewer user steps

fititnt / orange3-hxl

[early draft] HXL visual ETL (Orange Data Mining add-on). See https://github.com/biolab/orange3/discussions/6092


The idea

This needs some testing to check whether it is necessary, but the goal would be to make two widgets, each accepting the reference dataset and the main dataset:

The first would output a variant of the referential training data; for example, if the training data has much more information than what users will run against it, the widget could simplify the training data. This one is likely to be the most important.
The second would do the same, but maybe just change the columns on the working data that Orange3 would otherwise ask the user to change. I think this mostly happens if the working data already has the column to replace.
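A minimal sketch of what the first widget could do, in plain Python rather than the Orange3 API. The function name `simplify_reference` and the list-of-dicts table layout are hypothetical, chosen only for illustration; the idea is to keep just the reference columns that the working data can actually supply, plus the class column, and drop the duplicate rows this creates.

```python
from typing import Dict, List

Table = List[Dict[str, str]]  # one dict per row, keyed by column name


def simplify_reference(reference: Table, working_columns: List[str],
                       class_column: str) -> Table:
    """Keep only the columns the working data has, plus the class column.

    Hypothetical helper, not part of Orange3 or orange3-hxl.
    """
    keep = [c for c in working_columns if c != class_column] + [class_column]
    simplified: Table = []
    seen = set()
    for row in reference:
        reduced = {c: row[c] for c in keep if c in row}
        key = tuple(reduced.items())
        if key not in seen:  # dropping columns can produce duplicate rows
            seen.add(key)
            simplified.append(reduced)
    return simplified


reference = [
    {"#item+ix_zzbcp47": "und", "#item+class+source": "no", "#item+class": "false"},
    {"#item+ix_zzbcp47": "en", "#item+class+source": "no", "#item+class": "false"},
    {"#item+ix_zzbcp47": "es", "#item+class+source": "si", "#item+class": "true"},
]
# The working data only has the source column, so the language column is dropped.
simplified = simplify_reference(reference, ["#item+class+source"], "#item+class")
```

Here the two `no` rows collapse into one once the language column is gone, which is the kind of simplification that might make the reference data easier to reuse.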

Hmm... OK. By default, Orange3 already tries to make some adjustments and ignores extra information.

Screenshot from 2022-08-13 18-49-55

It will still show a message saying that it coerced the source data, but it already seems able to ignore extra information. I have not yet tested what happens when column names have small variations. In any case, since we can release training data with the simpler HXL syntax, with HXLTM (linguistic), and then with full HXL+RDF, the column names would change a lot.
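For the column-name variations, one possible approach is to compare HXL headers in a normalized form before handing the data to Orange3. The sketch below assumes, as I understand the HXL convention, that hashtags are case-insensitive and that attribute order is not significant. The helper names `normalize_hxl` and `rename_to_reference` are hypothetical and not part of any existing library.

```python
from typing import Dict, List


def normalize_hxl(header: str) -> str:
    """Lower-case an HXL hashtag, drop spaces, and sort its attributes."""
    tag, *attributes = header.strip().lower().replace(" ", "").split("+")
    return "+".join([tag] + sorted(attributes))


def rename_to_reference(working_columns: List[str],
                        reference_columns: List[str]) -> Dict[str, str]:
    """Map each working column to the reference spelling, when one matches.

    Columns with no match are mapped to themselves (left unchanged).
    """
    canonical = {normalize_hxl(c): c for c in reference_columns}
    return {c: canonical.get(normalize_hxl(c), c) for c in working_columns}


mapping = rename_to_reference(
    ["#Item +source +class", "#item+class", "#meta+id"],
    ["#item+class+source", "#item+class"],
)
```

This only covers spelling-level differences. It would not bridge the larger renames expected between simple HXL, HXLTM and HXL+RDF headers, which would need an explicit mapping.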

Files tested for the screenshot:

`boolean.regula.hxl.tab`

#item+ix_zzbcp47    #item+class+source  #item+class
discrete    discrete    false true
        class
und false   false
und true    true
und 0   false
und 1   true
und f   false
und t   true
und n   false
und y   true
und no  false
und yes true
und FALSE   false
und TRUE    true
en  false   false
en  true    true
en  no  false
en  yes true
es  no  false
es  si  true
es  sí  true
pt  nao false
pt  não false
pt  sim true

`boolean.testdata.hxl.tab`

#item+class+source  #item+class
discrete    false true
    class
false   false
true    true
0   false
1   true
f   false
t   true
n   false
y   true
no  false
yes true
FALSE   false
TRUE    true
no  false
si  true
sí  true
nao false
não false
sim true
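Both files use Orange's tab-delimited layout with a three-row header: column names, then types, then flags such as `class`. As a rough illustration of how the referential file relates to the test data, the sketch below reads a shortened copy of the reference table with the standard `csv` module and uses it as an exact-match lookup. This is a deliberately simple stand-in, not how an Orange3 learner works, and `load_lookup` is a hypothetical helper.

```python
import csv
import io

# Shortened copy of the reference file; fields are tab-separated.
REFERENCE_TAB = (
    "#item+ix_zzbcp47\t#item+class+source\t#item+class\n"
    "discrete\tdiscrete\tfalse true\n"
    "\t\tclass\n"
    "und\tyes\ttrue\n"
    "und\tno\tfalse\n"
    "pt\tsim\ttrue\n"
    "pt\tnão\tfalse\n"
)


def load_lookup(tab_text: str, source_column: str, class_column: str) -> dict:
    """Build {source value: class value} from a three-row-header .tab text."""
    rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    names, data = rows[0], rows[3:]  # rows[1] and rows[2] hold types and flags
    src, cls = names.index(source_column), names.index(class_column)
    return {row[src]: row[cls] for row in data}


lookup = load_lookup(REFERENCE_TAB, "#item+class+source", "#item+class")
# Values absent from the reference data come back as None.
predicted = [lookup.get(value) for value in ["sim", "no", "maybe"]]
```

Because the lookup ignores the language column, a value that means different things in different languages would be ambiguous here, which is one reason the extra column may still be worth keeping in the full reference data.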
