fititnt / orange3-hxl

[early draft] HXL visual ETL (Orange Data Mining add-on). See https://github.com/biolab/orange3/discussions/6092

(to be tested) HXL widget to pre-process referential training data (already tabular, not a compiled model): adapt the field names and remove excess detail to increase reusability with fewer user steps #2

Open fititnt opened 1 year ago

fititnt commented 1 year ago

While running tests on https://github.com/fititnt/lsf-orange-data-mining (mostly to create manually crafted training data), I noticed that, given the way the Orange interface works, the column names in the training reference apparently must match the names of the non-meta columns in the working dataset.

OK, it works as expected, but it means, for example, that we would need to explain to the user how to rename the columns in either case.
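To make the manual step concrete, here is a minimal sketch of the rename a user would otherwise do by hand: replacing names in the first row of a tab-separated file. It uses only the Python standard library, not the Orange API; the function name `rename_columns` and the target column name `answer` are hypothetical, chosen only for illustration.

```python
import csv
import io


def rename_columns(tab_text, mapping):
    """Return tab_text with the first-row column names replaced via mapping.

    Names that are not in the mapping are kept unchanged.
    """
    rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical case: the working dataset calls the raw column "answer".
reference = "#item+class+source\t#item+class\nyes\ttrue\nno\tfalse\n"
renamed = rename_columns(reference, {"#item+class+source": "answer"})
print(renamed)
```

A widget could build the `mapping` automatically instead of asking the user for it, which is the part that still needs testing.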

The idea

This needs some testing to check if it is necessary, but the goal would be

fititnt commented 1 year ago

Hmm... OK. By default, Orange3 already tries to reconcile the data and ignores extra information.

Screenshot from 2022-08-13 18-49-55.

It still shows a message saying that it coerced the source data, but it seems it is already able to ignore extra information. I have not yet tested columns with small name variations. In any case, since we can release training data with the simpler HXL syntax, with HXLTM (linguistic), and with full HXL+RDF, the column names would vary a lot.
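One possible way to cope with varying column names is to compare HXL hashtags by base tag plus attribute set rather than by exact string. The sketch below assumes that attribute order is not significant and lets the caller list attributes to ignore; `parse_hashtag`, `match_columns`, and the `ignore` parameter are hypothetical names, and none of this is existing widget code.

```python
def parse_hashtag(column):
    """Split an HXL hashtag such as '#item+class+source' into (tag, attributes)."""
    tag, *attributes = column.strip().split("+")
    return tag, frozenset(attributes)


def match_columns(reference, working, ignore=frozenset()):
    """Map reference column names to working column names.

    Two columns match when they share the base tag and the same set of
    attributes, after dropping the attributes listed in `ignore`.
    """
    def key(column):
        tag, attributes = parse_hashtag(column)
        return tag, attributes - ignore

    by_key = {key(name): name for name in working}
    return {name: by_key[key(name)] for name in reference if key(name) in by_key}


reference = ["#item+ix_zzbcp47", "#item+class+source", "#item+class"]
working = ["#item+source+class", "#item+class"]
print(match_columns(reference, working))
```

Columns with no counterpart, such as `#item+ix_zzbcp47` here, are simply left out of the result, which mirrors the "ignore extra information" behavior described above.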

Files tested for the screenshot

boolean.regula.hxl.tab

#item+ix_zzbcp47    #item+class+source  #item+class
discrete    discrete    false true
        class
und false   false
und true    true
und 0   false
und 1   true
und f   false
und t   true
und n   false
und y   true
und no  false
und yes true
und FALSE   false
und TRUE    true
en  false   false
en  true    true
en  no  false
en  yes true
es  no  false
es  si  true
es  sí  true
pt  nao false
pt  não false
pt  sim true

boolean.testdata.hxl.tab

#item+class+source  #item+class
discrete    false true
    class
false   false
true    true
0   false
1   true
f   false
t   true
n   false
y   true
no  false
yes true
FALSE   false
TRUE    true
no  false
si  true
sí  true
nao false
não false
sim true
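Both files use a three-row header: column names, then types (for the class column, the space-separated list of values `false true`), then flags such as `class`. As a rough sketch of how that layout can be read without Orange, the snippet below parses a shortened copy of boolean.regula.hxl.tab with the standard library and builds a plain lookup from source value to class. The helper `read_three_row_tab` is a hypothetical name, the sample keeps only a few rows, and the lookup only illustrates the file contents; it is not how Orange trains a model.

```python
import csv
import io

# Shortened copy of boolean.regula.hxl.tab, with real tab separators.
REGULA_TAB = (
    "#item+ix_zzbcp47\t#item+class+source\t#item+class\n"
    "discrete\tdiscrete\tfalse true\n"
    "\t\tclass\n"
    "und\tyes\ttrue\n"
    "und\tno\tfalse\n"
    "pt\tsim\ttrue\n"
    "pt\tnão\tfalse\n"
)


def read_three_row_tab(text):
    """Return (columns, data_rows) from a tab file with a three-row header."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    names, types, flags = rows[0], rows[1], rows[2]
    columns = [
        {"name": name, "type": type_, "flags": flag}
        for name, type_, flag in zip(names, types, flags)
    ]
    return columns, rows[3:]


columns, data = read_three_row_tab(REGULA_TAB)
# Locate the class column by its flag and the raw-value column by its name.
class_index = next(i for i, c in enumerate(columns) if c["flags"] == "class")
source_index = next(i for i, c in enumerate(columns) if c["name"] == "#item+class+source")
lookup = {row[source_index]: row[class_index] for row in data}
print(lookup)
```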