EticaAI / HXL-Data-Science-file-formats

Common file formats used for Data Science and language localization exported from (and to) HXL (The Humanitarian Exchange Language)
https://hdp.etica.ai/
The Unlicense

`hxl2arff`: Attribute-Relation File Format (ARFF), focused on compatibility with WEKA, "The workbench for machine learning" #3

Open fititnt opened 3 years ago

fititnt commented 3 years ago

TODO: add more information

fititnt commented 3 years ago

Unlike hxl2tab (#2), ARFF handles the 'class' (classifier) directly, as if it were a format type, and not as an extra attribute. Also, ARFF does not have an 'ignore' or 'skip' field directly in the file (if this has to be done, I think it is done via the interface).
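To make the difference concrete, here is a minimal sketch (not the actual `hxl2arff` code) of building an ARFF header from Orange-style field flags. The function name `arff_header` and the `(name, arff_type, role)` tuple layout are assumptions made only for this illustration. It relies on two things: ARFF declares every column with an `@ATTRIBUTE` line and has no flag row, and Weka uses the last attribute as the class by default.

```python
def arff_header(relation, fields):
    """Build an ARFF header from (name, arff_type, role) tuples.

    `role` is a hypothetical Orange-style flag: "", "class", "meta" or "ignore".
    Names are assumed to be simple tokens without spaces.
    """
    # ARFF has no flag row, so meta/ignore fields are simply left out here.
    kept = [f for f in fields if f[2] not in ("meta", "ignore")]
    # Weka uses the last attribute as the class by default, so move it last.
    ordered = [f for f in kept if f[2] != "class"]
    ordered += [f for f in kept if f[2] == "class"]

    lines = ["@RELATION " + relation, ""]
    for name, arff_type, _role in ordered:
        lines.append("@ATTRIBUTE " + name + " " + arff_type)
    lines += ["", "@DATA"]
    return "\n".join(lines)


fields = [
    ("outcome", "{yes,no}", "class"),  # nominal class attribute
    ("id", "string", "meta"),          # dropped: no 'meta' concept in ARFF
    ("age", "numeric", ""),
]
header = arff_header("example", fields)
print(header)
```

Dropping the meta/ignore fields is only one possible choice; they could also be exported as ordinary attributes and removed later in the Weka interface.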

But a very important point: Weka complains a lot if the exported file is not in a very strict format. This means that fields likely to be 'meta' or flagged 'ignore' for Orange may not be exported by default for Weka. This is especially relevant if each exported field value is not validated to remove all characters that could make Weka complain.

Screenshot from 2021-01-26 00-23-52
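The validation concern could be handled with something like the sketch below. It is not the exporter's actual code; `arff_value` and `arff_row` are hypothetical helper names. It assumes the usual ARFF conventions: `?` marks a missing value, `%` starts a comment, and values containing spaces, commas, quotes or braces must be quoted.

```python
def arff_value(value):
    """Return one cell formatted for an ARFF @DATA row."""
    if value is None or str(value).strip() == "":
        return "?"  # ARFF marks missing values with a question mark
    # Collapse newlines and tabs into single spaces: a data row is one line.
    text = " ".join(str(value).split())
    # Characters that are unsafe in an unquoted ARFF value.
    needs_quotes = any(ch in text for ch in " ,'\"%{}\\")
    if needs_quotes:
        # Escape backslashes first, then single quotes, then wrap in quotes.
        text = text.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + text + "'"
    return text


def arff_row(values):
    """Join already-ordered cell values into one @DATA line."""
    return ",".join(arff_value(v) for v in values)


row = arff_row(["Sao Paulo", None, 42, "it's"])
print(row)
```

Quoting and escaping keeps the original value; a stricter exporter could instead strip the problematic characters, at the cost of altering the data.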