SmartDataAnalytics / SML-Bench

A Benchmark for Machine Learning from Structured Data
Apache License 2.0

preprocessing phase #11

Open giuseta opened 8 years ago

giuseta commented 8 years ago

For some systems, the syntax of the input files must be changed slightly. These changes can be time-consuming and affect the measured running time (a timeout could even be triggered before the algorithm is actually launched). For instance, in SLIPCOVER [1] the syntax of the dataset must be slightly modified before the system can run on it. Therefore, I propose adding another phase named 'preprocessing'. This phase should be executed once for each learning system, before the first run. A system that needs the preprocessing phase should declare it in system.ini:

[main]
...
preprocessing = yes
preproc_fields = field1, field2
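
As a minimal sketch of how the benchmark side could read these two options, the snippet below parses a system.ini with Python's standard `configparser`. The function name `read_preprocessing_settings` is hypothetical and not part of SML-Bench; only the section and option names come from the proposal above.

```python
import configparser


def read_preprocessing_settings(system_ini_text):
    """Return (enabled, fields) from the [main] section of a system.ini.

    Hypothetical helper: only the option names 'preprocessing' and
    'preproc_fields' are taken from the proposal.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(system_ini_text)

    # getboolean() accepts yes/no, true/false, on/off and 1/0.
    enabled = parser.getboolean("main", "preprocessing", fallback=False)

    # preproc_fields is a comma-separated list; drop empty entries.
    raw_fields = parser.get("main", "preproc_fields", fallback="")
    fields = [name.strip() for name in raw_fields.split(",") if name.strip()]
    return enabled, fields


EXAMPLE_SYSTEM_INI = """\
[main]
preprocessing = yes
preproc_fields = field1, field2
"""

if __name__ == "__main__":
    print(read_preprocessing_settings(EXAMPLE_SYSTEM_INI))
```

A system that does not set `preprocessing` would simply get `(False, [])` under this sketch, so existing system.ini files would keep working unchanged.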

If preprocessing is set to yes, SML-Bench should run a script called preprocessing located in the same directory as run. preprocessing accepts a configuration file that contains information about workdir, framework, etc., plus the path of the output configuration file that preprocessing will write:

...
[preprocessing]
conf = <path_of_preprocessing_conf>
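
One way the input file for the preprocessing script could be assembled is sketched below. The helper name `build_preprocessing_input`, the `workdir` key, and the example paths are assumptions made for illustration; the proposal only fixes the `[preprocessing]` section and its `conf` option.

```python
import configparser
import io


def build_preprocessing_input(sections, output_conf_path):
    """Render the configuration text handed to the preprocessing script.

    'sections' maps section names to option dicts (hypothetical layout);
    'output_conf_path' is where preprocessing should write its results.
    """
    # interpolation=None so that '%' in paths is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict(sections)

    if not parser.has_section("preprocessing"):
        parser.add_section("preprocessing")
    parser.set("preprocessing", "conf", output_conf_path)

    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative values only; real section contents are not specified here.
    text = build_preprocessing_input(
        {"framework": {"workdir": "/tmp/smlbench"}},
        "/tmp/smlbench/preprocessing_out.ini",
    )
    print(text)
```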

The output configuration file will contain the preproc_fields as follows:

[preprocessing]
field1 = <value>
field2 = <value>
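
Since the fields written by preprocessing have to match the ones declared in system.ini, the benchmark could check this before starting a run. The following is a sketch under that assumption; `load_preprocessing_output` is a hypothetical name, and raising `ValueError` on a mismatch is just one possible policy.

```python
import configparser


def load_preprocessing_output(output_text, declared_fields):
    """Parse the file written by preprocessing and check its fields.

    Returns a dict of field name -> value. Raises ValueError when the
    [preprocessing] section is absent or its fields differ from the
    ones declared in preproc_fields (hypothetical policy).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(output_text)
    if not parser.has_section("preprocessing"):
        raise ValueError("missing [preprocessing] section")

    produced = dict(parser["preprocessing"])
    # configparser lower-cases option names, so compare in lower case.
    declared = {name.lower() for name in declared_fields}

    missing = sorted(declared - set(produced))
    unexpected = sorted(set(produced) - declared)
    if missing or unexpected:
        raise ValueError(
            f"field mismatch: missing={missing}, unexpected={unexpected}"
        )
    return produced


if __name__ == "__main__":
    example = "[preprocessing]\nfield1 = a\nfield2 = b\n"
    print(load_preprocessing_output(example, ["field1", "field2"]))
```

The returned dictionary could then be copied into the `[preprocessing]` section of the configuration passed to run.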

The run script will receive a configuration file like the following:

[data]
...
[framework]
...
[filename]
...
[preprocessing]
field1 = <value>
field2 = <value>

(Note that field1 and field2 must be the same fields defined in system.ini.) Does this make sense to you?

[1] Bellodi, Elena, and Fabrizio Riguzzi. "Structure learning of probabilistic logic programs by searching the clause space." Theory and Practice of Logic Programming 15.02 (2015): 169-212.
