paulk-asert closed 4 years ago
Thank you. Merged.
FYI, since I've seen you work on Groovy: at some point I had a package that added some syntactic sugar (inline operations or operators or something like that) for Groovy, but I can't find it right now.
Cool. I would be interested to see that if you can track it down. FYI, I have an ELKI/Groovy example (using above data as it happens) here: https://github.com/paulk-asert/groovy-data-science/blob/master/subprojects/Whiskey/src/main/groovy/KMeans_Elki.groovy
The code would likely become simpler if you used ELKIBuilder more, as it will use default parameters in many cases.
Good suggestion, I updated.
You may even be able to do this (untested):
def cols = ['Body', 'Sweetness', 'Smoky', 'Medicinal', 'Tobacco', 'Honey',
            'Spicy', 'Winey', 'Nutty', 'Malty', 'Fruity', 'Floral']
def file = getClass().classLoader.getResource('whiskey.csv').file
def db = new ELKIBuilder(StaticArrayDatabase)
    .with('parser.labelIndices', '0,1')
    .with('dbc.in', file)
    .build()
db.initialize()
Depending on what level you want to use the API.
Getting rid of the easy-to-forget call to "initialize" is on my to-do list. The database must not be initialized in the constructor, and for benchmarking it is best to separate initialization from algorithm run time, but it's easy to auto-initialize when the user did not call this explicitly.
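As a rough illustration of the auto-initialization idea described above, here is a minimal sketch in plain Java. LazyDatabase and all of its methods are hypothetical names invented for this example; this is not ELKI's StaticArrayDatabase and not how ELKI implements (or will implement) it. The sketch only shows the general pattern: the constructor does no loading, initialize() can still be called explicitly (for example before starting a benchmark timer), and any data access falls back to initializing on first use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database class; NOT ELKI's StaticArrayDatabase.
class LazyDatabase {
  private final List<double[]> rows = new ArrayList<>();
  private boolean initialized = false;
  private int loadCount = 0; // how many times the (expensive) load actually ran

  // The constructor deliberately does no loading.
  LazyDatabase() {
  }

  // Explicit initialization, e.g. called before starting a benchmark timer.
  // Calling it more than once is harmless: later calls return immediately.
  void initialize() {
    if (initialized) {
      return;
    }
    // Placeholder for parsing the input file; two made-up rows.
    rows.add(new double[] {1.0, 2.0});
    rows.add(new double[] {3.0, 4.0});
    loadCount++;
    initialized = true;
  }

  // Data access initializes on first use if the caller forgot to do so.
  int size() {
    ensureInitialized();
    return rows.size();
  }

  boolean isInitialized() {
    return initialized;
  }

  int getLoadCount() {
    return loadCount;
  }

  private void ensureInitialized() {
    if (!initialized) {
      initialize();
    }
  }
}
```

With this shape, code that forgets initialize() still works, while benchmarking code can keep calling it up front so that loading time is not attributed to the algorithm.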
That indeed works fine. Added, thanks!
When using NumberVectorLabelParser and supplying labelIndices, getTypeInformation stops after the desired number of column names has been reached, even though some columns have been skipped. I used this data file: https://www.niss.org/sites/default/files/ScotchWhisky01.txt and designated RowID and Distillery as label indices. Before the change in PR #78 I see this (note the column names): After the change I see this: