Open mlindauer opened 7 years ago
IIRC we put them all into different groups for feature selection -- if everything is in one group, adding one feature is the same as adding all of them.
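To illustrate the point about groups, here is a minimal Python sketch. It assumes a selector that can only add whole feature groups, one group at a time. The function name `selectable_units` and the feature and group names are made up for illustration and are not taken from the scenario files.

```python
def selectable_units(feature_groups):
    """Return the units a group-wise selector can add, one tuple per group.

    Hypothetical helper: assumes selection happens at group granularity.
    """
    return [tuple(features) for features in feature_groups.values()]


# One group per feature: every feature can be added on its own.
one_per_group = {"g1": ["f1"], "g2": ["f2"], "g3": ["f3"]}
# A single group: the only possible step adds all features at once.
single_group = {"all": ["f1", "f2", "f3"]}

print(selectable_units(one_per_group))
print(selectable_units(single_group))
```

With a single group there is only one candidate step, so group-wise selection cannot tell the features apart.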
How do we proceed with the other issue? Is it easy for you in R to fix the column order problem? Or who is responsible?
Done. Is everything ok now?
Is there anything I should do?
At least I can read the data now (after replacing each ' with "). Generating plots does not really work yet because the algorithm names are so long. Could we shorten them, e.g., cut the common prefix "weka.classifier" and move the hyperparameter configuration into a readme?
The hyperparameter configuration is part of the algorithm, otherwise we'd get misleading results when doing selection. Could we instead do something to adjust the plots?
Some algorithm names do not even fit in one line on github, e.g. 1160_weka.classifiers.meta.AttributeSelectedClassifier -- -E \weka.attributeSelection.PrincipalComponents -R 0.95 -A 5\ -S \weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
How would you compress it such that it can fit into a plot? All my attempts so far led to plots where the algorithm names were much longer than the actual plot size. I agree that we should still be able to distinguish between different hyperparameter configurations, but the details could go into the readme. For example, the above name could be shortened to PrincipalComponents_Conf1
What do you think?
Hmm, we don't have anything for this specified in the data format specification. I'm hesitant to come up with an ad-hoc version just to fix the plots. If this doesn't break anything else, I'd prefer doing this only in the plots, i.e. having labels "algorithm1", "algorithm2" etc and a legend somewhere that tells you what algorithm1 is.
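A minimal Python sketch of the labelling idea above: each long algorithm name gets a short placeholder label, and the mapping is kept so that it can be printed as a legend or caption. The helper name `make_label_legend`, the "algorithm1" numbering scheme, and the two example names are illustrative assumptions, not part of the ASlib format or of any existing tool.

```python
def make_label_legend(algorithm_names, prefix="algorithm"):
    """Map each full algorithm name to a short label such as 'algorithm1'.

    Hypothetical helper for illustration only. Labels are numbered in the
    order the names are given; duplicate names share one label.
    """
    legend = {}
    for name in algorithm_names:
        if name not in legend:
            legend[name] = f"{prefix}{len(legend) + 1}"
    return legend


# Example names, made up for illustration.
names = [
    "weka.classifiers.trees.J48 -- -C 0.25 -M 2",
    "weka.classifiers.bayes.NaiveBayes",
]
legend = make_label_legend(names)

# Plots would use the short labels; the legend or caption lists the mapping.
for full_name, label in legend.items():
    print(f"{label}: {full_name}")
```

The data files keep the full names, so only the plotting code needs to know about the short labels.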
For plots I would simply cut off the name after n characters and add an ellipsis (...). The main algorithm is always at the beginning of the string.
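The truncation approach can be sketched in a few lines of Python. The function name `truncate_label`, the default cut-off of 20 characters, and the second example configuration are assumptions made for illustration.

```python
def truncate_label(name, max_chars=20, ellipsis="..."):
    """Keep the first max_chars characters; append an ellipsis if cut."""
    if len(name) <= max_chars:
        return name
    return name[:max_chars] + ellipsis


# Two configurations of the same classifier (the second is hypothetical).
a = "weka.classifiers.trees.J48 -- -C 0.25 -M 2"
b = "weka.classifiers.trees.J48 -- -C 0.1 -M 2"

print(truncate_label(a, max_chars=26))
print(truncate_label(b, max_chars=26))
```

Because the main algorithm comes first in the string, two configurations of the same classifier can end up with identical truncated labels, so plotting code using this approach would need a check for such collisions.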
For plots I would simply cut off the name after the nth characters and add an ellipsis (...).
Ok, I will do this. However, I'm still not so happy with the solution. If I were to talk about these algorithms in a paper, I would also need to come up with some meaningful abbreviation (instead of "algorithm1" or "algori(...)"). Therefore, I would prefer to solve the problem consistently in this scenario right away.
Maybe we should extend the ASlib format such that parameter configuration can be specified somewhere. For example, in the ASP-POTASSCO scenario it is only 1 solver with 11 different configurations.
That sounds like a good idea. Let's talk more about this in a bigger group.
On 28 Oct 2016, at 11:04, Lars Kotthoff notifications@github.com wrote:
Hmm, we don't have anything for this specified in the data format specification. I'm hesitant to come up with an ad-hoc version just to fix the plots. If this doesn't break anything else, I'd prefer doing this only in the plots, i.e. having labels "algorithm1", "algorithm2" etc and a legend somewhere that tells you what algorithm1 is.
That sounds like a much better solution than using ellipses, which can get confusing and potentially misleading.
Cheers,
Holger
That would be good. In the OpenML case there can be any number of configurations, but still I can give you a structured representation.
You still need a way to compress that information in your plots, though.
On 29 Oct 2016, at 22:07, Joaquin Vanschoren notifications@github.com wrote:
That would be good. In the OpenML case there can be any number of configurations, but still I can give you a structured representation.
You still need a way to compress that information in your plots, though.
I understand Lars’s suggestion as saying that, in cases where algorithm (or configuration) names get too long, we call them “algorithm 1”, …, and that we use these labels in plots, along with a specification of what “algorithm 1” etc. really means. IMO, this could be done in the caption.
Would that address your concern, or am I missing something?
Cheers,
Holger
Hi everyone,
If nobody has any further objections, we could move the OPENML scenario into the master branch.
Cheers, Marius
Ok by me. Regarding your earlier comments, I could make a new version with fewer missing values, but didn't have time yet.
OK, so we will wait for this new version with fewer missing values? Once the scenario is in the master branch, I would like to avoid updating it too often.
I vote for moving it to master now. A new scenario with fewer missing values could be OpenML-2017 or something like that and would probably involve some new algorithms as well.
Regarding your earlier comments, I could make a new version with fewer missing values, but didn't have time yet.
@joaquinvanschoren could you please give us a rough estimate of when you will have time to add further features? The underlying question is whether we should wait for it before a new ASlib release, or whether we will release the new version first and then add your scenario later.
Do you have a GitHub repo to upload the new files?
Yes.
Hi,
I looked into the "old" OPENML scenario. I already fixed some issues. Right now my tool complains that the order of the columns in algorithm_runs.arff is wrong. Right now it is:
but it should be
Furthermore, I wonder why every feature is in its own group. Since there are no feature costs and the runstatus is always "ok", we could put all features into one group. In the end, this is not a real issue.
Best, Marius