geneura-papers / 2017-GPRuleRefinement

Repository for the GPRuleRefinement paper to be sent to a Journal.
Artistic License 2.0

Graphs and figures suggestions #24

Closed unintendedbear closed 7 years ago

unintendedbear commented 7 years ago

For now I've included how the different folds converge, because in the other papers we've found that use a similar technique, the authors only split the data once (or they don't specify otherwise), while we do 10-fold cross-validation, and I thought that was remarkable. This way, we demonstrate that with our framework you can randomly select a partition and you won't lose significant performance with regard to the other partitions.
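For what it's worth, that per-fold robustness claim can also be checked numerically instead of with convergence curves. A minimal sketch in Python, with made-up data and a placeholder fitness function standing in for the real rule-evaluation pipeline:

```python
# Hypothetical sketch of 10-fold cross-validation: the data, the fitness
# function and the scores are all illustrative, not the paper's pipeline.
import random

random.seed(42)
data = list(range(100))  # stand-in for the labelled rule dataset
random.shuffle(data)

k = 10
folds = [data[i::k] for i in range(k)]  # 10 disjoint partitions

def evaluate(train, test):
    # placeholder fitness: fraction of test items below the train mean
    mean = sum(train) / len(train)
    return sum(1 for x in test if x < mean) / len(test)

scores = []
for i in range(k):
    test = folds[i]
    train = [x for j, fold in enumerate(folds) if j != i for x in fold]
    scores.append(evaluate(train, test))

# If the scores vary little across folds, any single partition is
# representative, which is the point being argued above.
spread = max(scores) - min(scores)
print(f"per-fold scores: {[round(s, 2) for s in scores]}, spread = {spread:.2f}")
```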

If it's not useful, or there are other graphs I should include, please suggest them here.

JJ commented 7 years ago

You don't prove anything by showing a graph with the evolution of fitness in every fold. The only thing you prove is that fitness converges, but we pretty much knew that, because it's a genetic algorithm. If you use 10-fold cross-validation, that's fine: your results will be much more precisely reported, and you can say so in the conclusions. That's it.

unintendedbear commented 7 years ago

Ok, then I'm open to any kind of graph you suggest. I've seen the ones with boxplots in @fergunet 's papers, and I can compare the different experiments we've done. But for the rest, I'm all ears.

JJ commented 7 years ago

2017-02-21 10:42 GMT+01:00 Paloma de las Cuevas Delgado <notifications@github.com>:

> Ok, then I'm open to any kind of graph you suggest. I've seen the ones with boxplots in @fergunet 's papers, and I can compare the different experiments we've done. But for the rest, I'm all ears.

Well, it's kind of hard to suggest charts without having even an inkling of what the results are or the shape they have. My point is that, whatever they are, these charts are not adequate for this paper, given its objectives. In general, charts have to convey the results in a way that allows the reader to compare them or assess their quality.

unintendedbear commented 7 years ago

Ok, results are in https://github.com/geneura-papers/2017-ESWA/commit/487d074eaa976530d312cbb7eb3dca5834936a2e, so you all have them.

JJ commented 7 years ago

OK, there are a bunch of files and I don't know what they mean, nor can I get the big picture. You have written this in the abstract:

> The simulation results over real data and a comparison with the results achieved by other techniques confirm the viability, effectiveness, and applicability of the GP approach to the BYOD security context.

You have to process and represent the results in such a way that you show that the approach is viable, effective and applicable in that context.

fergunet commented 7 years ago

If we are comparing different methods and fitness functions, I think it would be useful to compare how the different configurations behave, to see if they are viable. For example, showing boxplots of the best individuals during the evolution, as I did here: https://github.com/geneura-papers/2015-ASOCO/blob/master/mmdp-size-150-mut-0.006-xover-1-heterohardware-adaptsize.eps (dammit, GitHub does not render EPS, download the file to see it :P, or go directly to the paper here: http://www.sciencedirect.com/science/article/pii/S1568494615006468 )

If not, compare boxplots of the best individuals (that is, 10 folds -> 10 best individuals) per configuration, to see at a glance the variability of the results of each configuration. We have done this in a lot of papers.

What do you think?
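A rough sketch of what that figure could look like, in Python/matplotlib rather than R, with invented fitness values; the configuration names are only placeholders for the actual Michigan/Pittsburgh × fitness combinations:

```python
# Sketch of the suggested figure: one box per configuration, each built
# from the 10 best individuals (one per fold). All values are invented.
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

random.seed(1)
configs = ["Michigan-Acc", "Michigan-Conv", "Pittsburgh-Acc", "Pittsburgh-Conv"]
best_per_fold = {c: [random.uniform(0.7, 0.95) for _ in range(10)] for c in configs}

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot([best_per_fold[c] for c in configs])
ax.set_xticks(range(1, len(configs) + 1))
ax.set_xticklabels(configs, rotation=20, ha="right")
ax.set_ylabel("Best fitness per fold")
ax.set_title("Variability of the best individual across folds")
fig.tight_layout()
fig.savefig("best-per-configuration.png")
```

One box per configuration makes the spread of each configuration visible at a glance, which is exactly the comparison being asked for.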

unintendedbear commented 7 years ago

I like the idea, I will include it, separating by both the approach and the fitness :D

fergunet commented 7 years ago

Add the figures to the paper and start explaining the differences between them to justify the best configuration decision, and then we can polish them. For example, maybe we could unify the F_Acc (Michigan vs Pittsburgh) and F_Conv (Michigan vs Pittsburgh), and play with the Y-axis scale, or something like that.

unintendedbear commented 7 years ago

The disadvantage of unifying them is this: (two attached plots, Rplot and Rplot01, not rendered here)

JJ commented 7 years ago

For the first, it's better to use a table. For the second, a table as well, highlighting the values that are statistically significant.

unintendedbear commented 7 years ago

There's a table already. So, no graphics at all, or what do you suggest?

fergunet commented 7 years ago

Also, issue #33 is about the statistical significance test assigned to me (pending).
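For reference, one way #33 could go, sketched in Python assuming scipy is acceptable: a Wilcoxon signed-rank test over the paired per-fold results of two configurations. All numbers below are invented for illustration.

```python
# Hypothetical significance check: Wilcoxon signed-rank test on paired
# per-fold fitness of two configurations. All fitness values are made up.
from scipy.stats import wilcoxon

michigan   = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79]
pittsburgh = [0.86, 0.82, 0.87, 0.86, 0.84, 0.86, 0.85, 0.87, 0.90, 0.89]

stat, p = wilcoxon(michigan, pittsburgh)
significant = p < 0.05  # the cell to highlight in the results table
print(f"W = {stat}, p = {p:.4f}, significant at alpha = 0.05: {significant}")
```

A non-parametric test is the usual choice here, since 10 folds is a small sample and normality of the fitness values is not guaranteed.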

JJ commented 7 years ago

I'm not saying anything about graphics as a class, just about those particular graphics. Graphics must be used to let a person find, in plain sight, patterns that might be hidden in the raw figures. In this case they clearly don't do that, as you have said yourself.

unintendedbear commented 7 years ago

Another suggestion, by @deantares :

(attached plot Rplot02, not rendered here)

JJ commented 7 years ago
  1. Graphics are used to convey information that cannot usually be grasped from simple numbers, most usually a comparison between two different techniques.
  2. Illustrations must prove a point. They must be related to the objective of the paper or thesis. They are there to say "You see? It's just the way I said".
  3. They must follow some very strict rules, exemplified by Tufte.

If you think 1. and 2. are fulfilled, then find out about 3.