Experimental design for this paper

In previous papers we basically focused on the world parameters. We found that:

- a single profile is enough;
- 256 virtual days;
- high population density.

We didn't change:

- the GA parameters, except the population (which made little difference in many scenarios);
- the mutation rate, elite rate and all the other EA rates;
- the initial population size;
- the fitness.

I think that in this paper, besides systematizing the sources of conflict, we should concentrate on improving the evolutionary algorithm and studying its effects. What do you think?

A bit out of scope: should we use one of the automatic adaptation schemes available for EAs? Alberto Tonda told me about a method they use in all their experiments to set the crossover and mutation probabilities (I don't remember its name, I'll ask him). He says that they never have problems with reviewers saying "why those parameters???"

Comment on the previous results obtained with MADE (just before the paragraph that explains the objectives of this paper).

In my opinion, this paper should focus on one option: testing EA parameters or testing fitness functions. Changing/testing everything at the same time will lead to a work that is quite difficult to interpret. I mean, it would be very difficult to conclude whether the results improved due to a new fitness approach rather than to a new mutation operator or rate.

In the first case, I think two good parameters to change in order to increase this promotion of conflicts could be the crossover and mutation operators, which might include some conditions that lead the population to include 'nemesis' agents of those already in it.

My point is to apply some simple test, as @fergunet proposed, to set the EA parameters, and then study different fitness approaches. Perhaps these could be based on promoting conflicts.

Another point to take into account is the feedback provided by the reviewers of Evo*:

REVIEWER 1 said:

I question the relevance of the contributions made herein. What did we really learn that can be applied in another context? What's the key contribution?

=> We should clearly describe the paper's selling point from the very beginning and reinforce it throughout the text. If it proposes a methodology, how the results could be adapted to other kinds of problems or methods should be analysed.

REVIEWER 2 said:

In the experimental setup, you explain that more days (longer runs, I presume) would be harder to evaluate. But it is an interesting question if that would be worth it, did you check that?

=> Let's consider at least a light analysis of this

the parameter study is interesting and provides some nice insights, but it could have been explained much better

REVIEWER 3 said:

the proposed scenario is simple with one type of entities and adding more entities might cause a different effect for some of the studied parameters.

=> Is this true? If so, this must be justified in the experimental setup and also discussed in the results of the present work.

The results included in the paper show only the effect of each parameter independently without taking into consideration the interaction between the parameters. Extending the study to evaluate the correlation between the parameters could also be useful to understand the relation between them and their effect on the fitness value and would be a good added value to the paper.

=> I agree that a study of the influence of the parameters would probably improve the paper's quality. At least an informal one would be welcome.
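Reviewer 3 asks about the interaction between parameters. As a hedged illustration only (not a method the thread commits to), a 2×2 full-factorial design lets one estimate how much the effect of one parameter depends on the level of another, from the mean fitness observed at each combination. The parameter names, levels and numbers below are hypothetical placeholders.

```python
from itertools import product

# Hypothetical low/high levels for two EA parameters (placeholders, not the paper's values).
LEVELS = {
    "mutation_rate": (0.05, 0.2),
    "population_size": (50, 200),
}


def full_factorial(levels):
    """Return every combination of parameter levels as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def interaction_effect(y_ll, y_lh, y_hl, y_hh):
    """Two-factor interaction in a 2x2 design.

    y_xy is the mean response with factor A at level x and factor B at level y
    (l = low, h = high). The interaction is half the difference between the
    effect of A when B is high and the effect of A when B is low.
    """
    effect_a_at_high_b = y_hh - y_lh
    effect_a_at_low_b = y_hl - y_ll
    return (effect_a_at_high_b - effect_a_at_low_b) / 2
```

For example, with purely illustrative means `interaction_effect(10.0, 12.0, 14.0, 20.0)` gives 2.0, whereas a value near 0 would suggest the two parameters act independently. `full_factorial(LEVELS)` lists the 4 configurations to run.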

Thanks Antonio. You have good points here.

This is the question I tried to raise in a previous issue, but JJ said that the fitness study was the topic for this paper, so the issue was closed.

I would bet on the EA parameters, because I think the fitness function should be the goal of a bigger study once the EA parameters are fixed.

Cheers, Maribel

On 26/01/15 at 01:44, Antonio Mora wrote:

> In my opinion, this paper should focus on one option: testing EA parameters or testing fitness functions. [...]

But reviewers would still complain about the fitness design, so at least we will have to explain it better.

@mgarenas If we change the fitness in the future, the EA parameters obtained now will no longer be valid for the new fitness. EAs 101: the EA parameters depend on the problem to solve; if we change the problem, we have to change the EA parameters ;)

So I still think that "explain the same fitness + more archetypes (it is just a new problem instance) + different EA parameters" is a weak topic for GECCO. It is basically "The Same Thing We Did 2: Electric Boogaloo", so I would really like a big difference from our previous papers: focusing on the conflicts, different fitness functions, and a common metric to "measure" the quality of the fitness functions (not to be used as fitness). For instance, the example I gave in issue #7:

- Metric: interest of a story (maybe the one explained in issue #8)
- Fitness 1: the more archetypes, the better (a number between 0 and 10000)
- Fitness 2: the more different archetypes, the better (a number between 0 and numberOfArchetypes).

Which fitness generates more interesting stories according to the metric? One story with 100 heroes (F1), or one story with one hero and one villain (F2)? (Different fitness functions, same metric.)
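A minimal sketch of the two fitness functions in this example, assuming (for illustration only) that a story is represented as the list of archetype appearances detected in the final world. It shows why the two fitness functions rank the same pair of stories in opposite order, which is exactly what a shared metric would have to arbitrate.

```python
def fitness_1(archetypes):
    """Fitness 1: the more archetype appearances, the better."""
    return len(archetypes)


def fitness_2(archetypes):
    """Fitness 2: the more *different* archetypes, the better."""
    return len(set(archetypes))


# The two stories from the example above.
story_a = ["hero"] * 100          # 100 heroes
story_b = ["hero", "villain"]     # one hero and one villain
```

Here `fitness_1` prefers `story_a` (100 vs 2) while `fitness_2` prefers `story_b` (2 vs 1); the "interest of a story" metric from issue #8 is not implemented here.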

EDIT: Another idea: testing different problem-specific operators (that is, testing new genetic operators, not parameter values). Since there are values in the genome related to a given aspect (food, fury, breeding...), I would consider combining two types of operators: basic operators (the classic ones that mutate and exchange values of the vector) and AspectSpecific operators (which exchange whole sets of values of the vector: food, fury, breeding...). This would be a much more novel experiment than testing parameter values, as it may produce more interesting results.
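The AspectSpecific idea can be sketched as a block-wise crossover. The genome layout in `ASPECTS` (which positions belong to food, fury and breeding) is a hypothetical assumption; the real positions would come from the MADE genome definition.

```python
import random

# Hypothetical layout: which genome positions belong to each aspect.
ASPECTS = {
    "food": range(0, 3),
    "fury": range(3, 5),
    "breeding": range(5, 8),
}


def aspect_crossover(parent_a, parent_b, rng=random):
    """AspectSpecific crossover: children exchange whole aspect blocks.

    For every aspect, a coin flip decides whether the whole block is swapped
    between the two children, so genes of the same aspect always travel together.
    """
    child_a, child_b = list(parent_a), list(parent_b)
    for positions in ASPECTS.values():
        if rng.random() < 0.5:
            for i in positions:
                child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```

A basic operator (e.g. uniform crossover) would flip the coin per gene instead of per block; comparing both is the experiment proposed above.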

Getting down to business. First of all, sorry for writing in Spanish, but I'm here at McDonald's with @raiben and we're really tired after debating all afternoon (and on top of that I got a fine on the way here) xD

We have decided to focus the paper as follows (note: these are points that don't necessarily match the paper's sections, but this is the paper's outline):

We said in the abstract that conflicts are important. Therefore, we will propose a methodology to formally define conflicts and the circumstances under which a virtual world can offer them (we hadn't done that in any paper). We propose a language based on logical predicates. @raiben will take care of defining this section with the ideas he has in mind.
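As a rough illustration of what a predicate-based conflict definition could look like (the actual language is still to be defined by @raiben), conflicts can be modelled as conjunctions of predicates over pairs of agents. The `Agent` fields and the predicates `different_family` and `same_target` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Agent:
    name: str
    family: str
    target: str  # hypothetical: the resource or agent this agent is after


Predicate = Callable[[Agent, Agent], bool]


def different_family(a: Agent, b: Agent) -> bool:
    return a.family != b.family


def same_target(a: Agent, b: Agent) -> bool:
    return a.target == b.target


def conflict_rule(*predicates: Predicate) -> Predicate:
    """A conflict holds for a pair of agents when all predicates hold (logical AND)."""
    return lambda a, b: all(p(a, b) for p in predicates)


def find_conflicts(agents: List[Agent], rule: Predicate) -> List[Tuple[str, str]]:
    """Names of every unordered pair of agents for which the rule holds."""
    return [(a.name, b.name) for a, b in combinations(agents, 2) if rule(a, b)]


# A hypothetical "feud": agents of different families competing for the same thing.
feud = conflict_rule(different_family, same_target)
agents = [Agent("pepe", "A", "food"), Agent("paco", "B", "food"), Agent("luis", "A", "water")]
```

With these agents, `find_conflicts(agents, feud)` yields only the pepe/paco pair: pepe and luis share a family, and paco and luis want different things.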
Once this is done, 3 different types of fitness functions will be explained:
- Simply adding up the archetypes in the world. Example: there are 2 different archetypes.
- One based on the distribution or appearance of different archetypes (similar to the one used before): "There is 1 hero and 1 villain" is better than "There are 2 heroes".
- One that rewards spreading the archetypes over the largest number of agents: "Pepe the Hero and Paco the Villain" is better than "Pepe the Hero and Villain, and Paco nothing".
Of these, which is the best fitness to use? To answer that question we will define a metric to measure the quality of the fitness function by measuring the world at the end. The proposed metric is precisely the number of conflicts per agent.
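A minimal sketch of the three fitness functions and the metric, assuming a world is represented as a mapping from agent name to the set of archetypes it plays. Reading the first fitness as "total archetype appearances" and the third as "number of agents playing at least one archetype" are interpretations, not definitions agreed in the thread.

```python
from typing import Dict, Set

World = Dict[str, Set[str]]  # hypothetical: agent name -> archetypes it plays


def f1_total_archetypes(world: World) -> int:
    """Sum of archetype appearances over all agents."""
    return sum(len(roles) for roles in world.values())


def f2_distinct_archetypes(world: World) -> int:
    """Number of different archetypes present in the world."""
    return len(set().union(*world.values()))


def f3_agents_with_archetype(world: World) -> int:
    """Number of agents that play at least one archetype."""
    return sum(1 for roles in world.values() if roles)


def conflicts_per_agent(n_conflicts: int, n_agents: int) -> float:
    """Proposed quality metric: conflicts divided by agents in the final world."""
    return n_conflicts / n_agents if n_agents else 0.0


# The examples from the list above.
two_heroes = {"pepe": {"hero"}, "paco": {"hero"}}
hero_and_villain = {"pepe": {"hero"}, "paco": {"villain"}}
pepe_does_both = {"pepe": {"hero", "villain"}, "paco": set()}
```

All three worlds score 2 under `f1_total_archetypes`; `f2_distinct_archetypes` separates `two_heroes` (1) from `hero_and_villain` (2), and `f3_agents_with_archetype` separates `hero_and_villain` (2) from `pepe_does_both` (1).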
Experimental setup:
- Archetypes: using the methodology described in point 1. The 12 tropes described on tvtropes.com (a reference) for the play Romeo and Juliet, specifically those that can be modelled in terms of conflicts.
- Genetic algorithm: we will use BLX-alpha crossover and sigma-mutation instead of the TPX and random mutation of the previous paper, because we are using real-valued encoding, for which they are more suitable.
- World parameters: the ones we found in the Evostar paper, full stop.
- EA parameters: the same as in the Evostar paper: there would be no time for a study of different parameters, and we also think it is of little interest for GECCO.
- Since we are using 12 archetypes, to be on the safe side we will again use different numbers of profiles: 1, 3 and 6.
- 30 runs per configuration (3 fitness functions) times 3 numbers of profiles: 270 runs in total. Each generation prints the total fitness, the number of appearances of each archetype (of the 12), the number of conflicts and the number of agents. That gives us nice data to evaluate in the next point.
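For reference, a sketch of the two real-coded operators named in the setup. It assumes "sigma-mutation" means adding Gaussian noise with standard deviation sigma, and that genes are bounded in [0, 1]; the default values of `alpha`, `sigma` and `rate` are placeholders, not the Evostar settings.

```python
import random
from typing import List, Sequence


def blx_alpha(p1: Sequence[float], p2: Sequence[float], alpha: float = 0.5,
              rng=random) -> List[float]:
    """BLX-alpha crossover: each child gene is drawn uniformly from the parents'
    interval [min, max] extended by alpha * (max - min) on both sides."""
    child = []
    for x, y in zip(p1, p2):
        lo, hi = min(x, y), max(x, y)
        d = hi - lo
        child.append(rng.uniform(lo - alpha * d, hi + alpha * d))
    return child


def gaussian_mutation(genome: Sequence[float], sigma: float = 0.1, rate: float = 0.1,
                      low: float = 0.0, high: float = 1.0, rng=random) -> List[float]:
    """Add N(0, sigma) noise to each gene with probability `rate`.

    Every gene (mutated or not) is then clipped to [low, high], which also
    repairs BLX-alpha children that fell outside the assumed bounds.
    """
    mutated = []
    for gene in genome:
        if rng.random() < rate:
            gene += rng.gauss(0.0, sigma)
        mutated.append(min(high, max(low, gene)))
    return mutated
```

Unlike TPX, which only recombines existing values, BLX-alpha can create new values between and around the parents' genes, which is why it suits a real-valued encoding.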
Analysis of the results using the metric: significant differences, how the number of conflicts evolves under each fitness function (plotted on the same graph), the appearances of each archetype and how many conflicts each one generates, etc. From there we can draw interesting conclusions about which archetypes produce more conflicts, which ones depend on others, etc.
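A sketch of how the runs could be enumerated and how the per-generation logs could be turned into one conflicts-per-agent curve per fitness function, ready to be plotted on the same graph. The fitness labels and the record keys are hypothetical; significance testing is left out.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

FITNESS_FUNCTIONS = ("F1", "F2", "F3")  # placeholder labels for the three fitness functions
PROFILE_COUNTS = (1, 3, 6)
RUNS_PER_CONFIG = 30


def experiment_grid():
    """Every (fitness, profiles, run) triple: 3 x 3 x 30 = 270 runs."""
    return list(product(FITNESS_FUNCTIONS, PROFILE_COUNTS, range(RUNS_PER_CONFIG)))


def conflict_curves(records):
    """Mean conflicts per agent at each generation, one curve per fitness function.

    `records` is an iterable of dicts with keys 'fitness', 'generation',
    'conflicts' and 'agents' (an assumed shape for the per-generation log).
    Records with zero agents are skipped.
    """
    buckets = defaultdict(list)
    for r in records:
        if r["agents"]:
            buckets[(r["fitness"], r["generation"])].append(r["conflicts"] / r["agents"])
    curves = defaultdict(list)
    for (fitness, generation), values in sorted(buckets.items()):
        curves[fitness].append((generation, mean(values)))
    return dict(curves)
```

Each curve is a list of `(generation, mean value)` pairs sorted by generation, so the three curves can be drawn directly on shared axes.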

STRONG POINTS OF THE ARTICLE:

Goal of the article: to study the nature of conflict-based archetypes and how to promote conflicts.
To do so, we design a methodology to define conflicts (the interesting part of stories) together with the archetypes.
We also propose different archetype-based fitness functions.
We define a conflict-based metric to measure the quality of those fitness functions.
And we apply all this in a more complex context than the previous paper: we use 12 archetypes, not 3.

So that's it; if you agree, we'll start writing/programming and then running :)

Signed, @fergunet and @raiben

Gosh, written out this clearly, it even looks easy :-)

I'll comment inline below. On 27/01/15 at 20:57, Pablo García Sánchez wrote:

> [...] and on top of that I got a fine on the way here xD

Oh dear!

> Simply adding up the archetypes in the world. Example: there are 2 different archetypes.

The only drawback I see is that the same thing could happen as in the previous paper: once the experiments are finished, many runs could end with 0 archetypes, which would give us many cases that are hard to tell apart. Although I'm sure @raiben knows well how to promote the archetypes so that this doesn't happen.

> One based on the distribution or appearance of different archetypes (similar to the one used before): "There is 1 hero and 1 villain" is better than "There are 2 heroes"

I have a question here. Would a hero appearing be just as good as a villain appearing? Or is one of the two better for a virtual world? Obviously the simplest case would be to consider them equal, but I'm just raising the question.
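One hedged way to keep both options open is a per-archetype weight: with no weights every distinct archetype counts 1.0 (the "consider them equal" case), and a hypothetical `weights` dictionary lets, for instance, a villain count more than a hero.

```python
from typing import Dict, Iterable, Optional


def weighted_distinct_archetypes(archetypes: Iterable[str],
                                 weights: Optional[Dict[str, float]] = None) -> float:
    """Sum the weight of each *different* archetype present.

    Archetypes missing from `weights` (or all of them, if no weights are given)
    count 1.0, which reproduces the unweighted "distinct archetypes" fitness.
    """
    weights = weights or {}
    return sum(weights.get(a, 1.0) for a in set(archetypes))
```

For example, `weighted_distinct_archetypes(["hero", "villain"], {"villain": 1.5})` gives 2.5 instead of 2.0.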

> Experimental setup: [...]

ok

> Analysis of the results using the metric [...]

ok


Sorry about the fine. It will stay here for posterity, though. :D

Writing in Spanish is fine with me, and so is almost everything you propose, but I don't see the fitness functions taking conflicts into account. You talk about different archetypes, but not 'rival' (or nemesis) ones, which I think would be the interesting ones, right?
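A minimal sketch of a fitness term that rewards rivals rather than mere diversity: it counts how many rival (nemesis) pairs are simultaneously present in the world. The pairs in `RIVALS` are hypothetical placeholders; the real list would come from the conflict definitions.

```python
from typing import FrozenSet, Iterable, Set

# Hypothetical rival (nemesis) pairs.
RIVALS: Set[FrozenSet[str]] = {
    frozenset({"hero", "villain"}),
    frozenset({"lover", "rival_lover"}),
}


def rival_pairs_present(archetypes: Iterable[str]) -> int:
    """Number of rival pairs whose two archetypes both appear in the world."""
    present = set(archetypes)
    return sum(1 for pair in RIVALS if pair <= present)
```

Under this term, "a hero and a lover" scores 0 even though it contains two different archetypes, while "a hero and a villain" scores 1.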

While we're at it, I would also add to the operators some factor that encourages the appearance of rivals, as I mentioned before, to give the conflict part a bit more substance. With the fitness alone, their appearance will be encouraged less.

The thing is, I guess that complicates the experimentation and so on, although we could simply design one crossover operator and one mutation operator and run everything with them. ;)

But it's your call on this.

See you.

geneura-papers / 2015-MADE-MONOMYTH
