Kyzarok / MScProject_AURORA_with_RNN

Extending Autonomous Skill Discovery with Recurrent Neural Networks

AURORA incremental #2

bossdm opened this issue 3 years ago

bossdm commented 3 years ago

Hi,

When running the "incremental" version of AURORA, I ran into an issue with line 633 of control.py:

comparison_gt = np.load("GROUND_TRUTH.npy")

The file cannot be loaded because it is not in the repository.

Commenting this line out does not seem to have any effect though. Can it be removed?

I have two other questions: 1) Line 646 looks a bit strange:

genotype = [random.uniform(FIT_MIN, FIT_MAX), random.uniform(FIT_MIN, FIT_MAX)]

The name "FIT" suggests this relates to fitness. What I think it actually does is generate random genotypes to cover the full genotypic space, so that the resulting individuals later provide the data to train the autoencoder.

2) Are the data being trained on trajectories or images? The code seems to suggest trajectories, but on the other hand there is a mention of "self.scaled_traj_image", and the use of convolutional networks would be especially useful for images.

Kyzarok commented 3 years ago

GROUND_TRUTH.npy is the approximated ideal spread in the latent space. I used it as the base metric for calculating KLC, i.e. comparing how much of the actual variety of behaviours each AURORA version was able to discover against the manually coded method. You can generate this numpy array by passing GT as the version argument on the command line.
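
For reference, here is a minimal sketch of that kind of comparison, assuming KLC is computed as a KL divergence between histograms of the discovered latent points and the ground-truth spread (the actual metric in control.py may differ in detail):

import numpy as np

def klc(discovered, ground_truth, bins=30, eps=1e-10):
    # KL divergence between 2D histograms of two latent point sets;
    # a lower value means the discovered spread is closer to the
    # ground-truth spread. Both inputs are (N, 2) coordinate arrays.
    pts = np.vstack([discovered, ground_truth])
    edges = [np.linspace(pts[:, d].min(), pts[:, d].max(), bins + 1) for d in range(2)]
    p, _, _ = np.histogram2d(ground_truth[:, 0], ground_truth[:, 1], bins=edges)
    q, _, _ = np.histogram2d(discovered[:, 0], discovered[:, 1], bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

comparison_gt = np.load("GROUND_TRUTH.npy")  # produced by the GT run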

On line 646, your understanding is correct. Statistically speaking, the hope is that a few random individuals cover enough of the genotypic space to train the autoencoder at the start, before the QD iterations.
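
As a sketch of that bootstrap phase (FIT_MIN, FIT_MAX and N_BOOTSTRAP here are illustrative stand-ins for whatever control.py actually defines):

import random

FIT_MIN, FIT_MAX = -1.0, 1.0  # placeholder genotype bounds
N_BOOTSTRAP = 200             # hypothetical initial population size

def bootstrap_population():
    # Sample random 2D genotypes to seed the autoencoder's first
    # training pass, before any QD iterations run.
    return [[random.uniform(FIT_MIN, FIT_MAX), random.uniform(FIT_MIN, FIT_MAX)]
            for _ in range(N_BOOTSTRAP)]

Each of these genotypes would then be rolled out in the environment, and the recorded trajectories become the initial training set for the autoencoder.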

The term "image" was used almost in the inverse of its usual meaning. The recorded data is a 2D matrix with a coordinate pair for each time step. This data was flattened before being passed into the autoencoder, as in the original paper. CNNs may not prove useful in this task, since the full "image" is only 2 x 50, while most advanced CNN architectures require a minimum input of 32 x 32 x 3. I do, however, believe it is worth exploring CNNs in more complicated robotic tasks with sensory information that matches or exceeds that 32 x 32 x 3 threshold.
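
Concretely, the data handling looks something like this (shapes follow the 2 x 50 description above; the variable names are illustrative, not the repo's):

import numpy as np

T = 50                   # time steps per episode
traj = np.zeros((2, T))  # x/y coordinate recorded at each time step

flat = traj.reshape(-1)       # 100-dim vector for the dense autoencoder
img = traj.reshape(2, T, 1)   # tiny single-channel "image" for the CNN variant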

bossdm commented 3 years ago

Ok, thanks for the quick reply. It took some time to gather the ground truth, but now it loads correctly.

As I found many magic numbers in the network definition, I defined all of them with appropriate names. I also saw that the custom patch size for max pooling was not working, so I fixed that as well. Please have a look to check that these variables are all correct. I tried running it without problems, so it seems ok at first glance.

Kyzarok commented 3 years ago

Yes, they seem fine. I thought I had done all of the hardcoded-value housekeeping, but clearly I hadn't fully completed it. Thank you.

bossdm commented 3 years ago

Great. Thanks for checking.

bossdm commented 3 years ago

Just wondering: if we have

patch_size_convol=(2, 6)

and the shape of the data is (n_batch, 2, 50, 1), then the pooling patch sizes [1, 1, 2, 1] and [1, 2, 2, 1] are actually equivalent, since there would only be one patch along the first spatial dimension?

If I understand correctly, a (2, 50, 1)-shaped data point is translated into a 1 x 45 map of convolved features, each representing a dot product of the data values within a [2, 6] patch. So the max pooling can actually only use 1x2 patches?
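
To sanity-check the shapes (a sketch assuming TensorFlow-style NHWC tensors, since the [1, 2, 2, 1] pooling shapes suggest tf.nn; the repo's actual layer code may differ):

import numpy as np
import tensorflow as tf

n_batch, out_channels = 4, 2                   # out_channels is user-chosen
x = tf.random.normal([n_batch, 2, 50, 1])      # (batch, 2, 50, 1) trajectories
w = tf.random.normal([2, 6, 1, out_channels])  # the (2, 6) convolution patch

conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID")
print(conv.shape)  # (4, 1, 45, 2): (2-2+1) x (50-6+1) = 1 x 45

pool_a = tf.nn.max_pool2d(conv, ksize=[1, 1, 2, 1], strides=[1, 1, 2, 1], padding="SAME")
pool_b = tf.nn.max_pool2d(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
print(np.allclose(pool_a.numpy(), pool_b.numpy()))  # True: only one row remains,
                                                    # so a 2-high window adds nothing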

bossdm commented 3 years ago

I think this may be due to the number of "output channels", which is chosen by the user; in this case that number is also 2, so it expands the volume by a factor of 2, according to my understanding. I will define this variable as well.

Kyzarok commented 3 years ago

Sorry, I had my last exam yesterday and didn't have a chance to respond.

1) Realistically, yes. At this level of data complexity the result is a 1x2 patch.

2) Is this in reference to the latent space layer?

bossdm commented 3 years ago

To illustrate what I mean, you could look at the commit I just pushed. The number of output channels of the convolutional layer is chosen independently of the rest, I believe, and this results in a volume for which the 2x2 max pooling makes sense.