[WIP] Deterministic environment generation

barskern commented 2 years ago

So, a wasted night, but I figured I would upload this here anyway in case you have some suggestions. First, the motivation:

After thinking a bit, I think the reason it doesn't work, even after "resetting" the random generator seed (so that it should be "identical" for each day of interactions), is that we sample randomness BASED ON randomness. E.g. in the _go_to_page function we sample randomness to decide whether or not to keep sampling randomness. This means the number of calls into the random generator itself depends on earlier draws, so the underlying bit generator advances at a different pace on each generation. If we could get the randomness to be sampled exactly the same number of times no matter how many interactions we generate, I think it would be deterministic, even for proper randomness. I believe a way to do this, e.g. for _go_to_page, is to change it from being recursive to being a loop which always generates the full graph, and then sample whether to keep the bought items or not after they have been generated. I will take a stab at this :)
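To make the idea concrete, here is a minimal sketch of the difference between drawing "only while we continue" and drawing a fixed number of times. It is not the project's code: `visit_variable`, `visit_fixed`, `states_match` and `MAX_DEPTH` are hypothetical names, and the standard-library `random.Random` stands in for whatever generator the environment really uses.

```python
import random

MAX_DEPTH = 5  # hypothetical cap on how many pages one visit can reach


def visit_variable(rng, continue_prob):
    """Draw only while the walk continues, so the number of RNG calls varies."""
    depth = 1
    while depth < MAX_DEPTH and rng.random() < continue_prob:
        depth += 1
    return depth


def visit_fixed(rng, continue_prob):
    """Always draw MAX_DEPTH - 1 numbers first, then decide how many to use."""
    draws = [rng.random() for _ in range(MAX_DEPTH - 1)]
    depth = 1
    for u in draws:
        if u >= continue_prob:
            break
        depth += 1
    return depth


def states_match(visit, seed=0):
    """Same seed, two different parameters: is the RNG left in the same state?"""
    rng_a, rng_b = random.Random(seed), random.Random(seed)
    visit(rng_a, 0.0)  # never continues past the first page
    visit(rng_b, 1.0)  # always continues until MAX_DEPTH
    return rng_a.getstate() == rng_b.getstate()


if __name__ == "__main__":
    print("variable draws keep RNG in sync:", states_match(visit_variable))
    print("fixed draws keep RNG in sync:   ", states_match(visit_fixed))
```

Both functions return the same depth for a given seed and probability. Only `visit_fixed` leaves the generator in the same state regardless of the parameter, which is what everything sampled afterwards would rely on.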

Then, my results: it seems like rng.dirichlet and others are not deterministically better when their inputs change. E.g. I would have thought that if you seed the random number generator with the same seed, and the "only" difference between runs is the parameters, the run with the better parameters would always come out better. But the more I think about it, the "sillier" that sounds. Anyway, maybe there is another way we could test the stochastic parts of our code. Any suggestions? This would be great for comparing learners, though I suppose it's not super relevant either, since the learner will alter the environment while learning from it.

raul-singh commented 2 years ago

I don't want to belittle your work, because you surely did a good job! But the more I think about it, the less convinced I am. We are putting so much time into these things instead of the actual project, as shown by the fact that we are still on step 2/7 after 3 weeks of work or something like that.

As we agreed during our last meeting, the deterministic environment should be just something simple, that does not modify any part of the existing code. If this is not possible, I don't think we should go down this route for something that's not even necessary for the scope of the project.

We spent days and days coding, commenting and reviewing the environment to make it good and rock solid and now rewriting a good portion of it for something we don't really need makes me feel really sad.

Sorry if I sound rude or something like that, this is not my intention. But I think we are losing our minds too much over things that are not important.

barskern commented 2 years ago

Thank you for your honesty! I agree with your sentiment here. We need to get into the actual work, and time has surely been spent on tasks that perhaps diverge from it. I suppose last night I had this idea and really wanted to make it happen, as it would be awesome to show through testing that e.g. UCB would always be better than stupid.

Though I fully agree with leaving this here and focusing on completing the direct tasks of the project.

I want to emphasize that I do not feel belittled at all by your honest opinion, and I think it's valuable for me that you help me stay on the "right" path. Thank you!

OLA2022-Group-12 / OLA2022-Project

[WIP] Deterministic environment generation #11