OLA2022-Group-12 / OLA2022-Project

Project for the Online Learning Applications class at Politecnico di Milano a.a. 2021-2022

Fixes to be made #13

Closed raul-singh closed 2 years ago

raul-singh commented 2 years ago

I don't know if this is the right way to do it, but I had a couple of issues with the code and I thought of opening an issue. I have encountered some problems in our code.

  1. I tried on my PC to manually remove all randomness from the environment and the simulation, only to discover that the clairvoyant_learner always performs worse than the stupid_learner. Even with all the usual randomness it often performs worse. @davide-rigamonti-polimi pointed out that with a very low daily budget the clairvoyant learner performs better than the stupid one, but I don't think that is acceptable; we have to make sure the clairvoyant performs as intended at least most of the time. @barskern do you think you can give it a look? If you want we can discuss a fix together.
  2. @dorianboille , I looked into the simulation and I saw that the number of daily customers is a random number chosen between 0 and 100 (and these two numbers are hardcoded). I think the best way would be to pass these two values as arguments when creating a simulation, also because I think 0-100 brings too much randomness into the system. What do you think?
barskern commented 2 years ago

This is a great way to do it! Though to be pedantic, it would be best to create two separate issues so that these two different problems could be discussed without interleaved comments. The title is also somewhat general. However, I want to emphasize that I still love using an issue for this; it gives a better overview than plain messages. Now on to the issue at hand!

  1. I believe Riga and I saw the opposite, so I would love to see the counterexample. When @davide-rigamonti-polimi sent me the graphs, the mean of the absolute rewards for the clairvoyant was higher than for the stupid one, no? Or perhaps I misunderstood. Anyway, this should clearly be fixed. There was a lot of noise in the rewards due to the high noise in the customer count, hence we also talked about a reward_per_customer metric. Has this been seen to be worse as well?

  2. Yes, I also believe those should be parameterized. I also think it would be more "realistic" as a normal distribution with a given mean and standard deviation. Uniform randomness rarely occurs in real life (I think).
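As a sketch of what that could look like (the function name and the mean/std defaults here are hypothetical, not from our codebase), sampling the daily customer count from a normal distribution instead of a uniform one might be:

```python
import numpy as np

def sample_daily_customers(rng, mean=50.0, std=15.0):
    """Draw a non-negative integer customer count from a normal
    distribution; mean and std would be simulation parameters."""
    return max(0, int(round(rng.normal(mean, std))))

rng = np.random.default_rng(0)
counts = [sample_daily_customers(rng) for _ in range(5)]
```

Clamping at zero avoids negative counts when the std is large relative to the mean.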

raul-singh commented 2 years ago

Ok, I tried to run Riga's notebook with the same values he set. The only thing I changed is the budget, which I increased to 300 instead of 100. This is the result with the clairvoyant: (image attached)

This instead is the result with the stupid one (using the same exact data, apart from the budget, in the same exact notebook with the same rng): (image attached)

As you can clearly see, there is a small difference, but here the stupid_learner is performing slightly better, which shouldn't make any sense, because the clairvoyant should always outperform the stupid one. Even if the learner performs as intended with a low budget, I don't think we can accept something that breaks when we change a value within a totally legitimate range.

daviderigamonti commented 2 years ago

Ok, I tried to run Riga's notebook, with the same values he set.

The notebook present on the remote branch is not updated as of now; I'm still experimenting with which values produce better/worse results on my local copy. Still, I agree that the results are quite strange, and the cause could be the ClairvoyantLearner algorithm, the parameters we pass to the environment, the rewards assigned by the simulation, or a combination of the above.

I think a good starting point would be finding a set of parameters for the environment that we know for sure "works", in order to remove a point of uncertainty and freely debug the other types of learners.

raul-singh commented 2 years ago

I don't know if it really matters: as I said in the issue, I used your notebook values just for reproducibility, but I also tried many different values on my own, and many times the clairvoyant performed worse.

I don't think having to keep a "good" set of parameters, outside of which the learner breaks, is something we can accept. And this applies to everything: I think our code should be robust and flexible. Fortunately, the clairvoyant learner is fairly independent from everything else, and not even strictly required, so it can be tweaked in parallel while we move on with the project.

barskern commented 2 years ago

Okay, so a few points are important here. The results you are seeing are exactly what I am talking about in the branch where I was trying to make a deterministic environment. The very important thing to know is that even though the rng is the same, that doesn't mean the same choices and probabilities are considered! This is because the arguments to dirichlet are altered, which seems to also alter the choices it makes. In my experience it is not deterministic in the sense that, for the same seed, different arguments don't always compare the same way. One way to test this is to run the following test:

NB! The snippet as first posted did not run 😅 — this is a corrected version (a fuller worked example is in a later comment).

import numpy as np

for i in range(100):
    for j in range(2):
        # Two mirrored ratio vectors: [0.8, 0.2] when j == 0, [0.2, 0.8] when j == 1
        click_ratios = np.array([np.abs(j - 0.8), np.abs(0.2 - j)])

        rng = np.random.default_rng(i)
        fractions = rng.dirichlet(click_ratios)

        # The biggest "click_ratio" should always produce a bigger fraction
        # if dirichlet were "deterministic" on its arguments
        biggest = fractions[np.argmax(click_ratios)]
        smallest = fractions[np.argmin(click_ratios)]

        if biggest < smallest:
            print("dirichlet is not deterministic!!")

If this prints anything, it shows that even though the click ratios favor one side, for the same rng it may happen (due to randomness) that the resulting fractions don't. Hence I believe this might be why we see the results we see, or at least related.

Another important point is that right now the clairvoyant algorithm doesn't use the graph (because the graph influence branch isn't merged yet). Hence it might be that the clairvoyant favors a product that has bad continuations in the graph.

This is typed on my phone while traveling with my parents. I would love to join the discussion more thoroughly through code on Sunday.

raul-singh commented 2 years ago

Ok, I have a few thoughts regarding this. In my journey of analyzing the learners, as I said at the beginning of the issue, I tried to gradually remove the randomness to see if the problem was the randomness itself. It turns out it's not: even with a deterministic environment I see the stupid learner performing better.

You made a correct observation about the rng; however, I ran the notebook many times, also restarting the kernel, and the general result is always the same. I tried swapping the Dirichlet for a softmax and the results are still similar. So I'm sure it's not just a fluke, but something the clairvoyant is not considering.

I think, as you mentioned, that the main problem is the graph. The learner does not take the graph into account, which is something that can lead to highly underestimating reward! We shouldn't just consider the reward of the primary product. Just saying, the alpha-less learner by definition on the assignment takes into account the graph, so I think the clairvoyant should too!

barskern commented 2 years ago

One way to address this is to take randomness into consideration and say that, over a statistically significant number of experiments, the clairvoyant learner should perform better on average. So even though it's the ideal algorithm, it might perform worse in a single experiment due to randomness; imagine it just has the most horrible luck. Even though it is highly unlikely to roll the same number three times in a row with dice, it does happen. I believe it is the same in our case, because the randomness isn't deterministic for a single seed, so we cannot make that assumption.

raul-singh commented 2 years ago

I probably worded myself badly. As I tried to say, I ran several experiments and I pretty much always got the same result.

You made a correct observation about the rng; however, I ran the notebook many times, also restarting the kernel, and the general result is always the same. I tried swapping the Dirichlet for a softmax and the results are still similar. So I'm sure it's not just a fluke, but something the clairvoyant is not considering.

I tried different values, different RNGs, no RNG, and so on. And, over a wide range of values, I get the clairvoyant performing worse. What I think we should do is take the graph into account. This is the first solution I can think of.

Suppose we have a primary product P1 with price 15 and a secondary product P2 with price 10. For simplicity, assume we only have these two products. Now suppose the graph connection between P1 and P2 is 0.5, meaning the probability of buying P2 after buying P1 is 50%.

If we compute, for example, that 100 users will land on the P1 page and buy it, we estimate a reward of 100*15 = 1500. However, this is incomplete. We should add the price of the secondary product times the probability of buying it, to get a more correct estimate on average: estimated_reward = num_users_buying_p1 * (P1.price + graph[P1,P2] * P2.price). Using the values of my example we would have 100 * (15 + 0.5 * 10) = 2000, which is a mathematically correct way to take probabilities into account, and it is how I have done it in previous projects and exams. This looks to me like a more accurate reward estimation, since on average, taking randomness into account, that should be the actual expected value.

I think this would be the best way to proceed and to consider the average expected reward. Let me know what do you think.
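The two-product arithmetic above can be sketched directly (variable names here are mine, not from the codebase; prices and edge weight are taken from the example):

```python
# Graph-aware expected reward for the two-product example.
p1_price = 15
p2_price = 10
graph_p1_p2 = 0.5            # probability of buying P2 after buying P1
num_users_buying_p1 = 100

# Primary-only estimate, then the graph-aware one.
primary_only = num_users_buying_p1 * p1_price
estimated_reward = num_users_buying_p1 * (p1_price + graph_p1_p2 * p2_price)
print(primary_only, estimated_reward)  # 1500 2000.0
```

The gap between the two numbers is exactly the expected secondary-product revenue that a primary-only estimate ignores.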

barskern commented 2 years ago

Ok, I have a few thoughts regarding this. In my journey of analyzing the learners, as I said at the beginning of the issue, I tried to gradually remove the randomness to see if the problem was the randomness itself. It turns out it's not: even with a deterministic environment I see the stupid learner performing better.

I don't really understand how you removed the randomness. Could you show me through code, perhaps?

You made a correct observation about the rng; however, I ran the notebook many times, also restarting the kernel, and the general result is always the same. I tried swapping the Dirichlet for a softmax and the results are still similar. So I'm sure it's not just a fluke, but something the clairvoyant is not considering.

I guess what would be the equivalent of removing randomness is perhaps to just use the click ratios directly? Or do they need to be "soft-maxed"? Do the click ratios not sum to one?

I think, as you mentioned, that the main problem is the graph. The learner does not take the graph into account, which is something that can lead to highly underestimating reward! We shouldn't just consider the reward of the primary product.

Just saying, the alpha-less learner by definition on the assignment takes into account the graph, so I think the clairvoyant should too!

Yes I totally agree on this! It will take the graph into account as soon as the graph influence branch is merged.

barskern commented 2 years ago

I probably worded myself badly. As I tried to say, I ran several experiments and I pretty much always got the same result.

You made a correct observation about the rng; however, I ran the notebook many times, also restarting the kernel, and the general result is always the same. I tried swapping the Dirichlet for a softmax and the results are still similar. So I'm sure it's not just a fluke, but something the clairvoyant is not considering.

I tried different values, different RNGs, no RNG, and so on. And, over a wide range of values, I get the clairvoyant performing worse.

What I think we should do is take the graph into account. This is the first solution I can think of.

Suppose we have a primary product P1 with price 15 and a secondary product P2 with price 10. For simplicity, assume we only have these two products. Now suppose the graph connection between P1 and P2 is 0.5, meaning the probability of buying P2 after buying P1 is 50%.

If we compute, for example, that 100 users will land on the P1 page and buy it, we estimate a reward of 100*15 = 1500. However, this is incomplete. We should add the price of the secondary product times the probability of buying it, to get a more correct estimate on average: estimated_reward = num_users_buying_p1 * (P1.price + graph[P1,P2] * P2.price). Using the values of my example we would have 100 * (15 + 0.5 * 10) = 2000, which is a mathematically correct way to take probabilities into account, and it is how I have done it in previous projects and exams. This looks to me like a more accurate reward estimation, since on average, taking randomness into account, that should be the actual expected value.

I think this would be the best way to proceed and to consider the average expected reward. Let me know what do you think.

I certainly agree with that reward estimation yes, that sounds better. But what we are comparing is actual reward and not the estimate, or perhaps I'm misunderstanding?

raul-singh commented 2 years ago

I guess what would be the equivalent of removing randomness is perhaps to just use the click ratios directly? Or do they need to be "soft-maxed"? Do the click ratios not sum to one?

That makes sense, beware only that click_ratios are not "normalized" values.
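For reference, since the raw ratios don't sum to one, a deterministic stand-in for the Dirichlet draw would just normalize them (a sketch, not our actual code; the example values are made up):

```python
import numpy as np

# Normalize raw click ratios into fractions that sum to 1,
# as a deterministic alternative to sampling from a Dirichlet.
click_ratios = np.array([3.0, 1.0, 1.0])
fractions = click_ratios / click_ratios.sum()
print(fractions)  # [0.6 0.2 0.2]
```

This is also what a Dirichlet draw converges to as its parameters are scaled up while keeping the same ratio.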

I certainly agree with that reward estimation yes, that sounds better. But what we are comparing is actual reward and not the estimate, or perhaps I'm misunderstanding?

I don't think I fully understand the question. We cannot know the actual reward given the randomness in the environment; we can only try to make the most accurate possible estimate. That's what I was trying to talk about.

barskern commented 2 years ago

After thinking more about this issue, I decided to make a notebook which runs some examples to show my understanding of determinism and randomness regarding the seeded random number generators.

https://github.com/OLA2022-Group-12/OLA2022-Project/blob/master/notebooks/DeterministicSeededRandomness.ipynb

EDIT: I suppose my goal of showing this is that even though we seed the environment with a given seed, it won't be deterministic across its arguments, by extension of dirichlet not being deterministic, so there is no way to compare the stupid and clairvoyant learners on a single environment for the time being.

raul-singh commented 2 years ago

I don't know if you are trying to say that there is nothing to fix, because I think there is, and I don't think putting all the blame on the Dirichlet function is the right way to go.

Dirichlet outputs a ratio: the difference between inputs such as [1,1,1] and [100,100,100], which encode the same ratio, is the fluctuation. [1,1,1] fluctuates far more than [100,100,100]; the bigger the inputs, the less the output fluctuates. That means that with a high budget we get really low fluctuations from the Dirichlet (check it if you have doubts), so with [100,100,100] the output will be similar to [0.33,0.33,0.33], behaving almost like a softmax. Indeed, as I said previously, if you take the code of get_day_of_interactions and use a softmax function instead of the Dirichlet, you get the same results I described at the beginning of this issue (if you ask me how I did it, I simply hardcoded it locally). So there is no reason at all to blame the Dirichlet function.

Yes, there is still randomness in the code, but if I run an experiment with 500 days of interactions and the clairvoyant on average performs worse, that's not something we can ignore, let alone claim that we have no evidence and no way of making this comparison. As I previously said, what I think is the problem is the clairvoyant's reward estimation.
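The fluctuation claim is quick to check empirically (my own sketch, not code from the repo):

```python
import numpy as np

# Larger Dirichlet parameters with the same ratio fluctuate less:
# [1, 1, 1] and [100, 100, 100] both have mean [1/3, 1/3, 1/3],
# but the former has a far wider spread around it.
rng = np.random.default_rng(0)
low = rng.dirichlet([1, 1, 1], size=10_000)
high = rng.dirichlet([100, 100, 100], size=10_000)
print(low.std(axis=0).mean() > high.std(axis=0).mean())  # True
```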

Sorry if I seem harsh; that's not my intention. I'm just concerned, because my understanding is that you don't want to fix it. As of right now the problem looks minor, but I believe that if we don't fix anything, soon every learner will outperform the clairvoyant.

raul-singh commented 2 years ago

Yes I totally agree on this! It will take the graph into account as soon as the graph influence branch is merged.

@barskern I just realized there has been a misunderstanding. I was not talking about the social influence graph; I was talking about the graph in the environment, the one with the weights denoting the probability of buying the secondary product.