facebookresearch / mvfst-rl

An asynchronous RL platform for congestion control in QUIC transport protocol. https://arxiv.org/abs/1910.04054.

Some questions about this method #27

Closed weiyuxingchen closed 3 years ago

weiyuxingchen commented 4 years ago

Hello, first of all, thank you very much for sharing this project. I would like to describe some questions we ran into while trying to reproduce the results:

  1. When we train on a single trace, we find it hard to make the model converge, and we cannot reproduce the results from the paper. After training, the policy collapses onto a single action, e.g. always ×2 or ÷2 instead of +10 or -10, or vice versa (see the first sketch after this list);
  2. While reading the code, we noticed that the inputs are clipped to [-1, 1] before the LSTM. In an environment with 108 Mbps of bandwidth, the reward will usually fall outside this range, so we wonder whether this hard truncation causes problems (see the second sketch after this list). Also, Figure 3 of your paper shows rewards going beyond -25000, which we do not understand;
  3. We have tried many modifications: changes to the inputs, to the network architecture, and to the reward in several variants, but training still fails to converge most of the time, let alone when training on multiple traces together;
  4. Congestion control should not be a particularly complex task, yet despite many attempts with your method we could not get it to converge well, even in a single environment. Is reinforcement learning simply this brittle (this is also our first contact with RL), or are there limitations of your method that we are not aware of?
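To make point 1 concrete, here is a minimal sketch of the kind of discrete cwnd action set we mean; the action indices and the `min_cwnd` floor are illustrative assumptions, not constants taken from the mvfst-rl code base:

```python
# Hypothetical discrete action set acting on the congestion window (cwnd).
ACTIONS = {
    0: lambda cwnd: cwnd,       # no-op
    1: lambda cwnd: cwnd / 2,   # multiplicative decrease
    2: lambda cwnd: cwnd - 10,  # additive decrease
    3: lambda cwnd: cwnd + 10,  # additive increase
    4: lambda cwnd: cwnd * 2,   # multiplicative increase
}

def apply_action(cwnd: float, action: int, min_cwnd: float = 10.0) -> float:
    """Apply a discrete action to cwnd, with a lower bound."""
    return max(min_cwnd, ACTIONS[action](cwnd))

# A collapsed policy as described above would, e.g., always pick action 4,
# doubling cwnd at every step regardless of the network state.
```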
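And here is what we mean by the truncation issue in point 2: a minimal sketch assuming observations such as throughput reach the LSTM in raw units (the variable names and the scale constant are our assumptions, not mvfst-rl's actual preprocessing):

```python
import torch

# On a ~108 Mbps link, raw throughput values far exceed 1, so clamping to
# [-1, 1] saturates them: the network can no longer tell 2 Mbps from 108 Mbps.
raw_obs = torch.tensor([0.5, 2.0, 54.0, 108.0])  # e.g., throughput in Mbps
clipped = torch.clamp(raw_obs, -1.0, 1.0)        # -> [0.5, 1.0, 1.0, 1.0]

# Normalizing by an assumed scale before clamping keeps typical values
# inside the clamp range and preserves their ordering:
scale = 108.0  # assumed link capacity, not a repo constant
normalized = torch.clamp(raw_obs / scale, -1.0, 1.0)
print(clipped, normalized)
```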

Looking forward to your reply.

odelalleau commented 4 years ago

Hi @weiyuxingchen,

Regarding your first question about "the effect of the paper": note that the results in Fig. 3 were actually obtained by training over the first 6 jobs in experiments.yml, not on a single trace. That said, you may notice when re-running such an experiment that the model does not work well on all 6 training jobs; this is a known issue I'm currently looking into.

Regarding your other points, I'll get back to you later as there are several things I need to double check first. This may take a little while due to time constraints on my end.

odelalleau commented 4 years ago

Hi @weiyuxingchen , I just wanted to let you know that I am still looking into it. I actually found some potential issues in the current implementation, and I'm working on a fix. I'll share more once I am confident that things are working as intended.

odelalleau commented 3 years ago

Hi @weiyuxingchen, I apologize that it took so long to get back to you on this (!) It took me a while to identify the problems, fix them, and get the code into good shape for release. FYI, the two main issues were related to the bandwidth and delay computations (and since the reward is based on these, they were affecting training). If you're still curious about giving it a try, I suggest re-installing everything from scratch. I'll close the issue for now, but feel free to open a new one if you run into new problems. I should now be able to address them more swiftly :)
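As a rough illustration of why bugs in the bandwidth and delay computations hurt training: the reward is derived from those two measurements, so an error there skews the reward everywhere. The sketch below shows a reward of that general shape only; the actual formula and units live in the mvfst-rl code, and the log scaling and `delay_weight` here are assumptions, not repo values.

```python
import math

def reward(bandwidth_mbps: float, delay_ms: float, delay_weight: float = 0.2) -> float:
    """Toy throughput-vs-delay reward: higher bandwidth up, higher delay down.

    The log scaling (an assumption) keeps high-bandwidth traces from
    dominating the learning signal. If bandwidth or delay are themselves
    computed incorrectly, the reward is wrong everywhere and training suffers.
    """
    return (math.log(max(bandwidth_mbps, 1e-6))
            - delay_weight * math.log(max(delay_ms, 1e-6)))
```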