awarebayes / RecNN

Reinforced Recommendation toolkit built around pytorch 1.7
Apache License 2.0

Questions about DDPG #15

Closed: mdev11 closed this issue 3 years ago

mdev11 commented 4 years ago

Hi! I'm new to RL and currently working on a music recommender system project using DDPG. It's quite similar to your DDPG project, and there are a few things I'm still confused about. If you don't mind, please answer my questions.

  1. How much user history data did you use for your movie recommendation?
  2. Does the epoch matter in RL, especially in DDPG? (Sorry if it sounds stupid, but I watched some tutorials and got really confused. Some tutorials use randomized data for the environment's initial state, so I assume they don't really care about epochs. But almost all RL environments, like those in OpenAI Gym, use one initial state and train for thousands of episodes; for example, in continuous mountain car the episode always starts from the same position.)
  3. If I have an environment with 15 steps per episode, is it fine to use a discount factor of 0.97?

Sorry if this isn't really related to your GitHub :( But I hope you can help me, because I have no expert to consult. Thank you so much.

awarebayes commented 4 years ago
  1. I stack a fixed window of 10 items. You can use an LSTM or whatever.
  2. What is an epoch? As I said, I stack the 10 items rated by the user as the state, and regress one item and its rating as the target action and the reward (see the sketch after this list).
  3. I don't know. In the article I use various metrics like pairwise distances and VAE reconstruction error. Read the "Interpreting the results" section to see what to look at, and update accordingly.
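A minimal sketch of what I mean (illustrative names only, not the library's actual code):

```python
# Illustrative only: build (state, action, reward) transitions from one
# user's rating history with a fixed window of 10 items.
def make_transitions(item_ids, ratings, window=10):
    transitions = []
    for t in range(window, len(item_ids)):
        state = item_ids[t - window:t]   # the last 10 items the user rated
        action = item_ids[t]             # the item regressed as the target action
        reward = ratings[t]              # its rating
        transitions.append((state, action, reward))
    return transitions

# a user with 12 rated items yields 2 transitions
print(make_transitions(list(range(1, 13)), [5, 4, 3, 5, 2, 4, 5, 3, 4, 5, 1, 4]))
```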
awarebayes commented 4 years ago

By the way, I always like to see what people build with my library. Make sure to share your progress when you're done!

mdev11 commented 4 years ago

Hey thank you so much for answering.

The epoch I meant is the usual epoch in deep learning: when you have a dataset of 1k samples and train the model on it 10 times, you've used 10 epochs.

Sorry if that confused you, let me explain my question. My dataset consists of music sessions, where one session consists of 20 songs played sequentially by a user. I'm confused about how many sessions I should use for training. Let's say I use 1k sessions, so there are 1000 * 20 = 20,000 rows of data. Can I use the 20k rows for, say, 200k timesteps, so the model sees each row 10 times? Or should I use the 20k rows for only 20k timesteps? Which is more efficient for DDPG?

(Sorry if this is still confusing..) Thank you!

awarebayes commented 4 years ago

Do you realize that you can slide it like that:

| Item IDs (state) | Action |
| --- | --- |
| 1 2 3 4 5 6 7 8 9 10 | 11 |
| 2 3 4 5 6 7 8 9 10 11 | 12 |
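If it helps, the same slide can be done in one shot on a tensor; this is just a sketch, not RecNN's code:

```python
import torch

# Vectorised version of the slide in the table above: unfold gives every
# 10-item window, and the element right after each window is the action.
items = torch.arange(1, 13)                 # item IDs 1..12
window = 10
states = items.unfold(0, window, 1)[:-1]    # rows: [1..10], [2..11]
actions = items[window:]                    # 11, 12
print(states, actions)
```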

I used the ML-20M dataset with 1-3 epochs. As the name suggests, there are ~20 million ratings.

I don't know? I'd say a 90/10 train/test split is good. Also, cross validation never hurts.

Why do you say "timesteps"? The entire idea is sequential recommendation. Once a user session ends, you need to reset the LSTM's hidden state. If you reset the LSTM's state and then learn anew, sequentiality is preserved. I used to sort users by item count so they are somewhat even in the number of items; then for each batch I trained the network and reset the state.
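A rough sketch of the reset idea with a hypothetical PyTorch model (not the library's code):

```python
import torch
import torch.nn as nn

# The hidden state carries sequential context within one user session
# and is dropped when a new session starts.
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)

def run_session(session_items):  # (batch, seq_len, 128) item embeddings
    hidden = None                # None => PyTorch initialises a fresh state
    out, hidden = lstm(session_items, hidden)
    return out                   # per-step states for this session only

# every call starts from a clean hidden state, so sessions don't leak into each other
states = run_session(torch.randn(4, 20, 128))
```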

mdev11 commented 4 years ago

Yes, I actually do slide it like that, but I didn't explain it for simplicity of the question. I slide it with a window of 5, so:

| State | Action |
| --- | --- |
| 1 2 3 4 5 | 6 |
| 2 3 4 5 6 | 7 |
| ... | ... |
| 15 16 17 18 19 | 20 |

Therefore one episode can do 15 steps, and then I move on to another session.

I've tried using only a 42k dataset, but it was so time consuming (12-13 hours) for just 1 epoch that I thought something might be wrong. How long did it take you to train on the 20M data for 1-3 epochs?

Btw, what I meant by timesteps is the number of model learning steps.

And thank you for answering!

awarebayes commented 4 years ago

Optimization is the key. When I first started working on dataloading, it took almost 30 hours. Now it takes 5 minutes to iterate through the dataset (i5, 1 core), 10 minutes with learning. You can use my loader if you choose; look at "working with your own data" in the docs.
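As a generic illustration of that kind of speedup (not RecNN's actual loader, see the docs page above): group the ratings by user once, up front, instead of filtering the whole frame on every batch.

```python
import pandas as pd

# One expensive pass over the ratings frame, assuming ML-20M style columns
# (userId, movieId, rating, timestamp) ...
ratings = pd.read_csv("ratings.csv")
user_groups = {
    uid: g.sort_values("timestamp")[["movieId", "rating"]].values
    for uid, g in ratings.groupby("userId")
}

# ... then each batch is a cheap dictionary lookup instead of a full scan.
def get_user_rows(uid):
    return user_groups[uid]
```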

Important dataloading functions are here: link