Very exciting to see your remarkable work on StableVicuna!
I read through your blog post and noticed that all the datasets are open-sourced and available. For the training code, however, the only detail mentioned is that you used trlX for training. Will there be a more detailed recipe or code for the RL tuning phase?
Many thanks in advance. I really appreciate your effort!