The annotation format for training with the iql module is problematic

KimamanaNeko commented 2 years ago

I want to train with iql, but the current annotation format makes it throw the following error:
ValueError: invalid literal for int() with base 10: '220522-bf4bfa45-1e87-4dd7-aece-dc63dd1c7508'

The columns are unpacked as
sparse, numeric, progression, candidates, index = columns[:5]
and I think this does not match the output format of Annotate.

I changed it to match Annotate:
uuid, sparse, numeric, progression, candidates, index = columns[:6]
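A minimal sketch of why the five-column unpacking fails on Annotate output. It assumes (hypothetically) that each annotation line is tab-separated and that the sparse, numeric, progression, and candidates fields are comma-separated integer lists. With the old layout the game UUID lands in the `sparse` slot, so converting it with `int()` raises exactly the reported error; shifting the unpacking by one column avoids it. `parse_ints`, `parse_old`, `parse_new`, and the sample values are illustrative, not kanachan's actual code.

```python
def parse_ints(field: str) -> list[int]:
    """Convert a comma-separated field such as '1,2,3' into a list of ints."""
    return [int(x) for x in field.split(",")]


def parse_old(line: str):
    # Old layout: assumes the first column is already the sparse feature list.
    columns = line.rstrip("\n").split("\t")
    sparse, numeric, progression, candidates, index = columns[:5]
    return (parse_ints(sparse), parse_ints(numeric), parse_ints(progression),
            parse_ints(candidates), int(index))


def parse_new(line: str):
    # Layout matching Annotate output: a game UUID comes first and stays a string.
    columns = line.rstrip("\n").split("\t")
    uuid, sparse, numeric, progression, candidates, index = columns[:6]
    return (uuid, parse_ints(sparse), parse_ints(numeric), parse_ints(progression),
            parse_ints(candidates), int(index))


# Hypothetical annotation line; only the UUID is taken from the error message.
line = "\t".join([
    "220522-bf4bfa45-1e87-4dd7-aece-dc63dd1c7508",  # game UUID
    "10,25,300",  # sparse (made-up values)
    "0,25000",    # numeric (made-up values)
    "0,5",        # progression (made-up values)
    "12,34",      # candidates (made-up values)
    "1",          # index
])

try:
    parse_old(line)
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: '220522-bf4bfa45-1e87-4dd7-aece-dc63dd1c7508'

print(parse_new(line)[0])  # 220522-bf4bfa45-1e87-4dd7-aece-dc63dd1c7508
```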

However, reward is never set, so training does not work properly with this branch:
elif len(columns) == 6+1:
    reward = int(columns[5+1])
    reward /= 100.0
    reward = torch.tensor(reward, device='cpu', dtype=torch.float32)
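The reward branch quoted above only applies when a seventh column is present. A minimal sketch of that logic, assuming the same tab-separated layout with the UUID first and the reward stored as an integer in hundredths. The original wraps the value in `torch.tensor(..., dtype=torch.float32)`; that step is left out here so the sketch runs without PyTorch. `parse_reward` is a hypothetical helper name.

```python
from typing import Optional


def parse_reward(columns: list[str]) -> Optional[float]:
    """Return the scaled reward if the optional 7th column exists, else None."""
    if len(columns) == 6:
        # Annotate output as-is: no reward column, so no reward is read.
        return None
    if len(columns) == 6 + 1:
        # The reward is stored as an integer in hundredths (e.g. 150 -> 1.5).
        return int(columns[5 + 1]) / 100.0
    raise ValueError(f"unexpected number of columns: {len(columns)}")


without_reward = ["uuid", "1", "2", "3", "4", "0"]
with_reward = without_reward + ["150"]
print(parse_reward(without_reward))  # None
print(parse_reward(with_reward))     # 1.5
```

In this sketch, every line of unmodified Annotate output takes the six-column path, which is why the reward has to come from somewhere else.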

This is only a guess, but I think the reward value has to be set in advance (for example, by modifying the existing Annotate code). I would appreciate your thoughts.
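One way to set the reward in advance, sketched under stated assumptions: a hypothetical post-processing step that appends a seventh, integer-scaled reward column to each six-column line Annotate produces, so the seven-column branch can read it back with `/ 100.0`. How the reward itself should be defined is not specified in this thread; `append_reward` and the example value are illustrative only.

```python
def append_reward(line: str, reward: float) -> str:
    """Append a reward column, stored as an integer in hundredths, to a tab-separated line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 6:
        # Hypothetical check: expects uuid, sparse, numeric, progression, candidates, index.
        raise ValueError(f"expected 6 columns from Annotate, got {len(columns)}")
    # Scale by 100 and round so the reader's int(...) / 100.0 recovers the value.
    columns.append(str(round(reward * 100)))
    return "\t".join(columns)


line = "\t".join(["uuid", "1,2", "3", "4", "5,6", "0"])
out = append_reward(line, -0.25)
print(out.split("\t")[-1])               # -25
print(int(out.split("\t")[-1]) / 100.0)  # -0.25
```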

Cryolite commented 2 years ago

Sorry for the confusion. The annotation schema described in README.md is for behavioral cloning, including models trained by the kanachan.training.bert module, and it is quite different from the schema for offline reinforcement learning, including models trained by kanachan.training.iql. The annotation schema for offline reinforcement learning is completely undocumented. The annotation documentation will be reorganized to match the current implementation.

KimamanaNeko commented 2 years ago

Thanks for the answer. I have trained on about 300,000 games so far, but the bert model behaves very strangely in my replay analysis program. Although the loss keeps going down, the model does not perform any better in simulation. I am a bit confused: am I doing something wrong, or is it simply hard to reach the amount of data offline reinforcement learning needs in my personal environment?

Cryolite / kanachan, issue #15