wujiaxu opened this issue 7 months ago
Hello, have you solved the problem? I also ran into a problem when running pretrain.py on the walker_walk task without a goal space, using:
python pretrain.py agent=fb_ddpg task=walker_walk use_tb=1 use_hiplog=1
The results are much worse than those in Table H.1: {"walker_stand": [353.8322190094244, 418.6318108405257, 335.8429828868224, 334.51483483278935, 339.43514290515674, 384.7798850811255, 418.7198206195864, 440.61859842467027, 454.6271120661993, 452.45770809301223], "walker_walk": [77.28714943868215, 58.549794169900146, 60.937718250089695, 62.07264886559247, 24.72064703958593, 59.22535404592635, 59.451793289620284, 48.067452957957485, 61.299411027890564, 82.53024862622551], "walker_run": [57.569717015372234, 58.85252614506342, 57.00703493202376, 56.872888793695964, 66.58976053839089, 75.49459284338043, 62.113378796459685, 44.373423609283016, 59.20851842411931, 37.46675445934915], "walker_flip": [63.7572401772806, 62.830305863228126, 67.3407532970549, 75.0724107521857, 64.76148439782193, 84.88779216129096, 77.26596775272307, 76.63966200280218, 76.58058053644595, 69.75981529543918]}
I also tried the RND-dataset setting with: python -m url_benchmark.train_online agent=fb_ddpg task=walker_walker load_replay_buffer=./walker/rnd/replay.pt
The results are similarly bad: {"walker_stand": [275.7246864587886, 296.6484547525342, 258.06501266360027, 265.47129692911415, 160.72373388565242, 266.7950944664016, 145.92666830500957, 137.30532461022776, 261.7305010323612, 261.73697814663615], "walker_walk": [44.62711305186533, 47.08105737255253, 32.11097965920191, 29.436152193898813, 47.40278391619204, 21.94067618163266, 32.21410585611251, 46.61406128378957, 34.729132120625394, 26.949257506722102], "walker_run": [24.08844599085796, 22.518815043404366, 45.34544151973348, 22.93017074464161, 23.12095191082715, 46.13564350106748, 22.990020371120096, 51.02555149354239, 22.80571242539997, 49.04932942574694], "walker_flip": [47.37142182665903, 25.864089121846757, 23.2523656072743, 102.35748470245672, 31.012177056739265, 34.615043785481525, 22.742667634456296, 27.44322493374879, 23.32321708397704, 24.317725216278394]}
When I run the command url_benchmark.pretrain agent=fb_ddpg task=walker_walk goal_space=simplified_quadruped use_tb=1 use_hiplog=1, it performs well: {"walker_stand": [977.4621687259832, 983.167575496185, 944.2877587504356, 961.9003786843554, 971.8828711186328, 969.198572839937, 968.9466619356098, 971.6341855616837, 956.8127300958071, 971.3770472534463], "walker_walk": [942.9365129907354, 902.5143824974456, 909.5626063153646, 889.155010931203, 915.939453687987, 857.2134377981365, 891.8572132825906, 971.1262291043645, 915.5019116511814, 978.452342690913], "walker_run": [522.6445694182877, 490.52477303082634, 540.006462480216, 482.03296196246527, 486.5257678759724, 492.31149398516953, 526.1554485196147, 541.9503159089211, 504.67544514169305, 601.86695026203], "walker_flip": [711.5530984754572, 701.7480263926943, 656.3967323938348, 701.6416742909103, 699.7500491515483, 768.8641919709218, 714.003972352532, 732.5250024322979, 733.133804708057, 792.5541376020909]}
I don't know whether I am reproducing the algorithm incorrectly, or whether fb_ddpg simply performs poorly without a goal space in the walker domain.
Hi, may I ask a question about pretraining fb_ddpg?
I installed the whole package, then ran pretrain.py to train an fb_ddpg agent on the walker_walk task without a goal space:
python pretrain.py agent=fb_ddpg task=walker_walk use_tb=1 use_hiplog=1
The hyperparameters were left unchanged.
I found that the fb_offdiag loss keeps increasing during training, even though the total FB loss is decreasing. Does that mean the training is dominated by the fb_diag term? Could you provide some hints about how fb_offdiag affects the final performance?
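For context, here is my rough understanding of how the two terms relate. This is only a sketch I wrote myself, not code from the repo (the function name fb_loss_sketch and all argument names are made up), assuming the standard FB measure-Bellman loss where the logged total is just fb_diag + fb_offdiag:

```python
import torch

def fb_loss_sketch(F, B, target_F, target_B, discount=0.98):
    """Sketch of an FB loss split into fb_diag / fb_offdiag terms.

    F, target_F: (batch, d) forward embeddings F(s, a, z)
    B, target_B: (batch, d) backward embeddings B(s')
    """
    M = F @ B.T                               # (batch, batch) successor-measure estimates
    with torch.no_grad():
        target_M = target_F @ target_B.T      # bootstrapped targets from the target networks
    I = torch.eye(M.shape[0], dtype=torch.bool, device=M.device)
    # off-diagonal pairs (transition i, goal j with i != j): squared Bellman residual
    fb_offdiag = 0.5 * (M - discount * target_M)[~I].pow(2).mean()
    # diagonal pairs (matched transition and next-state goal): pushed up, hence the minus sign
    fb_diag = -M.diag().mean()
    return fb_diag + fb_offdiag, fb_diag, fb_offdiag

if __name__ == "__main__":
    # quick self-contained check with random embeddings
    b, d = 256, 50
    F, B = torch.randn(b, d), torch.randn(b, d)
    total, diag, offdiag = fb_loss_sketch(F, B, F.clone(), B.clone())
    print(total.item(), diag.item(), offdiag.item())
```

If the total really is the plain sum of the two terms, then a rising fb_offdiag with a falling total would just mean fb_diag is decreasing faster, but I am not sure whether that behaviour is expected or a sign that something is wrong.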
Thank you!!