Open chengyh23 opened 2 years ago
According to the paper, "to provide more learning signal to the agent", the Wasserstein distance is used as the reward. However, in a real-world setting, is there any way for the agent to interact with the environment and obtain feedback that reflects the Wasserstein distance between the belief distribution and the underlying state?
What does the variable `fisher` in the function `generate_belief_rep` (e.g., in `agnosticmaas_env.py`) mean?