eaplatanios / jelly-bean-world

A framework for experimenting with never-ending learning
Apache License 2.0

Some questions #8

Open EvaluationResearch opened 4 years ago

EvaluationResearch commented 4 years ago

Hi, I'm running simulator_test.py in Python and found that jbw-render-v0 defines its own reward_fn. I can register a jbw env with the register() function, but where do I set up the reward (collect, avoid, explore) and the reward schedule (fixed, periodic, random)?

The paper (https://arxiv.org/abs/2002.06306) mentions that the final results use the reward rate measure to evaluate performance. Where can I see this measurement result?

EvaluationResearch commented 4 years ago

There are multiple .sh files under scripts, but I haven't run them successfully so far. I see that the collectJellyBeans reward and the PPO algorithm are defined in these files.

What is the relationship between the results of running those scripts and running simulator_test.py in Python? In a Python program, can't we execute scenarios such as non-episodic, non-stationary, and vision + scent? Do I need to write this Python code myself?

If there is something wrong with my understanding, please correct me.

Thank you!

eaplatanios commented 4 years ago

Hi @zhaoyueplc ! All the experiments we present in the paper were written in Swift using the Swift API we provide (including the reward functions, the PPO implementation, and the evaluation code). The Python API does provide the functionality you'd need to implement these reward functions and set up the same experiments but, as you said, you'd have to implement them yourself. Regarding PPO, you could probably try using an existing Python implementation like the ones provided by OpenAI.
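For example, the reward schedules from the paper can be implemented in plain Python by wrapping reward functions. Below is a minimal sketch of a periodic schedule, assuming the `reward_fn(prev_items, items)` convention used by the `jbw-render-v0` registration (where each argument holds per-item collected counts); the item indices and names here are illustrative, not part of the API:

```python
JELLYBEAN, ONION = 0, 1  # illustrative item indices; they depend on your simulator config

def collect_jellybeans(prev_items, items):
    # +1 for every jellybean collected during this step.
    return float(items[JELLYBEAN] - prev_items[JELLYBEAN])

def avoid_onions(prev_items, items):
    # -1 for every onion collected during this step.
    return -float(items[ONION] - prev_items[ONION])

class PeriodicSchedule:
    """Alternates between two reward functions every `period` steps."""

    def __init__(self, fn_a, fn_b, period=100_000):
        self.fns = (fn_a, fn_b)
        self.period = period
        self.t = 0

    def __call__(self, prev_items, items):
        self.t += 1
        return self.fns[(self.t // self.period) % 2](prev_items, items)
```

An instance of `PeriodicSchedule` could then be passed as the `reward_fn` argument when registering your own environment, the same way `jbw-render-v0` is registered.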

EvaluationResearch commented 4 years ago

Hi @eaplatanios, but there are so many parameters in this project that it is difficult to rewrite the experiments in Python for developers who don't know the parameters and their corresponding values.

It is also relatively easy for experienced developers to modify the Swift code, but it is difficult across platforms. I don't have a Mac environment on hand, so I'm trying to run Swift on Windows, which I find more difficult and restrictive.

Just a suggestion: would you consider making all of this functionality available in Python?

asaparov commented 4 years ago

Thank you for the suggestion. Are you trying to reproduce the experiments in the paper? Or are you trying to run your own RL code using the JBW? You don't need Swift to run your own code, as the Python API is currently fully capable of supporting that. However, if you do want to reproduce the experiments in the paper, then yes, we would need to port the Swift code into Python. Another option you could try is to use Swift from a Linux environment, such as Ubuntu.

EvaluationResearch commented 4 years ago

I'd like to try some experiments of my own and evaluate their effects (you have already implemented these evaluation methods). In that case, I need to compare many different, rich experiments, so I first need to understand the experiments you created, because I don't yet understand how to create richer ones. I'm using Swift on Ubuntu.

asaparov commented 4 years ago

Our evaluation was quite simple, and we describe it in our paper: we just plot the "reward rate" over time, where the reward rate is defined as the total reward over the past N time steps, divided by N. N is the size of the moving window, and we use N = 100,000 in our experiments.
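In plain Python, that metric looks like this (a minimal sketch with illustrative names, not code from the repo):

```python
from collections import deque

def reward_rates(rewards, window=100_000):
    """Reward rate at each step: total reward over the last
    `window` time steps, divided by `window`."""
    recent, total, rates = deque(), 0.0, []
    for r in rewards:
        recent.append(r)
        total += r
        if len(recent) > window:
            total -= recent.popleft()
        rates.append(total / window)
    return rates
```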

asaparov commented 4 years ago

In our experiments, the reward depends on the experiment (i.e., the task). For example, if the task is Collect[Jellybean], the agent receives +1 reward whenever it collects a jellybean item. If the task is Avoid[Onion], the agent receives -1 reward whenever it collects an onion item. If the task is Collect[Jellybean] & Avoid[Onion], then both reward conditions apply simultaneously.
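As a concrete illustration, the combined task could be written as follows (a sketch only, with the same assumed `(prev_items, items)` signature and illustrative item indices as in the schedule example above):

```python
JELLYBEAN, ONION = 0, 1  # illustrative item indices; they depend on your simulator config

def collect_jellybeans_and_avoid_onions(prev_items, items):
    # +1 per jellybean collected and -1 per onion collected in this step.
    return (float(items[JELLYBEAN] - prev_items[JELLYBEAN])
            - float(items[ONION] - prev_items[ONION]))
```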

But our experiments are intended to be more of a baseline for future work using the JBW. The code (including the Python API) enables you to define your own reward functions, environments, and evaluation metrics.

EvaluationResearch commented 4 years ago

I've been trying to run these Swift files on Ubuntu recently, but unfortunately it hasn't worked so far. I'm not sure whether this project uses Swift for TensorFlow. Swift for TensorFlow doesn't support Linux very well at present, and I'm sure I'll need to make some code changes in the future. In fact, there is no proper IDE for Swift on Linux yet. It's going to be very difficult for me and unlikely to happen.

If these experiments can only be carried out in Swift, I expect many developers will have the same problems as me. Of course, developing in Swift is a nice thing in itself.

So, I am wondering: do you have plans to implement the relevant experiments in Python or C++?

asaparov commented 4 years ago

Yes, the Swift experiments require Swift for TensorFlow. The README lists, under both the Requirements and Using Swift sections, that you need Swift for TensorFlow 0.8.

We were able to run our experiments just fine on Ubuntu. What is the error you are running into?

Atom and VS Code are great IDEs with Swift support. I would guess Sublime Text also has a Swift add-on.

eaplatanios commented 4 years ago

Another great IDE with Swift support that also works with the Swift for TensorFlow toolchain is CLion by Jetbrains (and it should also work on Ubuntu). For what it's worth, many of our experiments were run on Ubuntu machines and so there should be no problems running our code there.

EvaluationResearch commented 4 years ago

Hi, sorry, I have been busy with other things recently, but I do have a problem:

```
zhaoyue@zhaoyue-virtual-machine:~/jelly-bean-world/api/swift/Sources$ swift run -c release JellyBeanWorldExperiments run --reward collectJellyBeans --agent ppo --observation vision --network plain
/home/zhaoyue/jelly-bean-world/.build/checkouts/swift-rl/Sources/ReinforcementLearning/Utilities/Protocols.swift:307:30: error: expected 'wrt:' or 'where' in '@differentiable' attribute
  @differentiable(wrt: self, vjp: _vjpFlattenedBatch)
                             ^
/home/zhaoyue/jelly-bean-world/.build/checkouts/swift-rl/Sources/ReinforcementLearning/Utilities/Protocols.swift:319:30: error: expected 'wrt:' or 'where' in '@differentiable' attribute
  @differentiable(wrt: self, vjp: _vjpUnflattenedBatch)
                             ^
[2/4] Compiling clibc libc.c
error: fatalError
```

Ubuntu 18.04, Swift for TensorFlow 0.8 toolchain.