Hi, no, it should not cause any kind of issue. If you encountered a bug, please send me your configuration and the APK that crashes.
It didn't crash. My test case uses BookyMcBookface.apk.
```python
self.action_space = spaces.Box(low=numpy.array([0, 0, 0]),
                               high=numpy.array([self.ACTION_SPACE, len(self.strings) - 1, 1]),
                               dtype=numpy.int64)
```

`self.ACTION_SPACE` is 30. But after `self.check_activity()` in `__init__()`, `self.action_space.high[0] = len(self.views) + self.shift` is set.
So `self.action_space` is changed to `spaces.Box(low=numpy.array([0, 0, 0]), high=numpy.array([4, len(self.strings) - 1, 1]))`.
Then `self.action_space` is used to set the action space of the SAC model.
This causes the wrong ACTION_SPACE in the SAC model. According to your parameters, the correct ACTION_SPACE should be 30.
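To make the initialization order concrete, here is a minimal, self-contained sketch of what I mean (the values 4 and 39 just mirror my test run; this is only an illustration, not ARES's actual code):

```python
import numpy
from gym import spaces

ACTION_SPACE = 30  # intended upper limit on the number of widgets

# 1. The env first declares the "maximum" action space.
action_space = spaces.Box(low=numpy.array([0, 0, 0]),
                          high=numpy.array([ACTION_SPACE, 39, 1]),
                          dtype=numpy.int64)

# 2. check_activity() then shrinks the first bound to the widget count of
#    the *current* activity (4 in my test app) before the model is created.
num_widgets_in_first_activity = 4  # hypothetical value
action_space.high[0] = num_widgets_in_first_activity

# 3. If SAC is constructed after this point, it only ever sees high[0] == 4.
print(action_space.high)  # -> [ 4 39  1]
```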
It is normal to resize the action space during the exploration: it is resized to match the number of widgets that ARES can interact with. In fact, the number of widgets that an activity has may vary over time, and it also changes between activities.
Let's say that the ACTION_SPACE of dimension 30 is the upper limit of widgets that ARES can manage.
I may have misunderstood the init() of the SAC model. I just think that if the SAC model is initialized with ACTION_SPACE = 4, it will never predict an action_number[0] greater than 4. Does the upper limit of the predicted action_number[0] in the SAC model change when the action space is resized during exploration?
If you look at line 173 in RL_application_env.py, I'm setting the action_space to the "maximum_dimension" of 30.
Then at line 178, I call self.check_activity(), which actually resizes the action_space to the correct dimension.
Finally, self.check_activity() is called again each time the GUI changes.
Thanks for your reply. I know the ACTION_SPACE changes as the GUI changes. But the size of the SAC model's prediction does not change when the action_space is resized in self.check_activity(). In my test, the SAC model is initialized with ACTION_SPACE = 4 (line 178, self.check_activity()). Then self.check_activity() is called again each time the GUI changes and updates the ACTION_SPACE in RL_application_env, but it doesn't change the ACTION_SPACE in the SAC model. The model's prediction stays fixed to the ACTION_SPACE = 4 from initialization.
I wasn't talking about the init() in RL_application_env.py. The init() in SAC (from stable_baselines3 import SAC) is fixed with ACTION_SPACE = 4.
Yes, we cannot change the dimension of the output of the ML model (i.e., the action space of the ML model); the library does not allow this kind of modification. I only use the resized dimension to check whether the action generated by the ML model goes out of bounds (a non-valid action) or is valid for the application.
So self.check_activity() (at line 178) will set the wrong action space for the ML model? Because it changes self.action_space = spaces.Box() to the action space of the app's initial GUI, and the ML model is then initialized with the changed action_space.
The ML model checks the dimension only at the beginning, and then it never uses the action_space variable again. This means that the output of the ML model is always in the range 0:30.
But if you have 3 buttons in Activity Something.Main, it is not useful to generate a value greater than 2 (i.e., 0, 1, 2).
So I'm just using the action_space variable to store the actual action space dimension.
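A minimal sketch of the out-of-bound check described above (not ARES's actual code; `current_widget_count` is a hypothetical stand-in for the value that check_activity() stores in action_space.high[0]):

```python
def is_valid_action(action_number: int, current_widget_count: int) -> bool:
    """The model may emit anything in 0..30; only values that map to a
    widget of the current activity are treated as valid actions."""
    return 0 <= action_number < current_widget_count

# Activity Something.Main has 3 buttons -> only 0, 1, 2 are valid.
print(is_valid_action(2, 3))    # True
print(is_valid_action(25, 3))   # False -> handled as a non-valid action
```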
In my test, the ML model checks the dimension at the beginning, and the action_space (30) has already been changed to 4 before the ML model's init(). The self.check_activity() (at line 178) in RL_application_env.py's init() runs before the ML model's init(). That is fine for Activity Something.Main, but it doesn't work for other activities (which have more than 4 buttons).
I'm sorry, now I got the point. I'll fix it asap.
Thank you so much for your patient responses.
I did a preliminary check and it seems to work: the model generates values between 0 and 30. Here is my console output.

```
2022-02-14 15:03:48.583 | DEBUG | rl_interaction.RL_application_env:init:127 - apps/Calculator.apk START
----> line 173 [30, 39, 1]
----> line 178 [4, 39, 1]
Starting training from zero
Using cpu device
Wrapping the env in a DummyVecEnv.
2022-02-14 15:03:59.531 | DEBUG | rl_interaction.RL_application_env:reset:316 - <--- EPISODE RESET --->
Action: [ 4. 12. 0.]
Action: [ 3. 28. 0.]
2022-02-14 15:04:01.700 | DEBUG | rl_interaction.RL_application_env:step2:236 - action: android:id/button1 Activity: calculator.innovit.com.calculatrice.MainActivity
2022-02-14 15:04:14.632 | DEBUG | rl_interaction.RL_application_env:step2:236 - action: abgc Activity: calculator.innovit.com.calculatrice.MainActivity
Action: [24. 23. 1.]
2022-02-14 15:04:25.761 | DEBUG | rl_interaction.RL_application_env:step2:236 - action: Open navigation drawer Activity: calculator.innovit.com.calculatrice.MainActivity
Action: [30. 39. 0.]
2022-02-14 15:04:38.336 | DEBUG | rl_interaction.RL_application_env:step2:236 - action: calculator.innovit.com.calculatrice.MainActivity.android.widget.Button.0 Activity: calculator.innovit.com.calculatrice.MainActivity
Action: [18. 11. 1.]
Action: [8. 0. 1.]
2022-02-14 15:04:48.083 | DEBUG | rl_interaction.RL_application_env:step2:236 - action: calculator.innovit.com.calculatrice.MainActivity.android.widget.Button.9 Activity: calculator.innovit.com.calculatrice.MainActivity
Action: [21. 27. 1.]
```
As you can see, the first call sets the dimension at line 173 and line 178 shrinks it down to 4. But when you print the actions right after the step() function, you also get numbers bigger than 4, so this means that the ML model has the maximum dim = 30.
However, I will investigate the problem in the coming days and keep you posted.
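In the meantime, one quick way to double-check which bound the model was actually built with is to print the action space the model captured at construction time (a sketch: it assumes the stable-baselines3 model keeps that action_space attribute, and `env` is the wrapped RL_application_env instance, instantiated as in the snippets below rather than ARES's exact setup):

```python
from stable_baselines3 import SAC
from stable_baselines3.sac import MlpPolicy

# `env` is the (wrapped) RL application environment; illustrative only.
model = SAC(MlpPolicy, env, learning_rate=3e-4, learning_starts=2000)

# The model stores a copy of the action space it saw at construction time,
# so printing it shows the bound the policy network was built with.
print(model.action_space.low, model.action_space.high)
```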
Hi, I also imitated Fate and created a virtual env. The step() function is:

```python
if action_number > 10 and action_number < 20:
    logger.warning(100.0)
    return self.observation, numpy.array([100.0]), numpy.array(False), {}
else:
    logger.warning(-100.0)
    return self.observation, numpy.array([-100.0]), numpy.array(False), {}
```

But the ML model doesn't learn to predict action_number between 10 and 20. Do you know why? It has bothered me for a long time.
The complete code is:

test.py

```python
# SAC and its MlpPolicy come from stable_baselines3, as mentioned above.
from stable_baselines3 import SAC
from stable_baselines3.sac import MlpPolicy
from rl_interaction.utils.wrapper import TimeFeatureWrapper
from RL_application_env_fate import RLApplicationEnvFate

app = RLApplicationEnvFate()
env = TimeFeatureWrapper(app)
model = SAC(MlpPolicy, env, learning_rate=3e-4, learning_starts=2000)
model.learn(total_timesteps=100000)
print("model end!!!")
```
RL_application_env_fate.py

```python
from gym import Env
import os
import numpy
from loguru import logger
from gym import spaces


class RLApplicationEnvFate(Env):

    def __init__(self):
        self.log_dir = './fate_logs'
        if not os.path.exists(self.log_dir):
            os.mkdir(self.log_dir)
        self.action_num_logger_id = logger.add(os.path.join(self.log_dir, 'action_num_logger.log'),
                                               format="{level} {message}",
                                               filter=lambda record: record["level"].name == "WARNING")
        self.action_space = spaces.Box(low=numpy.array([0]),
                                       high=numpy.array([30]),
                                       dtype=numpy.int64)
        self.observation_space = spaces.Box(low=0, high=1, shape=(300,), dtype=numpy.int32)
        self.observation = numpy.ones(300)

    @logger.catch()
    def step(self, action_number):
        # action_number = action_number.astype(int)
        logger.warning(action_number)
        if action_number > 10 and action_number < 20:
            logger.warning(100.0)
            return self.observation, numpy.array([100.0]), numpy.array(False), {}
        else:
            logger.warning(-100.0)
            return self.observation, numpy.array([-100.0]), numpy.array(False), {}

    def reset(self):
        return self.observation
```
During the training phase, some actions are always taken randomly; to have deterministic behavior, you should complete a training phase with a very high number of timesteps, not 100000 like in your case.
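To illustrate the point, here is a sketch using the standard stable-baselines3 API (the timestep count is only an example, and `env` is the wrapped RLApplicationEnvFate from the snippet above):

```python
from stable_baselines3 import SAC

# During learn(), SAC explores by sampling stochastically from its policy
# (and purely at random for the first `learning_starts` steps), so single
# actions can look unrelated to the reward for a long time.
model = SAC("MlpPolicy", env, learning_rate=3e-4, learning_starts=2000)
model.learn(total_timesteps=1_000_000)  # example: far more than 100000

# After training, ask the policy for its deterministic (mean) action.
obs = env.reset()
action, _ = model.predict(obs, deterministic=True)
print(action)
```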
Hi, I find that self.check_activity() changes the size of self.action_space = spaces.Box() in RL_application_env.py's init(). Will this cause the wrong action_space size to be used when the model is initialized?