evilsocket opened 5 years ago
@caquino this is a nice hacktoberfest entry
:+1:
> Switching frameworks is not feasible, unless we have the same exact features (unlikely given that stable-baselines is TF based).
I think the two main things that take time are:
clip_vf_range=None
) it usually performs better than A2C (you will need to tune some params anyway; you can take a look at the rl zoo, where hyperparameter optimization is included).

EDIT: Regarding issue #49, CMA-ES (which has a nice Python package) is usually a good start.
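Since CMA-ES comes up here: in practice one would use the `cma` package (`pip install cma`), but the core sample/select/update loop it builds on can be sketched with a simplified evolution strategy. This is an illustrative sketch with an isotropic Gaussian and no covariance adaptation, which is the part full CMA-ES adds on top:

```python
import numpy as np

def simple_es(objective, x0, sigma=0.5, popsize=16, iterations=60, seed=0):
    """Minimal (mu, lambda) evolution strategy: sample around the mean,
    keep the best half, move the mean toward it. Real CMA-ES additionally
    adapts a full covariance matrix (see the `cma` package)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    mu = popsize // 2
    for _ in range(iterations):
        pop = mean + sigma * rng.standard_normal((popsize, mean.size))
        fitness = np.array([objective(x) for x in pop])
        elite = pop[np.argsort(fitness)[:mu]]  # minimization: keep lowest values
        mean = elite.mean(axis=0)
        sigma *= 0.97                          # simple annealing schedule
    return mean, objective(mean)

# Toy usage: minimize a shifted sphere function in 2D
best, value = simple_es(lambda x: float(np.sum((x - 3.0) ** 2)), x0=[0.0, 0.0])
```

The ask/tell loop in `cma` follows the same shape, with the sampling distribution updated from the elite set instead of a fixed-shape Gaussian.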
@araffin thanks for your feedback!
For #49 I already have my own implementation, but thanks :D
Importing stable-baselines pulls in all of old TensorFlow (it works on v1, not v2), which loads the whole TF graph into memory the first time instead of compiling at run time; that makes it way too heavy.
Maybe one could rewrite A2C tuned for this application and use (not sure if you are already doing this) a compiled version for the Raspberry Pi. That should reduce loading time quite a bit.
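One way to avoid importing TensorFlow on the device at all (a sketch of the idea, not something stable-baselines provides out of the box): export the trained policy's weights to plain arrays once on a bigger machine, then run the forward pass in NumPy on the Pi. The layer sizes and tanh/softmax choices below are illustrative assumptions, not the project's actual architecture:

```python
import numpy as np

def policy_forward(obs, weights):
    """Forward pass of a small MLP policy using only NumPy.
    `weights` is a list of (W, b) pairs exported once from the trained
    TF model (e.g. by evaluating the policy variables in a session)."""
    x = np.asarray(obs, dtype=float)
    for W, b in weights[:-1]:
        x = np.tanh(x @ W + b)            # hidden layers with tanh activation
    W, b = weights[-1]
    logits = x @ W + b
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()                    # probabilities over discrete actions

# Toy usage with random stand-in weights: 8-dim observation, 4 actions
rng = np.random.default_rng(0)
weights = [(rng.standard_normal((8, 16)), np.zeros(16)),
           (rng.standard_normal((16, 4)), np.zeros(4))]
probs = policy_forward(rng.standard_normal(8), weights)
```

Inference then needs only NumPy, which imports in well under a second even on a Pi Zero.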
As a side note: LSTMs are computationally heavy to run (complexity scales with the size of the features), and they make sense if you have long time dependencies in your time series. Is this the case? One could try 1D convolutions (complexity scales with the number of samples), which are more suitable for time-series signals.
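To illustrate the 1D-convolution alternative (a generic sketch, not tied to this codebase): a 1D convolution over a time series is just a sliding dot product, so its cost is O(samples × kernel size) with no recurrent state to carry:

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1D convolution via a sliding window.
    Output length is len(signal) - len(kernel) + 1."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(signal, kernel.size)
    return windows @ kernel[::-1]  # reverse kernel: convolution, not correlation

# Toy usage: a 3-tap smoothing kernel over a 10-sample series
out = conv1d(np.arange(10.0), np.ones(3) / 3.0)
```

This matches `np.convolve(signal, kernel, mode="valid")`; a conv-based policy stacks a few such layers with learned kernels.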
> As a side: LSTM is computationally heavy to run (complexity is on the size of the features) and it make sense if you have long time dependencies in your time-series. Is this the case?
yes
TensorFlow takes minutes to import on a Raspberry Pi Zero W, probably because of the huge .so file with native primitives it has to load, among other things. Given the nature of the project, that stuff is imported only once, so caching it in memory wouldn't speed things up. Switching frameworks is not feasible unless we have the exact same features (unlikely, given that stable-baselines is TF based). For instance, there's no stable-baselines port for TF-Lite.
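To put a number on that import cost (a generic measurement sketch, not project code), one can time `importlib.import_module` in a fresh interpreter; on a Pi Zero, most of the minutes go into loading TensorFlow's native shared library:

```python
import importlib
import time

def timed_import(module_name):
    """Import a module and report how long it took. Only meaningful in a
    fresh interpreter: once cached in sys.modules, re-import is near-free."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start

# Toy usage with a stdlib module (substitute "tensorflow" on the device)
mod, seconds = timed_import("json")
```

The second call for the same module returns almost instantly, which is exactly why caching cannot help here: the expensive first import is unavoidable.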