ShibiHe / Model-Free-Episodic-Control

This is an implementation of the paper Model-Free Episodic Control
MIT License

Speeding up Episodic control #2

Open sudeepraja opened 8 years ago

sudeepraja commented 8 years ago

Hey! Really great work with the episodic control agent. I set it running on my Nvidia Tesla and left it overnight. The next morning only 7 epochs had finished. Is there any way we could speed it up?

I ran a profiler for 2 epochs; these are the results, ordered by internal time:

```
ncalls tottime percall cumtime percall filename:lineno(function)
577059602 1077.328 0.000 1077.328 0.000 {method 'reduce' of 'numpy.ufunc' objects}
149385847 1067.267 0.000 2702.390 0.000 numeric.py:2340(within_tol)
149385847 633.005 0.000 4695.321 0.000 numeric.py:2281(isclose)
298771703 422.120 0.000 921.010 0.000 numeric.py:2478(seterr)
448157543 345.168 0.000 1792.992 0.000 fromnumeric.py:1968(all)
298771703 303.658 0.000 333.810 0.000 numeric.py:2578(geterr)
133794 236.010 0.002 5984.444 0.045 EC_functions.py:48(estimate)
128898286 229.696 0.000 654.351 0.000 EC_functions.py:76(_calc_distance)
448161306 222.306 0.000 272.530 0.000 numeric.py:476(asanyarray)
896398980 218.985 0.000 219.419 0.000 {numpy.core.multiarray.array}
448157556 199.122 0.000 1175.297 0.000 {method 'all' of 'numpy.ndarray' objects}
149385847 194.654 0.000 5588.565 0.000 numeric.py:2216(allclose)
149385847 171.426 0.000 235.043 0.000 numeric.py:1970(isscalar)
298772806 158.011 0.000 158.011 0.000 {abs}
448157556 136.889 0.000 976.175 0.000 _methods.py:40(_all)
298771704 116.438 0.000 116.438 0.000 {numpy.core.umath.seterrobj}
128898286 110.123 0.000 424.655 0.000 fromnumeric.py:1737(sum)
149385851 106.513 0.000 522.222 0.000 numeric.py:2874(exit)
278396649/278396642 105.359 0.000 105.361 0.000 {isinstance}
149385851 99.092 0.000 604.393 0.000 numeric.py:2869(enter)
149385853 97.069 0.000 97.069 0.000 {numpy.core.multiarray.result_type}
149385851 94.841 0.000 115.454 0.000 numeric.py:2865(init)
597543408 78.794 0.000 78.794 0.000 {numpy.core.umath.geterrobj}
128898286 65.159 0.000 65.159 0.000 EC_functions.py:16(init)
118009 51.381 0.000 65.375 0.001 {_heapq.nsmallest}
128898287 34.789 0.000 272.828 0.000 _methods.py:31(_sum)
149387544 20.613 0.000 20.613 0.000 {method 'pop' of 'dict' objects}
80842 15.486 0.000 15.486 0.000 ale_python_interface.py:140(act)
128898286 13.995 0.000 13.995 0.000 EC_functions.py:65()
20000 10.066 0.001 656.611 0.033 EC_functions.py:81(update)
19888 9.805 0.000 5994.359 0.301 EC_agent.py:95(_choose_action)
129045706 9.425 0.000 9.425 0.000 {method 'append' of 'list' objects}
```

How about using scikit-learn's KNN instead?
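For context, the `allclose`/`isclose` rows above suggest that each stored state is compared with its own small NumPy call inside a Python loop; a single broadcasted computation over the whole buffer avoids that per-call overhead. A minimal sketch of the difference, with illustrative array names (not the repo's actual code):

```python
import numpy as np

def knn_indices_loop(query, buffer, k):
    """Per-point loop: one small NumPy call per stored state (slow)."""
    dists = [np.sum((query - s) ** 2) for s in buffer]
    return np.argsort(dists)[:k]

def knn_indices_vectorized(query, buffer, k):
    """One broadcasted computation over the whole buffer (fast)."""
    dists = np.sum((buffer - query) ** 2, axis=1)
    return np.argpartition(dists, k)[:k]   # k smallest, unordered

rng = np.random.default_rng(0)
buffer = rng.standard_normal((1000, 64))   # N stored state projections
query = rng.standard_normal(64)

# Both find the same 11 nearest neighbours (no ties with random data).
a = set(knn_indices_loop(query, buffer, 11).tolist())
b = set(knn_indices_vectorized(query, buffer, 11).tolist())
assert a == b
```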

ShibiHe commented 8 years ago

You are asking really incisive questions. Thank you very much, Sudeep.

The first question is about the KNN algorithm. Due to the high dimensionality of the states and the dynamic LRU Q_EC buffer, a KD tree or ball tree (sklearn) does not seem practical. Therefore, I chose the naive implementation using a heap. (Time complexity: O(D·N·log K), where D is the dimension of the states, N is the buffer size for a single action, and K is the number of neighbors.)
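The heap-based lookup described here (and visible as `_heapq.nsmallest` in the profile) can be sketched as follows; function and variable names are illustrative, not the actual `EC_functions.py` code:

```python
import heapq
import numpy as np

def knn_estimate(query, states, q_values, k):
    """Brute-force KNN estimate: O(D*N) distance work plus an
    O(N log K) heap pass; returns the mean Q of the K nearest states."""
    # Pair each stored state's squared distance to the query with its Q value.
    pairs = ((np.sum((query - s) ** 2), q) for s, q in zip(states, q_values))
    nearest = heapq.nsmallest(k, pairs, key=lambda p: p[0])
    return sum(q for _, q in nearest) / len(nearest)

rng = np.random.default_rng(1)
states = list(rng.standard_normal((500, 32)))
q_values = list(rng.standard_normal(500))

# Querying a stored state with k=1 finds that state itself.
assert knn_estimate(states[0], states, q_values, k=1) == q_values[0]
```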

The second question is why episodic control is much slower than the original DQN. This is because, at every step, the return value for each action is estimated via KNN, so the time complexity is approximately O(num_actions·N), and according to the paper, N = 1,000,000.

I have tried to contact the authors Charles Blundell and Benigno Uria about the above issues, but I have not yet received a reply.

sudeepraja commented 8 years ago

Hey! I modified the code in EC_functions.py. You can check it out here

There are two major changes:

  1. I used NumPy arrays of size equal to the maximum size of the LRU cache. This makes it easier to search for the minimum and avoids dynamic list appends, which are slower than pre-allocating a large NumPy array. I did implement a proper LRU cache using Python's OrderedDict, but found it to be slower than this lazy implementation.
  2. I used KD trees from scikit-learn for finding the k nearest neighbors and for testing membership.

Now the first epoch runs in under a minute on my Core i5 laptop. There is still scope for a few changes:

  1. Currently I create a new tree every time the LRU cache is updated. This is the primary bottleneck according to my cProfile run. A dynamic tree data structure that supports insertion, deletion, quick membership testing, and quick k-nearest-neighbor queries would do the job here, but I couldn't find one that fits the task.
  2. A proper LRU cache used along with the dynamic data structure mentioned above.
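The two changes above can be sketched roughly like this, with a ring buffer standing in for the lazy LRU and a full KD-tree rebuild on every update (hypothetical class, not the actual modified `EC_functions.py`):

```python
import numpy as np
from sklearn.neighbors import KDTree

class QECBuffer:
    """Fixed-capacity buffer with KD-tree lookups.  A ring buffer stands
    in for the LRU policy; the tree is rebuilt after every update, which
    is the bottleneck described above."""
    def __init__(self, capacity, dim):
        self.states = np.zeros((capacity, dim))
        self.q_values = np.zeros(capacity)
        self.size = 0
        self.next = 0
        self.tree = None

    def add(self, state, q):
        self.states[self.next] = state
        self.q_values[self.next] = q
        self.next = (self.next + 1) % len(self.states)
        self.size = min(self.size + 1, len(self.states))
        self.tree = KDTree(self.states[:self.size])   # full rebuild each time

    def estimate(self, state, k):
        k = min(k, self.size)
        dist, idx = self.tree.query(state.reshape(1, -1), k=k)
        if dist[0, 0] == 0.0:                # exact hit doubles as membership test
            return float(self.q_values[idx[0, 0]])
        return float(self.q_values[idx[0]].mean())
```

A zero-distance hit on the tree query serves as the membership test, so no separate lookup structure is needed in this sketch.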

ShibiHe commented 8 years ago

Wow, thank you very much. I am on vacation now and will be back at school in two days. I was about to tell you that I got a reply from the authors: they used an approximate KNN algorithm. I will check your code soon, and again thank you very much!

ShibiHe commented 8 years ago

Hello. I checked your code, and you were using a KD tree. I have pushed a new commit in which I implemented approximate KNN. Please have a look if possible. Thanks again.
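One common family of approximate KNN is random-projection hashing, where candidates are drawn only from the query's hash bucket, trading exactness for speed. A minimal sketch of the idea; the actual commit may use a different method, and all names here are hypothetical:

```python
import numpy as np

class RandomProjectionANN:
    """Approximate nearest neighbours via random-projection hashing:
    states are bucketed by the signs of a few random projections, and
    lookups only scan the query's own bucket."""
    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}                    # hash -> list of (state, q_value)

    def _hash(self, x):
        return tuple((self.planes @ x > 0).tolist())

    def add(self, state, q):
        self.buckets.setdefault(self._hash(state), []).append((state, q))

    def estimate(self, query, k):
        cand = self.buckets.get(self._hash(query), [])
        if not cand:
            return 0.0                       # empty bucket: fall back to a default
        d = sorted(((np.sum((query - s) ** 2), q) for s, q in cand),
                   key=lambda p: p[0])
        return float(np.mean([q for _, q in d[:k]]))
```

Lookups scan only one bucket instead of the whole buffer, so neighbours that hash into a different bucket can be missed; that is the approximation.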
