esa / pagmo

A C++ / Python platform to perform parallel computations of optimisation tasks (global and local) via the asynchronous generalized island model. State of the art optimization algorithms are included. A common interface is provided to other optimization frameworks/algorithms such as NLOPT, SciPy, SNOPT, IPOPT, GSL
GNU General Public License v3.0
270 stars 87 forks source link

Sampling without replacement in jDE #95

Open krzysztof opened 9 years ago

krzysztof commented 9 years ago

Replace https://github.com/esa/pagmo/blob/master/src/algorithm/jde.cpp#L192-L223 with something more efficient:

a) Implement Dario Izzo's Swapping Algorithm®:

  N =  [0,1,2,3,4, ...]
  M = 5
  for i in range(M):
    rnd_i = randint(0, N - i)
    swap(N[rnd_i], N[len(N) - i - 1])
  rand_5 = N[:-M]

(Supposedly there's a threshold where |N|=21 where this stops being efficient and you should fall back to current method)

http://stackoverflow.com/questions/311703/algorithm-for-sampling-without-replacement http://rosettacode.org/wiki/Knuth's_algorithm_S

darioizzo commented 9 years ago

lol

bluescarni commented 9 years ago

I think that in that snippet Dario achieved the impossible feat of making C++ look like javascript :)

dhennes commented 9 years ago

See https://github.com/python-git/python/blob/master/Lib/random.py#L320-L322 for threshold if k>5.

AVCZ commented 9 years ago

I have a question regarding the suggested solution a) (Dario Izzo's Swapping Algorithm®). When I compare the suggestion a) with the code in: https://github.com/esa/pagmo/blob/master/src/algorithm/jde.cpp#L192-L223 I see some functional differences What I mean is that the current implementation for each i in range(NP) assigns r1,r2,...,r7 to numbers in the set {0,1,...,NP-1} \ {i} (without repetition and every 7-tuple being equally likely selected). On the other hand, the suggestion a) assign r1,r2,...,r7 to numbers in the set {0,1,...,NP-1} in each iteration. Are these differences acceptable or not?

AnirudhGP commented 9 years ago

Hi I'm new here and this is my first bug, How do I begin ?

krzysztof commented 9 years ago

@AVCZ , notice that we sample from a diminishing range of [0, N-i], for increasing i, and swap the elements with the tail.

AVCZ commented 9 years ago

@kiryx Yes, I understand that part. I had a bit different question in mind and I probably just didn't make myself clear enough. (I assume that in the algorithm a) above the last line of the code should actualy be -- rand_5 = N[-M:] -- and that it was just a typo). The algorithm a) basically selects 5 elements from N = [0,1,2,3,4,....] at random, or more precisely said, each 5-tuple has exactly the same probability of being selected. On the other hand, the code in https://github.com/esa/pagmo/blob/master/src/algorithm/jde.cpp#L192-L223 in each iteration (iteration over the variable i from 0 to NP-1) selects a random 7-tuple from the range {0,1,...,NP-1} \ {i} which is not the same as selecting a 7-tuple from range {0,1,...,NP-1}.

krzysztof commented 9 years ago

@AVCZ, Sorry, I misunderstood you. Yes, you should sample from {0, 1, ..., NP-1} \ {i}. It would be nice to not have to instantiate the N for every i though... I'm not sure if its possible without at least linear scan through the vector N...

EDIT: Its possible at the cost of extra sampling round: Sample from {0, 1, .. NP-1} 7 times + 1 extra if you sampled the i-th element somewhere along the way. Alternatively you can maintain the position of each element (e.g. in a separate vector), and always swap the i-th element with the end of vector before the sampling from {0, 1, ..., NP-2} (this saves you a linear search for i).

AVCZ commented 9 years ago

@kiryx OK, I guess I understand now and I will go through it and send a PR later today.

neduard commented 9 years ago

@AVCZ I've already worked a bit on this issue (sorry about that, didn't know whether or not you were still interested it resolving it). My current branch is at https://github.com/neduard/pagmo/tree/issue95_sampling_without_replacement_jde

Currently however I'm instantiating the N for every i. It shouldn't take too long though to fix that. I'm thinking if you haven't started implementing the solution yet, leaving this issue for me? Of course I can't force you to do that, so you it is up to you whether to continue working on this issue or on something different :)

AVCZ commented 9 years ago

@neduard Well, I guess I don't have to continue with my branch now, because I looked into your code and it seems to me that you have solved this issue.