esa / pygmo2

A Python platform to perform parallel computations of optimisation tasks (global and local) via the asynchronous generalized island model.
https://esa.github.io/pygmo2/
Mozilla Public License 2.0

population is not iterable #45

Closed: MLopez-Ibanez closed this issue 4 years ago

MLopez-Ibanez commented 4 years ago

population should support indexing and iteration so that we can write:

for i, (x, f) in enumerate(pop):
    popNew.set_xf(i, x, f)
bluescarni commented 4 years ago

This should be fairly easy to achieve. I'll put it on the milestones for the next release.

bluescarni commented 4 years ago

@MLopez-Ibanez There's one caveat with how this has to be implemented. Because in C++ x and f are represented as std::vector, they will have to be translated on the fly into NumPy arrays in Python. Thus, x and f will be copies of the original data, and if you modify them you will not be modifying the data in pop.

Pinging @darioizzo as well... do you think it is still useful to have read-only iteration in this case? My fear is that the behaviour may be surprising, because it differs from how Python normally behaves.
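
For concreteness, the same copy behaviour can already be seen through the existing getters. A minimal sketch (pg.rosenbrock is just an arbitrary stand-in problem):

import pygmo as pg

prob = pg.problem(pg.rosenbrock(2))
pop = pg.population(prob, size=5)

x = pop.get_x()        # NumPy copy of all decision vectors
x[0, :] = 0.0          # edits the copy only
print(pop.get_x()[0])  # the population itself is unchanged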

darioizzo commented 4 years ago

@bluescarni I agree, I think it would be confusing. It's clearer to force the user to go through the getters.

for i, (x, f) in enumerate(zip(pop.get_x(), pop.get_f())):
    popNew.set_xf(i, x, f)

Uglier, but it forces a better understanding of the API.
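
Spelled out as a runnable sketch (assuming popNew is a population of the same size over the same problem, since set_xf replaces the individual at index i):

import pygmo as pg

prob = pg.problem(pg.rosenbrock(2))
pop = pg.population(prob, size=10)

# set_xf(i, x, f) replaces the i-th individual, so the target
# population must already contain as many individuals as the source.
popNew = pg.population(prob, size=10)
for i, (x, f) in enumerate(zip(pop.get_x(), pop.get_f())):
    popNew.set_xf(i, x, f)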

MLopez-Ibanez commented 4 years ago

@MLopez-Ibanez There's one caveat with how this has to be implemented. Because in C++ x and f are represented as std::vector, they will have to be translated on the fly into NumPy arrays in Python. Thus, x and f will be copies of the original data, and if you modify them you will not be modifying the data in pop.

I don't see how this is different from how lists and numpy arrays already work. If you iterate them, you cannot modify the contents via the element:

import numpy as np

x = np.array([1, 2, 3, 4])
for y in x:        # y is a NumPy scalar, a copy of the element
    if y == 2:
        y = 4      # rebinds the local name only
print(x)           # [1 2 3 4] -- x is unchanged

But if you have a large population, iterating element by element avoids copying the whole population at once.

(This is leaving aside that the need for copies is due to an implementation choice by pagmo2; one can share the memory between C++ and NumPy arrays: https://stackoverflow.com/questions/16065183/convert-a-stdvector-to-a-numpy-array-without-copying-data).
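
As a Python-side analogue of that zero-copy idea, NumPy can wrap externally owned memory through the buffer protocol; a minimal sketch:

import numpy as np

buf = bytearray(32)                         # externally owned memory
arr = np.frombuffer(buf, dtype=np.float64)  # zero-copy view onto buf
arr[0] = 1.0                                # writes through to buf
print(bytes(buf[:8]))                       # the underlying bytes changed

Note that arr holds a reference to buf (arr.base), which is exactly the lifetime hook a raw std::vector lacks.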

bluescarni commented 4 years ago

I don't see how this is different from how lists and numpy arrays already work. If you iterate them, you cannot modify the contents via the element:

import numpy as np

x = np.array([1, 2, 3, 4])
for y in x:        # y is a NumPy scalar, a copy of the element
    if y == 2:
        y = 4      # rebinds the local name only
print(x)           # [1 2 3 4] -- x is unchanged

That has nothing to do with the general behaviour of lists:

In [1]: a = [[1,2],[3,4]]                                                                                                                                        

In [2]: for el in a: 
   ...:     el[0] = 0 
   ...:                                                                                                                                                          

In [3]: a                                                                                                                                                        
Out[3]: [[0, 2], [0, 4]]

In [4]:  

Or with numpy arrays:

In [11]: arr = np.random.random((10,3))

In [12]: for a in arr:
    ...:     a[0] = 0
    ...:

In [13]: arr
Out[13]:
array([[0.        , 0.77440703, 0.71149927],
       [0.        , 0.21365282, 0.48220925],
       [0.        , 0.14116462, 0.8068819 ],
       [0.        , 0.83026009, 0.57596541],
       [0.        , 0.64133765, 0.374317  ],
       [0.        , 0.66958306, 0.26868335],
       [0.        , 0.92461468, 0.69143304],
       [0.        , 0.6195044 , 0.84812973],
       [0.        , 0.21148926, 0.1190714 ],
       [0.        , 0.12240824, 0.1831873 ]])

Python operates with reference semantics, unless you are dealing with immutables (like the scalars in your example).

(This is leaving aside that the need for copies is due to an implementation choice by pagmo2; one can share the memory between C++ and NumPy arrays: https://stackoverflow.com/questions/16065183/convert-a-stdvector-to-a-numpy-array-without-copying-data).

Sharing the memory has never been the problem. The problem with that approach is managing the lifetime of the object, which is not possible unless one retrofits some form of reference counting on the C++ side.

E.g., if I create a NumPy array that references some data in a std::vector and that vector is then destroyed on the C++ side, you will have a dangling pointer and a segmentation fault when you try to access the array from Python.
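
For contrast, a sketch of how NumPy solves this on the pure-Python side: a view holds a reference to its owning array, so the buffer cannot be freed while the view is alive; a plain std::vector offers no such hook.

import numpy as np

owner = np.arange(4)
view = owner[:2]           # shares memory with owner
print(view.base is owner)  # True: the view references its owner
del owner                  # safe: view's reference keeps the buffer alive
print(view)                # [0 1]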

MLopez-Ibanez commented 4 years ago

Python operates with reference semantics, unless you are dealing with immutables (like the scalars in your example).

I see what you mean now.

Sharing the memory has never been the problem. The problem with that approach is managing the lifetime of the object, which is not possible unless one retrofits some form of reference counting on the C++ side.

Yes, the C++ side would need to handle that as well.

Oh, well, we will have to live with this for now.