PacktPublishing / Building-Data-Science-Applications-with-FastAPI

Building Data Science Applications with FastAPI, Published by Packt
MIT License

Chapter 11 ndarray.copy is not deep copy #10

Closed gitgithan closed 2 years ago

gitgithan commented 2 years ago

Page 320 says "If you need to deep copy the values", but according to https://numpy.org/doc/stable/reference/generated/numpy.copy.html:

To ensure all elements within an object array are copied, use copy.deepcopy:

frankie567 commented 2 years ago

There is a small nuance here.

If you manipulate numpy arrays with scalar values, like numbers, numpy.copy is enough and it will create a new array in memory with those values.

However, if your numpy array contains arbitrary Python objects (notice dtype=object), it won't copy them; it'll only copy their references. IMO, you shouldn't worry too much about this: it is a very rare case (in general, you should probably avoid storing Python objects in a numpy array).

gitgithan commented 2 years ago

Sorry, I left out the second half of that sentence. What I meant to ask was: is the full sentence really correct? The second half says you just have to use the copy method on the array. I thought the copy method was a shallow copy, so doesn't this contradict the first half of the sentence, which says it is a deep copy?

frankie567 commented 2 years ago

Well, maybe we could argue on the meaning of deep copy here. What I meant in this part is that if you just do:

v = m[::, 3:]

v here is a view (or shallow copy). It means that the underlying values are shared between v and m: if you change a value in v, it'll also change in m.

If you need to have a real copy of the values in the matrix (this is what I mean by deep copy here), you need to use the .copy method:

v = m[::, 3:].copy()
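
As a quick check of the difference, here is a minimal sketch. The matrix m and the names view and copied are made up for illustration (they are not from the book), and np.shares_memory is used only to confirm whether two arrays overlap in memory:

```python
import numpy as np

# Hypothetical 3x5 integer matrix, used only for illustration.
m = np.arange(15).reshape(3, 5)

view = m[:, 3:]            # basic slicing returns a view on m's data
copied = m[:, 3:].copy()   # .copy() allocates a new, independent buffer

print(np.shares_memory(m, view))    # True
print(np.shares_memory(m, copied))  # False

view[0, 0] = -1     # writes through to m[0, 3]
copied[0, 1] = -2   # leaves m[0, 4] untouched

print(m[0, 3], m[0, 4])  # -1 4
```

This only covers arrays of scalar values such as integers or floats.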

gitgithan commented 2 years ago

This is what I meant by copy() being shallow. I just discovered that numpy.copy() behaves differently when the dtype is int vs. object:

import copy
import numpy as np

# Object array: .copy() duplicates only the references to the inner lists.
a_object_shallow = np.array([[1], [1, 2]], dtype=object)
b_object_shallow = a_object_shallow.copy()
b_object_shallow[1][1] *= 10
print(repr(a_object_shallow))  # a_object_shallow[1][1] changed: array([list([1]), list([1, 20])], dtype=object)

# Object array: copy.deepcopy also duplicates the inner lists.
a_object_deep = np.array([[1], [1, 2]], dtype=object)
b_object_deep = copy.deepcopy(a_object_deep)
b_object_deep[1][1] *= 10
print(repr(a_object_deep))  # a_object_deep[1][1] no change: array([list([1]), list([1, 2])], dtype=object)

# Integer array: .copy() duplicates the values themselves.
a_int = np.array([[1, 2], [1, 2]], dtype='int64')
b_int = a_int.copy()
b_int[1][1] *= 10
print(repr(a_int))  # a_int[1][1] no change: array([[1, 2],[1, 2]])

frankie567 commented 2 years ago

Yes, that's the point of the last example you mentioned in the numpy docs.

But as I said before, storing arbitrary objects in a numpy matrix is quite a rare case and should probably be avoided.