elenafervic / Quantum-Computing

A repository to store the code I used and developed about Quantum Computing during my summer internship (2024).

Shallow vs Deep Copies of Arrays #1

Open elenafervic opened 1 month ago

elenafervic commented 1 month ago

I've had issues in previous coding projects where I assigned arrays, or parts of arrays, to other arrays, only to find out later that updating one of them also updated the other.

I believe this is an issue of deep versus shallow copies of arrays. I'm also not completely clear on when a calculation like Variable=A+B results in a new value or array being created, and when it results in Variable referencing a pre-existing data structure.

I have previously built the following understanding: Consider two arrays ArrayA=[ [a1,a2],[a3,a4] ] and ArrayB=[ [b1,b2],[b3,b4] ], and suppose we do the calculation C=ArrayA+ArrayB. The right-hand side is evaluated first, and the result is stored in a new place. That result is then assigned to the variable C.
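A minimal sketch of this, assuming the arrays are plain nested Python lists with made-up integer values. For lists, + concatenates rather than adding element-wise, but the point here is only that the result is a brand-new outer object:

```python
array_a = [[1, 2], [3, 4]]
array_b = [[5, 6], [7, 8]]

# The right-hand side is evaluated first and produces a new list object;
# only afterwards is the name c bound to that object.
c = array_a + array_b

print(c is array_a, c is array_b)  # False False

# Appending to c leaves both operands alone, because the outer list is new.
c.append([9, 10])
print(len(array_a), len(array_b), len(c))  # 2 2 5

# The new outer list still holds references to the original inner lists.
print(c[0] is array_a[0])  # True
```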

But I'm a bit confused, because evaluating the right-hand side doesn't always create a new object. For example, in ArrayA[0]=ArrayB[0], ArrayA[0] is linked to ArrayB[0], making a shallow copy? Intuitively, I understand it's because ArrayB[0] is itself an array nested within ArrayB, but I don't understand why ArrayA[0] isn't populated with copies of the actual values within ArrayB[0], as in the previous example.
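The behaviour I'm describing can be reproduced with a short sketch, again assuming nested Python lists with made-up values. The explicit copy via list(...) is one possible way to get independent values, not the only one:

```python
array_a = [[1, 2], [3, 4]]
array_b = [[5, 6], [7, 8]]

# array_b[0] already exists as an object, so nothing new is created:
# slot 0 of array_a is simply pointed at that same inner list.
array_a[0] = array_b[0]
array_b[0][0] = 99
print(array_a)  # [[99, 6], [3, 4]]

# To get independent values, copy the inner list explicitly.
array_a[1] = list(array_b[1])
array_b[1][0] = -1
print(array_a)  # [[99, 6], [7, 8]]
```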

I think I need a bit of a refresher on how this works, to make sure my mental image is correct before I start doing coding projects in earnest.

elenafervic commented 1 month ago

I was recommended this resource: https://www.geeksforgeeks.org/array-copying-in-python/. I liked the straightforward writing style, especially how it addressed a misconception I had about the "=" operator and clearly explained the differences between copying arrays with "=", with a deep copy and with a shallow copy.

I have learnt from this resource that "=" doesn't copy objects; it just creates a binding between a target and an object. So if we do ArrayA=ArrayB, we are just making ArrayA refer to the same object that ArrayB refers to. I am guessing that in the first example above, C=ArrayA+ArrayB, the result of ArrayA+ArrayB doesn't exist yet, so it is calculated (and created) first, and then C is bound to that new object.

"=" simply binds a target and an object together, so this is still an issue for 1D arrays (as well as 2D and higher). However, it's not an issue when assigning non-compound objects like numbers to variables. My edited picture is the following:

[Two images, not reproduced here]

In conclusion, the "=" operator creates a link from the variable name on the left-hand side (i.e. it's interpreted literally, see above) to the last whole object referenced on the right-hand side (you must follow the references from the variable name on the right-hand side until you get to the last whole object).
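A small sketch of this rule, assuming plain Python lists; the names grid, row and n are made up for illustration:

```python
array_b = [1, 2, 3]
array_a = array_b      # binds a second name to the same 1D list, no copy
array_a.append(4)
print(array_b)         # [1, 2, 3, 4]

grid = [[1, 2], [3, 4]]
row = grid[1]          # follow grid -> outer list -> inner list [3, 4]
row[0] = 30            # changes the inner list that grid also refers to
print(grid)            # [[1, 2], [30, 4]]

n = grid[0][0]         # references followed all the way down to the int 1
n = 100                # rebinding n does not write anything into grid
print(grid[0][0])      # 1
```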

The resource also covered shallow copies: a shallow copy creates the outer array structure (i.e. a new object) but populates it with references to the child objects of the original. It doesn't create copies of the child objects themselves (i.e. it doesn't recurse). So it seems they are a real copy only one layer deep.
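To see the "one layer" behaviour concretely, here is a sketch using the standard library copy module on a nested list (the values are made up):

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)     # new outer list, same inner lists
deep = copy.deepcopy(original)    # new outer list and new inner lists

print(shallow is original)        # False
print(shallow[0] is original[0])  # True
print(deep[0] is original[0])     # False

original[0][0] = 99               # mutate a child shared with the shallow copy
original.append([5, 6])           # change only the original's outer structure
print(shallow)                    # [[99, 2], [3, 4]]
print(deep)                       # [[1, 2], [3, 4]]
```

The shallow copy sees the change to the shared child but not the appended row, while the deep copy sees neither.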

I am still unsure about what is meant exactly by "references to child object":

Overall, I think it's the second of these options, because usually the references are followed as deep as they can be.

I am also still unsure if my mental picture is correct, but it seems to explain the behaviour of the arrays pretty well.

elenafervic commented 1 month ago

From this website, https://python.aims.ac.za/pages/cp_mutable_objs.html, I found out that arrays are mutable objects, and only mutable objects can show this linked behaviour after using "=". With immutable objects like integers or strings, B=A still binds both names to the same object, but because the object can't be changed in place, any later "update" to A just rebinds A to a new object. So, e.g., after A=1 and B=A, A and B still behave independently.
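A short sketch contrasting an immutable int with a mutable list. In both cases the plain assignment binds two names to one object, and the difference only shows up when one of them is updated:

```python
a = 1000
b = a
print(a is b)   # True

a += 1          # ints are immutable: this builds a new int and rebinds a
print(a, b)     # 1001 1000
print(a is b)   # False

x = [1]
y = x
x += [2]        # lists are mutable: += changes the existing list in place
print(y)        # [1, 2]
print(x is y)   # True
```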

You can check whether two variables are references to the same object by using the function id(Array), which returns an integer identifying the object (in CPython, its memory address). If two variables have the same number, you know they are the same thing (although the previous website showed that two shallowly copied arrays will have different identities even if the objects within them are the same objects, so you have to check all levels).
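A sketch of checking identities at more than one level. The helper shares_any_child is hypothetical (written here for illustration, not a built-in) and only looks one level down:

```python
import copy

original = [[1, 2], [3, 4]]
alias = original
shallow = copy.copy(original)

print(id(alias) == id(original))          # True: same outer object
print(id(shallow) == id(original))        # False: different outer objects
print(id(shallow[0]) == id(original[0]))  # True: but a shared child


def shares_any_child(x, y):
    """Hypothetical helper: True if any pair of top-level elements is one object."""
    return any(a is b for a, b in zip(x, y))


print(shares_any_child(shallow, original))                 # True
print(shares_any_child(copy.deepcopy(original), original)) # False
```

Comparing id values of live objects is equivalent to using the is operator, which is usually the more readable choice.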