golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: reflect: add DeepCopy #51520

Closed changkun closed 2 years ago

changkun commented 2 years ago
Previous proposal updates

As a duality complement of `DeepEqual`, I propose to add a new API for deep copy:

```go
package reflect

// DeepCopy copies src to dst recursively.
//
// Two values of identical type are deeply copied if one of the following
// cases applies.
//
// Array and slice values are deeply copied, including their elements.
//
// Struct values are deeply copied for all fields, including exported
// and unexported.
//
// Interface values are deeply copied if the underlying type can be
// deeply copied.
//
// Map values are deeply copied for all of their keys and corresponding
// values.
//
// Pointer values are deeply copied for their pointed-to value.
//
// One exception is Func values. They are not copyable and still point
// to the same function.
//
// Other values - numbers, bools, strings, and channels - are deeply
// copied and have a different underlying memory address.
func DeepCopy[T any](dst, src T)

// or
//
// DeepCopy panics if src and dst have different types.
func DeepCopy(dst, src any)

// or
//
// DeepCopy returns an error if src and dst have different types.
func DeepCopy(dst, src any) error
```

DeepCopy may likely be very commonly used. Implementing it in the reflect package may yield better runtime performance. The first version seems preferable because type parameters permit compile-time type checking, but the others might also be workable. The proposed documentation of the API is preliminary.

---

Update: Based on the discussions until https://github.com/golang/go/issues/51520#issuecomment-1061506887, the documented behavior could be:

```go
// DeepCopy copies src to dst recursively.
//
// Two values of identical type are deeply copied if one of the following
// cases applies.
//
// Array and slice values are deeply copied, including their elements.
//
// Struct values are deeply copied for all fields, including exported
// and unexported.
//
// Map values are deeply copied for all of their keys and corresponding
// values.
//
// Pointer values are deeply copied for their pointed-to value, and the
// pointer points to the deeply copied value.
//
// Numbers, bools, and strings are deeply copied and have a different
// underlying memory address.
//
// Interface values are deeply copied if the underlying type can be
// deeply copied.
//
// There are a few exceptions that may result in a deeply copied value not
// being deeply equal (asserted by DeepEqual(dst, src)) to the source value:
//
// 1) Func values still refer to the same function
// 2) Chan values are replaced by newly created channels
// 3) One-way Chan values (receive-only or send-only) still refer
//    to the same channel
//
// Note that stateful values, such as the lock status in sync.Mutex or the
// underlying file descriptors in os.File and net.Conn, are retained but may
// result in unexpected consequences in follow-up usage; the caller should
// clear these values depending on the usage context.
//
// The function panics/returns an error if
//
// 1) the source and destination values have different types
// 2) the destination value is reachable from the source value
```

Depending on the design choice, the content included after the exceptions could describe different behaviors; see https://github.com/golang/go/issues/51520#issuecomment-1061453900 and https://github.com/golang/go/issues/51520#issuecomment-1061506887 as examples. The API signature could be one of the following possible options.
The main differences between them cover two aspects: 1) panic (breaks the code flow and enters defer/recover) or return an error (handled directly by the user); 2) the destination value as a parameter (more GC friendly) or as a return value (always allocates).

```go
func DeepCopy[T any](dst, src T) error
func DeepCopy[T any](dst, src T)
func DeepCopy(dst, src any) error
func DeepCopy(dst, src any)
func DeepCopy[T any](src T) (T, error)
func DeepCopy[T any](src T) T
func DeepCopy(src any) (any, error)
func DeepCopy(src any) error
```

Any of these versions is workable; the choice is purely a matter of the final decision.

Update (until https://github.com/golang/go/issues/51520#issuecomment-1080602090):

I propose to add a new API for deep copy:

```go
// DeepCopy copies src to dst recursively.
//
// Two values of identical type are deeply copied if one of the following
// cases applies.
//
// Numbers, bools, and strings are deeply copied and have a different
// underlying memory address.
//
// Slice and array values are deeply copied, including their elements.
//
// Map values are deeply copied for all of their keys and corresponding
// values.
//
// Pointer values are deeply copied for their pointed-to value, and the
// pointer points to the deeply copied value.
//
// Struct values are deeply copied for all fields, including exported
// and unexported.
//
// Interface values are deeply copied if the underlying type can be
// deeply copied.
//
// There are a few exceptions that may result in a deeply copied value not
// being deeply equal (asserted by DeepEqual(dst, src)) to the source value:
//
// 1) Func values still refer to the same function
// 2) Chan values are replaced by newly created channels
// 3) One-way Chan values (receive-only or send-only) still refer
//    to the same channel
//
// Note that correct uses of DeepCopy do exist and are not rare. The use of
// DeepCopy often indicates that the object being copied contains neither a
// singleton nor values that are never meant to be copied, such as
// sync.Mutex, os.File, net.Conn, js.Value, etc. When such values are
// present, the copied value retains the memory representation of the source
// value but may result in unexpected consequences in follow-up usage; the
// caller should clear these values depending on their usage context.
func DeepCopy[T any](src T) (dst T)
```

Here is an implementation for trying out: https://pkg.go.dev/golang.design/x/reflect

mvdan commented 2 years ago

Have you seen previous proposals like https://github.com/golang/go/issues/18371?

changkun commented 2 years ago

Unlike #18371, this proposal suggests deep copy as a package API (as a complement of DeepEqual) rather than a built-in function.

mvdan commented 2 years ago

The reflect API was already mentioned there, but in any case, it was rejected because of the functionality being proposed, not because of which package the API would live under.

changkun commented 2 years ago

The previous decision was based on the interpretation that it should not be a built-in. As I understand that conversation, the discussion of adding it as a reflect API has not yet converged to a consensus.

Sure, the deep copy behavior can be implemented as an external package, and there are many already. But as an API in the reflect package, it can also receive runtime performance improvements (e.g. direct memory copy).

Moreover, a standard implementation would help avoid potentially incorrect implementations.

ianlancetaylor commented 2 years ago

How should DeepCopy handle circular data structures?

changkun commented 2 years ago

How should DeepCopy handle circular data structures?

I can imagine two options for the behavior (if I understood the question correctly):

  1. Panics if the data structure is circular
  2. Handle them. For example:
```go
package main

import (
	"reflect"
	"unsafe"
)

type A struct{ b *B }
type B struct{ a *A }

func main() {
	x := &A{}
	y := &B{}
	x.b = y
	y.a = x
	println(memAddr(x), memAddr(y), memAddr(x.b), memAddr(x.b.a), memAddr(x.b.a.b)) // 1 2 2 1 2

	var z *A
	reflect.DeepCopy(z, x) // the proposed API, shown for illustration

	println(memAddr(z), memAddr(z.b), memAddr(z.b.a), memAddr(z.b.a.b)) // 3 4 3 4
}

func memAddr[T any](x *T) uintptr { return uintptr(unsafe.Pointer(x)) }
```

I would prefer the second case, hence I propose the second option.
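For intuition, here is a minimal hand-written sketch of option 2 for the A/B types above (not the proposed reflect-based implementation): maps of already-copied pointers are threaded through the copy, so the result gets its own, equivalent cycle.

```go
package main

// Mirrors the A/B types from the example above.
type A struct{ b *B }
type B struct{ a *A }

// copyA and copyB hand-roll a cycle-aware deep copy: already-copied
// pointers are recorded so that revisiting them reuses the copy.
func copyA(src *A, seenA map[*A]*A, seenB map[*B]*B) *A {
	if src == nil {
		return nil
	}
	if dst, ok := seenA[src]; ok {
		return dst // already copied: reuse to preserve the cycle
	}
	dst := &A{}
	seenA[src] = dst
	dst.b = copyB(src.b, seenA, seenB)
	return dst
}

func copyB(src *B, seenA map[*A]*A, seenB map[*B]*B) *B {
	if src == nil {
		return nil
	}
	if dst, ok := seenB[src]; ok {
		return dst
	}
	dst := &B{}
	seenB[src] = dst
	dst.a = copyA(src.a, seenA, seenB)
	return dst
}

func main() {
	x := &A{}
	y := &B{}
	x.b, y.a = y, x

	z := copyA(x, map[*A]*A{}, map[*B]*B{})
	println(z != x, z.b != y, z.b.a == z) // true true true: a fresh, self-consistent cycle
}
```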

DeedleFake commented 2 years ago

I feel like there is little benefit to the output value being an argument to the function. Any fields in a struct that are interfaces, pointers, maps, etc., for example, are going to require the function to instantiate new values from scratch anyway if it's really going to create a full copy, so how about just func DeepCopy[T any](v T) T? That would lead to less confusion and fewer potential errors with mismatched/incorrect types. Otherwise you also get questions like

What happens if I copy into a map that has existing data in it? Is the map cleared first? If I copy into a slice, does the slice's old capacity get preserved?

DeedleFake commented 2 years ago

@changkun

How does it handle overlapping cyclic structures? For example:

```go
type S struct {
	Next *S
}

func main() {
	one := &S{}
	two := &S{Next: &S{Next: &S{Next: one}}}
	one.Next = two

	reflect.DeepCopy(one, two)
}
```

Obviously this would be a really weird case, but something similar could happen accidentally. If the address stored in one is used to store the output, then that means that two's data is getting changed by the copy, potentially resulting in some really strange, implementation-dependent results if not explicitly defined, likely an infinite loop as it continuously generates new structures to try to continue copying further and further downwards.

changkun commented 2 years ago

...so how about just func DeepCopy[T any](v T) T? That'll lead to less confusion and fewer potential errors with mismatched/incorrect types.

I think this is one of the possible workable versions of deep copy. The pros are as you explained, and it doesn't require further documentation about the state of the destination. The con might be that it always has to allocate on the heap.

The proposed versions have pros, such as the destination possibly being stack allocated, hence GC friendly. The potential con could be addressed with more documentation:

The destination value must be nil or point to a zero
value of the source type, and must itself not be reachable from the
source value. Otherwise, the function panics/returns
an error.

Either way is acceptable, and each has its unique advantages; it depends on the final decision.
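To make the trade-off concrete, here is a hand-written sketch for a hypothetical concrete type (Buf, copyInto, and copyValue are made-up names for illustration): the dst-parameter shape can reuse existing storage, while the return-value shape always allocates.

```go
package main

import "fmt"

// Buf is a hypothetical type used only to illustrate the two API shapes.
type Buf struct{ Data []byte }

// copyInto mirrors the DeepCopy(dst, src) shape: it reuses dst's existing
// backing array when its capacity is large enough, so repeated copies
// need not allocate.
func copyInto(dst, src *Buf) {
	dst.Data = append(dst.Data[:0], src.Data...)
}

// copyValue mirrors the DeepCopy(src) T shape: it always allocates a fresh result.
func copyValue(src *Buf) *Buf {
	return &Buf{Data: append([]byte(nil), src.Data...)}
}

func main() {
	src := &Buf{Data: []byte("hello")}
	dst := &Buf{Data: make([]byte, 0, 64)} // pre-allocated, reused across copies
	copyInto(dst, src)
	fmt.Println(string(dst.Data), string(copyValue(src).Data)) // hello hello
}
```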

changkun commented 2 years ago

```go
type S struct {
	Next *S
}

func main() {
	one := &S{}
	two := &S{Next: &S{Next: &S{Next: one}}}
	one.Next = two
	reflect.DeepCopy(one, two)
}
```

This is an interesting case, and I would prefer to treat it as a misuse and panic rather than handle it.

It is solved by adding this constraint as my last comment elaborated:

The destination value must be nil or point to a zero
value of the source type, and must itself not be reachable from the
source value. Otherwise, the function panics/returns
an error.

rittneje commented 2 years ago

One thing that is problematic is that currently the reflect package disallows setting private (unexported) struct fields. But your proposed version of DeepCopy would have to violate this rule.

Also, there are several interesting cases beyond just func. Like what does it mean to deep copy a mutex? Or a buffered channel? Or a one-way channel? Or a net.Conn? What if a type needs to override whatever the default deep copy logic is?

To me it seems like the de facto standard Clone method is a cleaner solution, because then there is a stable API contract around what it means to copy something.

changkun commented 2 years ago

One thing that is problematic is that currently the reflect package disallows setting private (unexported) struct fields. But your proposed version of DeepCopy would have to violate this rule.

I would argue that deep copy is not about violating the rule against setting private struct fields, because one cannot use it to set a particular private field. Instead, the API duplicates one value into the other completely.

Also, there are several interesting cases beyond just func. Like what does it mean to deep copy a mutex? Or a buffered channel? Or a one-way channel? Or a net.Conn? What if a type needs to override whatever the default deep copy logic is?

These are valid cases, but they also strengthen the argument for why a standard implementation is necessary. In my view:

1) A deep copy of a mutex results in a new mutex. 2) A buffered channel results in a buffered channel with the same buffer size, but the elements inside the buffer are not copied (intuitively, the elements should be received by the actual receiver, not by a copied value), hence an empty buffer; see the sketch after this comment. 3) One-way channels: a) a copy of a receive-only channel results in the same receive-only channel (intuitively, as a new reader), or could result in nil (which loosely matches the behavior in 2); b) a send-only channel results in the same send-only channel (intuitively, as a new sender), or could result in nil. 4) A copy of net.Conn results in an invalid connection, either zero-valued or nil.
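For item 2, a sketch for a concrete element type (how a reflect-based implementation would achieve this generically is left open):

```go
// copyBufferedChan illustrates item 2 above for a concrete element type:
// the copy has the same capacity but an empty buffer; pending elements
// stay with the original channel.
func copyBufferedChan(src chan int) chan int {
	if src == nil {
		return nil
	}
	return make(chan int, cap(src))
}
```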

To me it seems like the de facto standard Clone method is a cleaner solution, because then there is a stable API contract around what it means to copy something.

I am not sure which Clone you are referring to and why it is considered a cleaner de facto standard. Assuming net/http, there are different Clones, but they all seem to be customized object-copy behaviors, for instance:

```go
func (h Header) Clone() Header
func (r *Request) Clone(ctx context.Context) *Request
func (t *Transport) Clone() *Transport
```

Defining a common interface (as a Copiable constraint) is not possible, and there is also no way to define a NonCopiable constraint that excludes certain types with the current type set design.

Even if we assume a type set such as type Copier[T] interface { Clone() T }, a non-pointer value (say of type V) may not satisfy this constraint if the Clone signature is (*V).Clone() *V. In this case, only a pointer to V can be used as a type argument of Copier (Copier[*V]); see the sketch below.
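A minimal sketch of that limitation (Copier, V, and deepCopy are hypothetical names used only for illustration):

```go
package main

import "fmt"

// Copier is a hypothetical constraint for types that can clone themselves.
type Copier[T any] interface {
	Clone() T
}

type V struct{ n int }

// Clone is declared on the pointer receiver, as is common in practice.
func (v *V) Clone() *V { return &V{n: v.n} }

// deepCopy is restricted to types that satisfy Copier of themselves.
func deepCopy[T Copier[T]](v T) T { return v.Clone() }

func main() {
	v := &V{n: 1}
	fmt.Println(deepCopy(v).n) // 1: *V satisfies Copier[*V]

	// w := V{n: 1}
	// _ = deepCopy(w) // compile error: V does not satisfy Copier[V],
	//                 // because Clone is declared on *V, not on V.
}
```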

davecheney commented 2 years ago

deep copy of a mutex results in a new mutex;

If the original mutex was locked, what is the state of the new mutex?

If the struct being copied contained an os.File, what is the state of the cloned os.File? Does it point to a new file descriptor, or to the original file descriptor? What happens when code was using the mutex to serialise access to that file descriptor?

changkun commented 2 years ago

If the original mutex was locked, what is the state of the new mutex?

Either option is acceptable. Setting the new mutex to an unlocked state matches the intuition that a newly created object is not locked; retaining the lock state better preserves the exact memory representation. The choice between the two is a design decision.

If the struct being copied contained a os.File, what is the state of the cloned os.File? Does it point a new a file descriptor, or the original file descriptor? What happens when code was using the mutex to serialise access to that file descriptor?

This is a similar case. Depending on the choice, we could either retain the state of the os.File (the same file descriptor) or set it to an invalid file descriptor.

To briefly summarize these exceptional cases: they all relate to the question of what to do when handling a stateful object, such as channels, net.Conn, os.File, sync.*, etc. To answer this category of questions, we are choosing between 1) retaining the underlying memory representation, or 2) changing the semantic state of the copied object. As previously expressed, both possibilities are objectively acceptable because each serves a particular purpose in a different direction. The actual design choice is a matter of decision.

Subjectively, I would argue DeepCopy is more about retaining the underlying memory representation and maintaining the pointers' relationships, hence very similar to copying an execution stack (assuming objects on the stack do not refer to external things). Therefore, the API only needs to document the exceptional cases for the language-defined types and special objects in the standard library. Other user-defined stateful objects then fall back to those cases. Documenting that copying these values may result in unexpected behavior should be acceptable.

One may also argue that changing the state of a copied object, such as a mutex, violates the duplication semantics of "copy", but we would also need to be aware that a deeply copied value is not necessarily deeply equal, because, according to the documentation of DeepEqual, "Func values are deeply equal if both are nil; otherwise they are not deeply equal."

rittneje commented 2 years ago

I would argue that deep copy is not about violating setting private struct fields because one cannot use it to set a particular private field. Instead, the API duplicates one to the other completely.

What I mean is that if this were to be implemented, the reflect package would have to do a lot of special casing. It cannot be implemented in terms of the existing operations.

With regards to sync.Mutex and os.File and such, unless you are going to special-case them all in DeepCopy (which doesn't seem feasible), that strongly implies that the only two choices are (1) literally copying their current state byte-for-byte, or (2) allowing types to define some standard Clone() method to customize this behavior. The latter means that you will end up with something like a Cloneable interface anyway.

I am not sure which Clone are you referring to and why it is considered as a cleaner de facto standard.

The name of the method is ultimately immaterial. The important part is that it be a method, because the behavior of that method is now part of the type's contract. In particular, the returned object is actually valid/usable, whatever that means for the type in question.
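For concreteness, here is the kind of hand-written Clone method being described; Config is a made-up type for illustration, and the pattern (a method whose returned value is valid and shares no mutable state with the receiver) is the point, not the name:

```go
// Config is a hypothetical type; Clone is part of its contract: the
// returned value is usable and independent of the receiver.
type Config struct {
	Name  string
	Tags  []string
	Extra map[string]string
}

func (c *Config) Clone() *Config {
	if c == nil {
		return nil
	}
	out := &Config{Name: c.Name, Tags: append([]string(nil), c.Tags...)}
	if c.Extra != nil {
		out.Extra = make(map[string]string, len(c.Extra))
		for k, v := range c.Extra {
			out.Extra[k] = v
		}
	}
	return out
}
```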

EbenezerJesuraj commented 2 years ago

How will this DeepCopy feature of Go's reflect library be useful for runtime dynamic conversion of any structured data field into, say, a CSV file for audit-level entrypoint functionality?

changkun commented 2 years ago

What I mean is that if this were to be implemented, the reflect package would have to do a lot of special casing. It cannot be implemented in terms of the existing operations.

I still fail to see why it has to do "a lot" of special casing, and how it "cannot" be implemented. Can you name all the special cases for discussion? If the decision maker decides to deal with them, the API needs to deal with them. But the latest updated proposed behavior is not that case; see the tip of this thread.

I think we would do better to ask: when will a developer consider using an API for deep copy? If they have a struct with tons of net.Conn, mutexes, and os.File, how likely is it that such a value needs a deep copy? What are the primary use cases of this API? How much can they benefit from DeepCopy? https://github.com/golang/go/issues/51520#issuecomment-1062552928 is a reasonable use case.

Wit regards to sync.Mutex and os.File and such, unless you are going to special case them all in DeepCopy (which doesn't seem feasible), that strongly implies that the only two choices are (1) literally copying their current state byte-for-byte, or (2), allowing types to define some standard Clone() method to customize this behavior. The latter means that you will end up with something like a Cloneable interface anyway.

Sorry, I don't think this argument is strong enough, as it was already argued that a Cloneable interface is not easily possible and already incompatible with existing usage. See https://github.com/golang/go/issues/51520#issuecomment-1061453900

The name of the method is ultimately immaterial. The important part is that it be a method, because the behavior of that method is now part of the type's contract. In particular, the returned object is actually valid/usable, whatever that means for the type in question.

I think this is the suggested behavior based on the discussion until now (copied from the tip of the thread):

```go
// DeepCopy copies src to dst recursively.
//
// Two values of identical type are deeply copied if one of the following
// cases applies.
//
// Array and slice values are deeply copied, including their elements.
//
// Struct values are deeply copied for all fields, including exported
// and unexported.
//
// Map values are deeply copied for all of their keys and corresponding
// values.
//
// Pointer values are deeply copied for their pointed-to value, and the
// pointer points to the deeply copied value.
//
// Numbers, bools, and strings are deeply copied and have a different
// underlying memory address.
//
// Interface values are deeply copied if the underlying type can be
// deeply copied.
//
// There are a few exceptions that may result in a deeply copied value not
// being deeply equal (asserted by DeepEqual(dst, src)) to the source value:
//
// 1) Func values still refer to the same function
// 2) Chan values are replaced by newly created channels
// 3) One-way Chan values (receive-only or send-only) still refer
//    to the same channel
//
// Note that stateful values, such as the lock status in sync.Mutex or the
// underlying file descriptors in os.File and net.Conn, are retained but may
// result in unexpected consequences in follow-up usage; the caller should
// clear these values depending on the usage context.
//
// The function panics/returns an error if
//
// 1) the source and destination values have different types
// 2) the destination value is reachable from the source value
```

rittneje commented 2 years ago

@changkun First, I never said it cannot be implemented. I said it cannot be implemented in terms of the existing operations, because the reflect package does not allow setting private fields. So it would have to make some internal exception when it is doing so as part of your proposed method implementation. This also means that the reflect package is the only place that such a method could be implemented.

I think we would do better to ask: when will a developer consider using an API for deep copy? If they have a struct with tons of net.Conn, mutexes, and os.File, how likely is it that such a value needs a deep copy? What are the primary use cases of this API? How much can they benefit from DeepCopy? https://github.com/golang/go/issues/51520#issuecomment-1062552928 is a reasonable use case.

To be honest, I don't understand what the use case from @EbenezerJesuraj is. I don't see any relationship between DeepCopy and CSVs. I would need something more concrete here.

Sorry, I don't think this argument is strong enough, as it was already argued that a Cloneable interface is not easily possible and already incompatible with existing usage. See https://github.com/golang/go/issues/51520#issuecomment-1061453900

It is merely a statement of fact. Either the reflect package applies a one-size-fits-all solution (meaning it always copies struct fields blindly, regardless of whether it's tls.Config or sync.Mutex or net.TCPConn), or it allows types to customize this behavior via a method. And if it is going to allow such customization, then you have something that looks like an interface, regardless of the specific method name. Those are the only two options I see.

Frankly, I challenge the two claims you made in the original post.

may likely be very commonly used

reflect.DeepCopy is not something I feel would provide much if any value to our code base, especially if the behavior is not customizable in any way. It works well enough for primitive types and simple DTOs, but falters for anything more complex. Personally I feel it should have a contract more like json.Marshal - it will only consider public fields while cloning, and it does not work for "unusual" types like functions and channels.

Implement it in reflect package may receive better runtime performance.

Better runtime performance compared to what? If you mean compared to writing the same code outside the reflect package, then maybe. But compared to a Clone method that doesn't rely on reflection at all, probably not.

changkun commented 2 years ago

reflect.DeepCopy is not something I feel would provide much if any value to our code base, especially if the behavior is not customizable in any way. It works well enough for primitive types and simple DTOs, but falters for anything more complex. Personally I feel it should have a contract more like json.Marshal - it will only consider public fields while cloning, and it does not work for "unusual" types like functions and channels.

I am not sure what your codebase looks like, but the wider Go user code bases certainly have the need, as evidenced by https://pkg.go.dev/search?q=deep+copy&m= . Customization seems to fall outside this discussion and is not the purpose of this proposal; it might be worth a separate proposal.

Implement it in reflect package may receive better runtime performance.

Better runtime performance compared to what? If you mean compared to writing the same code outside the reflect package, then maybe. But compared to a Clone method that doesn't rely on reflection at all, probably not.

Note that the sentence was not a claim but only communicated a possibility. It is based on the fact that the reflect package has closer access to the runtime itself, such as direct memory copy, collaboration with the GC, etc. Discussing the actual performance of the implementation and benchmarking seems too early, as the proposal is not yet accepted.

EbenezerJesuraj commented 2 years ago

To be honest, I don't understand what the use case from @EbenezerJesuraj is. I don't see any relationship between DeepCopy and CSVs. I would need something more concrete here.

EJ Reply:

Package reflect implements run-time reflection, allowing a program to manipulate objects with arbitrary types. The typical use is to take a value with static type interface{} and extract its dynamic type information by calling TypeOf, which returns a Type.

CSV was an example scenario for implementing the DeepCopy feature of the reflect library.

(Say I receive a set of string-related data in a structure format; the reflect library's DeepCopy feature should be able to understand the nature of the data type(s) if the structure holds not only strings but a combination of many primitive data types, and our need is to store and view it in a spreadsheet format (CSV, TSV, and so on). Consider that the above scenario is required/performed on a Go-based scaffold using a hexagonal architectural pattern.) Will the DeepCopy feature of the reflect library naturally support this use case?

EbenezerJesuraj commented 2 years ago

Auxiliary Info:

We can also say that the program knows the metadata of the different types of data present in the struct without even reading the actual structure/DB.

EbenezerJesuraj commented 2 years ago

Auxiliary Info 2: CSV was just one particular use-case scenario; the storage can be any means known to the industry. A shallow copy will only make a pointer point to the same location, which, considering endpoint-level changes happening in a business application (a project I am currently working on), is highly infeasible. So will reflect.DeepCopy support such use cases?

changkun commented 2 years ago

Speaking of use cases, I can provide one example from my own practice. In graphics applications, geometric data are represented by meshes consisting of vertices, edges, faces, etc. Vertices have different attributes, edges can be directional, and faces can be polygon-based. Cloning/copying/duplicating such an object is not easy and requires a lot of developer effort. But with DeepCopy, one can easily create a copy of a given mesh. This is very helpful for geometry processing, where caching cascaded meshes is convenient for different kinds of solvers.

This case can generalize to any graph data.

EbenezerJesuraj commented 2 years ago

I can also relate to your example, as I did a college project titled "Polygon Triangulation using Monotone Partitioning" in C++, where we had to use a DCEL to model the problem, but we didn't require a runtime-level library as we already knew the data types (just points in the hyperplane). I hope this reflect.DeepCopy proposal will naturally support all primitive and, if possible, all known data types (basic + advanced) to store all business-grade data. If possible, I can also try to contribute to this proposal if you can add me as a member of the golang organization; I can contribute based on my level of expertise.

changkun commented 2 years ago

I hope this reflect.DeepCopy proposal will naturally support all primitive and, if possible, all known data types (basic + advanced) to store all business-grade data. If possible, I can also try to contribute to this proposal if you can add me as a member of the golang organization; I can contribute based on my level of expertise.

Sorry but this is not part of the discussion in this proposal. Refer to GithubAccess for more details.

EbenezerJesuraj commented 2 years ago

Sure, Thanks

rittneje commented 2 years ago

(Say I receive a set of string-related data in a structure format; the reflect library's DeepCopy feature should be able to understand the nature of the data type(s) if the structure holds not only strings but a combination of many primitive data types, and our need is to store and view it in a spreadsheet format (CSV, TSV, and so on). Consider that the above scenario is required/performed on a Go-based scaffold using a hexagonal architectural pattern.) Will the DeepCopy feature of the reflect library naturally support this use case?

@EbenezerJesuraj Can you clarify how something like DeepCopy helps in this situation? Why can you not just convert the original structure to CSV directly?

Speaking of use cases, I can provide one example from my own practice. In graphics applications, geometric data are represented by meshes consisting of vertices, edges, faces, etc. Vertices have different attributes, edges can be directional, and faces can be polygon-based. Cloning/copying/duplicating such an object is not easy and requires a lot of developer effort. But with DeepCopy, one can easily create a copy of a given mesh. This is very helpful for geometry processing, where caching cascaded meshes is convenient for different kinds of solvers.

@changkun I'm not familiar with such applications. For the kinds of structs you would be copying, do they typically have private fields you would also need to copy?

EbenezerJesuraj commented 2 years ago

@rittneje Hi, in a business-grade application we may have to deal with a large set of applications dynamically at run time. (Quick intro on concurrency: Go's concurrency for executing unrelated, independent tasks provides very good optimizations.)

We may have to develop different types of applications for different scenarios, and one can avoid writing code manually / avoid re-inventing the wheel by developing a generic module like DeepCopy to store the data received from clients on the server. If this generic module supports conversion of any data input field (say, from a struct to any tabular or other feasible means of storage), then reflect.DeepCopy might look like an ideal choice in my opinion.

Say we did a shallow copy: we would have to use pointers, and if they are not managed properly they can bring a lot of mess into the code and might possibly expose business data to scammers and phishers, which is not preferable.

changkun commented 2 years ago

@changkun I'm not familiar with such applications. For the kinds of structs you would be copying, do they typically have private fields you would also need to copy?

There are many internal attribute caches that are not exposed.

rsc commented 2 years ago

This is very complicated and not clearly useful without the ability to do significant customization of the result (and therefore changes to the algorithm).

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

changkun commented 2 years ago

This is very complicated and not clearly useful

The metric for measuring usefulness is not yet clear in this conversation, as there are use cases described above, such as copying a graph structure. Also, as already argued, the number of search results on pkg.go.dev for the keywords "deep copy" seems to suggest "useful".

without the ability to do significant customization of the result (and therefore changes to the algorithm).

The need for customization is still not clearly stated as part of the purpose of "deep copy". If one speaks of copy constructors, that seems worth a separate thread, which potentially involves topics like operator methods. For pure deep-object-copy semantics, there are a few existing practices we could refer to:

rsc commented 2 years ago

In most complex data structures, there will be some pointers that are to singletons, which should remain singletons in the result (that is, not copied at all). Making a deep copy of those would be incorrect. That's what I mean by customization of the result.

It's also a very expensive operation. If it existed, people would use it in contexts where it does not apply, so it would be both broken and slow. It's hard to see that this is important enough to add.

Python and JavaScript having it is not by itself an argument in favor.

It also seems like this could plausibly be written as a third-party package.

Finally, there are many ways in which reflect.DeepEqual was a mistake. On balance I personally think probably not, but it was borderline. DeepCopy seems to be more on the 'not worth it' side of the scale than DeepEqual.

EbenezerJesuraj commented 2 years ago

It's also a very expensive operation. If it existed, people would use it in contexts where it does not apply, so it would be both broken and slow. It's hard to see that this is important enough to add.

Python and JavaScript having it is not by itself an argument in favor.

Finally, there are many ways in which reflect.DeepEqual was a mistake. On balance, I personally think probably not, but it was borderline. DeepCopy seems to be more on the 'not worth it' side of the scale than DeepEqual.

I completely understand your concern @rsc, but the fact that some programmers may misuse it or not use it properly shouldn't be an argument against providing the feature.

reflect.DeepCopy may not be a useful feature for projects with a high-school or university-level scope, but for enterprise/business-grade applications, where experienced/seasoned programmers will be using it properly by default, that negates the argument.

Coming to the second point: if we want to scale Go as the next upcoming programming language, it will be competing against, say, Python or C++, and Python having deepcopy in itself indicates the "usefulness" of the feature.

Say the priority of Go (the programming language) is to be scaled for large-scale business applications; then, from my point of view, I hardly see this as a concern, and I couldn't understand why DeepCopy would fall on the 'not worth it' side of the scale.

changkun commented 2 years ago

In most complex data structures, there will be some pointers that are to singletons, which should remain singletons in the result (that is, not copied at all). Making a deep copy of those would be incorrect. That's what I mean by customization of the result.

It is true that some complex data structures have pointers to singletons, but certainly not all of them. We also don't have much evidence on whether such structures really need a deep copy or not. Meanwhile, the majority of structures (I think) without singletons could benefit from having a deep copy ability.

It's also a very expensive operation. If it existed, people would use it in contexts where it does not apply, so it would be both broken and slow. It's hard to see that this is important enough to add.

This is potentially a weak argument. There are enough examples in the standard library that can easily be misused but are still there, such as runtime.SetFinalizer, which exists since before Go 1, and sync.(*Mutex).TryLock, which was accepted recently. I would argue that the reflect package is already considered advanced (I think) and can be dangerous, triggering runtime panics when misused. Not having reflect.DeepCopy does not mitigate that issue, and having it doesn't add much risk if the usage is properly documented.

Python and JavaScript having it is not by itself an argument in favor.

This is probably a misunderstanding. To clarify, the argument was not "because X and Y have it, we should have it". Instead, it suggests we could use them as references to understand why they have it and what might be the best design for Go's own solution.

It also seems like this could plausibly be written as a third-party package.

This was argued previously. Surely it could be written as a third-party package, but a standard implementation avoids potentially incorrect implementations. An example in the standard library is cgo.Handle.

Finally, there are many ways in which reflect.DeepEqual was a mistake.

Not entirely sure what you are referring to, but this seems to make the argument for providing a standard implementation a bit stronger.

On balance I personally think probably not, but it was borderline. DeepCopy seems to be more on the 'not worth it' side of the scale than DeepEqual.

I agree there is potential for misuse of such an API, but I would argue that should not be the strongest argument against having it, because we have also emphasized many potential benefits of having it in the standard library.

ianlancetaylor commented 2 years ago

Let's start with this outside of the standard library and see where we go. With the number of questions and issues about this, it seems best to work them out outside of the library, where the compatibility guarantees can be weaker. Thanks.

changkun commented 2 years ago

Let's start with this outside of the standard library and see where we go. With the number of questions and issues about this, it seems best to work them out outside of the library, where the compatibility guarantees can be weaker. Thanks.

Starting outside the standard library may feel ideal in the beginning, but such an action raises more issues to resolve:

  1. Where to put it? x/exp/reflect? x/reflect? x/copy? ...
  2. Losing the actual benefit of closer access to reflect and runtime internals; the outside and inside implementation strategies could be completely different.
  3. It feels like APIs under golang.org/x are still difficult to change once they get a large number of users.
  4. Introducing migrations later (although there was an example: context).

ianlancetaylor commented 2 years ago

Where to put them? x/exp/reflect? x/reflect? x/copy? ...

github.com/changkun/deepcopy ?

changkun commented 2 years ago

Well, I appreciate the humor, but that does not move the conversation forward in any direction; it simply repeats an already-argued point.

ianlancetaylor commented 2 years ago

Sorry, that was not intended to be humorous. That was a serious suggestion. The standard library can adopt ideas from anywhere, if we have some evidence that those ideas are sound and widely used.

changkun commented 2 years ago

However, the suggestion remains repetitive. As argued already, we don't need to work on yet another deepcopy package, whereas pkg.go.dev already contains enough adoptable ideas, and they are compiled into this proposal. See:

ianlancetaylor commented 2 years ago

Ah, my apologies.

ianlancetaylor commented 2 years ago

Does the updated proposal correspond to any existing deepcopy package?

changkun commented 2 years ago

Taking one example: in this implementation, there are cases, such as unexported fields, that are not deep copied. These cases are calibrated in this proposal, for instance allowing unexported fields, circular data structures, etc.

ianlancetaylor commented 2 years ago

I'm really not trying to be troublesome here. But I honestly think that the best way forward here is going to be to write the version of the function that you think we need. And that implementation can live anywhere. Doing that will make it possible for people to try the package and see what features it needs. (Note that it's possible for such an implementation to copy unexported fields by using the unsafe package.)

Historically the reflect.DeepEqual function was an ongoing source of trouble because it didn't do exactly what many people wanted. For example, #8554, #8600, #12025, #12918, #16531, #20444, #23597, #30066, #38650, #39652, #42265. Our chosen resolution, for better or for worse, was not to introduce a bunch of knobs for DeepEqual; it was to encourage the development of packages that could live outside of the standard library, such as https://github.com/google/go-cmp. My guess is that the same will be appropriate for DeepCopy.

changkun commented 2 years ago

[...] that implementation can live anywhere. Doing that will make it possible for people to try the package and see what features it needs. (Note that it's possible for such an implementation to copy unexported fields by using the unsafe package.)

It is clever to frame the situation like this, but it is essentially the same situation as "a standard implementation to avoid potentially incorrect implementations."

The referenced implementation observed on pkg.go.dev receives a high number of imports, but it is essentially incorrect because it does not handle cases such as circular data structures, and it remains problematic in some use cases.

Because of those undocumented/unexpected/mismatched/unknown behaviors in a third-party implementation, unless one looks closely into the implementation, it is difficult to infer the cause. This is a bit at odds with the language goal of optimizing for programmer efficiency. Indeed, it is nice that people try different possibilities, but that is better suited for exploration when the solution space isn't bounded.

Historically the reflect.DeepEqual function was an ongoing source of trouble because it didn't do exactly what many people wanted. For example, #8554, #8600, #12025, #12918, #16531, #20444, #23597, #30066, #38650, #39652, #42265. Our chosen resolution, for better or for worse, was not to introduce a bunch of knobs for DeepEqual; it was to encourage the development of packages that could live outside of the standard library, such as https://github.com/google/go-cmp. My guess is that the same will be appropriate for DeepCopy.

DeepEqual has very different use cases compared to DeepCopy, but let's still look closely at those "unexpected" cases. A summary:

  1. Funcs are equal if they are nil. (#8554)
  2. Empty slices/maps are not equal to nil slices/maps (#12025, #12918, #16531, #38650, #42265); see the example after this list
  3. Request a specific feature extension about what is not equal (#8600, #39652), additional Equal method convention (#20444), additional predicates (#30066)
  4. Clearly flawed usage (#23597)
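The example referenced in category 2, runnable as-is:

```go
package main

import (
	"fmt"
	"reflect"
)

func main() {
	// A nil slice/map and an empty slice/map compare unequal under
	// DeepEqual, which is the surprise behind the issues in category 2.
	fmt.Println(reflect.DeepEqual([]int(nil), []int{}))                   // false
	fmt.Println(reflect.DeepEqual(map[string]int(nil), map[string]int{})) // false
}
```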

Quoting https://github.com/golang/go/issues/12025#issuecomment-128000031:

[...] it's pretty important that DeepEqual follow the same rules as the language.

The unexpected cases in the first and second categories arise because the language spec defines behavior that people did not expect, and the spec takes the blame. But we can't change that now.

In the third category, people ask for a specific feature extension rather than reporting an issue with the API itself; the API design is sound. Even without those features, we can't ignore the widely practical cases, such as using DeepEqual to check testing results. A third-party dependency is well suited for people asking for additional features, but not for users who need a basic but robust API. Besides, a dependency is a dependency: if one can have it directly from the standard library, why take on an (uncontrolled) dependency?

The fourth category, clearly flawed usage, should not be included in the discussion anyway.

To conclude, I think the argument is somewhat like "we did not do a 'good' job on A, so we very likely should not do B." Revisiting the mentioned past examples in retrospect, none of them is particularly applicable to this proposal, and (I think) they are not strong enough to reject this proposal of having a DeepCopy that is subject to the language spec.

ianlancetaylor commented 2 years ago

such a situation is essentially the same compared to "a standard implementation to avoid potentially incorrect implementation."

I don't agree. The doc comment in the initial post has a couple of paragraphs starting with "There are a few exceptions." It's not obvious to me that the choices made in that doc comment are correct. If we make certain choices in the standard library, we may make the wrong choices. I think that we need experience with an implementation to understand what the right choices may be.

Further, I expect that different people will want different choices there, and for other uses of DeepCopy. Again we need experience to understand what choices people want.

To conclude, I think the argument is somewhat like "we did not do a 'good' job in A, so we are very likely and should not do B."

Yes, pretty much. And I think that argument is quite strong when A and B are very similar. It is at least a reason to be very cautious.

changkun commented 2 years ago

It's not obvious to me that the choices made in that doc comment are correct. If we make certain choices in the standard library, we may make the wrong choices. I think that we need experience with an implementation to understand what the right choices may be.

If this is the argument, then there are too many examples in the language against it; the recent generics work is a big one. All of us had no experience with it and do not fully understand whether a late decision was the right choice.

The proposed exceptions aggregate the discussions in this thread, combined with the initial proposal, which is already a step forward toward understanding what the right choices could be.

Further, I expect that different people will want different choices there, and for other uses of DeepCopy. Again we need experience to understand what choices people want.

I see it differently. Until now, we have only seen two groups of people: 1) people who object to having it; 2) people who are expecting to have it.

Yes, pretty much. And I think that argument is quite strong when A and B are very similar. It is at least a reason to be very cautious.

That's right, it could be strong when A and B are very similar. As discussed in https://github.com/golang/go/issues/51520#issuecomment-1079645855, they are quite different in terms of use and expected behaviors. The only similarity seems to be the "Deep" part of their names.

atdiar commented 2 years ago

Should deeply copied values also be deeply equal?

changkun commented 2 years ago

Should deeply copied values also be deeply equal?

// There are a few exceptions that may result in a deeply copied value not
// being deeply equal (asserted by DeepEqual(dst, src)) to the source value: