georgysavva / scany

Library for scanning data from a database into Go structs and more
MIT License

Scanning into a recursive struct #57

Open n-e opened 3 years ago

n-e commented 3 years ago

Hi,

I'd like to scan into a recursive struct:

type Node struct {
    ID       string
    Parent   *Node
}
var nodes []*Node
err := sqlscan.Select(ctx, conns.PgClient, &nodes, `select id, null as parent from nodes where parent_id is null`)

This causes an infinite loop, as scany introspects the struct recursively.

Any tips for achieving that? I might be missing something obvious since I'm new to Go and scany. Ideally I'd like to avoid touching the Node struct definition, since in my real-life code it is auto-generated. I'd also like to avoid putting db:"-" on Parent, since I might want to fetch the immediate parent with a join.

Thanks!

georgysavva commented 3 years ago

Thank you for opening this issue and discovering such an edge case! Scany indeed iterates over all fields recursively in the getColumnToFieldIndexMap() function and gets into an infinite loop with such a struct.

To solve this, we can introduce a threshold, like 2 or 3, for how many recursive loops are allowed. During the breadth-first traversal we need to count the number of parents that have the same type as the current element and ensure it does not exceed the threshold; otherwise we skip that element and don't traverse it any further.
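
A minimal sketch of that idea (hypothetical helper names, not scany's actual internals, and written as a depth-first walk for brevity): count how many times the current struct type already appears on the traversal path and stop descending once it reaches the threshold.

package main

import (
    "fmt"
    "reflect"
)

// maxSameTypeDepth is the threshold: how many times the same struct type
// may appear on the current traversal path before we stop descending.
const maxSameTypeDepth = 2

// collectColumns walks exported struct fields and records their (nested) names.
// visited counts how many times each struct type already appears on the current
// path, so self-referencing types like Node terminate instead of looping forever.
func collectColumns(t reflect.Type, prefix string, visited map[reflect.Type]int, out *[]string) {
    if visited[t] >= maxSameTypeDepth {
        return // same type seen too many times on this path; skip it and don't traverse further
    }
    visited[t]++
    defer func() { visited[t]-- }()

    for i := 0; i < t.NumField(); i++ {
        f := t.Field(i)
        if f.PkgPath != "" {
            continue // skip unexported fields
        }
        ft := f.Type
        if ft.Kind() == reflect.Ptr {
            ft = ft.Elem()
        }
        name := prefix + f.Name
        if ft.Kind() == reflect.Struct {
            collectColumns(ft, name+".", visited, out)
            continue
        }
        *out = append(*out, name)
    }
}

type Node struct {
    ID     string
    Parent *Node
}

func main() {
    var columns []string
    collectColumns(reflect.TypeOf(Node{}), "", map[reflect.Type]int{}, &columns)
    fmt.Println(columns) // [ID Parent.ID]
}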

Let me know if that solution solves your problem, or feel free to propose how you'd like to see it done.

n-e commented 3 years ago

Hi,

Thanks for your answer. Indeed that would solve my problem!

CyganFx commented 1 year ago

Hi, is there any development on this issue? I ran into the same problem today.

georgysavva commented 1 year ago

Hi @CyganFx, there was an attempt to add this feature to scany: https://github.com/georgysavva/scany/pull/60, but the PR became inactive. I am not currently working on this feature. If you are interested in helping, I would gladly accept a PR from you.

anton7r commented 1 year ago

getColumnToFieldIndexMap() could alternatively return a map[reflect.Type]map[string][]int, and the map[string][]int could then be built in the structScan() method while iterating over the column dbTags.

This approach would be fairly complicated compared to just limiting the recursion, but it would be fully cacheable and safe to use from multiple goroutines with sync.Map.
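
One possible reading of this proposal (hypothetical names, not scany's API, and assuming the inner value is the field index path used by the current map):

package example

import "reflect"

// fieldIndexesByType caches, per struct type, the mapping from column name
// to the index path of the corresponding (possibly nested) struct field.
type fieldIndexesByType map[reflect.Type]map[string][]int

// fieldIndexesForColumns resolves the field index path for each column of the
// current result set, so a structScan-like function can fill the fields in order.
func fieldIndexesForColumns(byType fieldIndexesByType, t reflect.Type, columns []string) [][]int {
    perType := byType[t]
    indexes := make([][]int, 0, len(columns))
    for _, column := range columns {
        indexes = append(indexes, perType[column])
    }
    return indexes
}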

sylr commented 1 year ago

Ran into this issue as well.

georgysavva commented 1 year ago

Thanks for the feedback, guys. Would any of you like to tackle this?

anton7r commented 1 year ago

I could try to tackle this issue in the near future. sqlx's reflectx/reflect.go could be used as a point of reference.

anton7r commented 1 year ago

Should we implement caching of getColumnToFieldIndexMap() at the same time?

sync.Map would work out of the box fairly well, but it doesn't scale as well as some third-party synchronized maps. puzpuzpuz/xsync.MapOf would scale better, since it splits the map into buckets and each bucket has its own lock, instead of sync.Map holding a lock on the entire map. However, third-party map implementations can't directly hash reflect.Type, and achieving native performance with them requires a few hacks with unsafe.Pointer, as shown in this blog post: https://www.dolthub.com/blog/2022-12-19-maphash/

georgysavva commented 1 year ago

@anton7r thank you for deciding to tackle this issue. I appreciate that.

Yes, you are also welcome to add caching for the reflection map. Here is an issue requesting exactly that: https://github.com/georgysavva/scany/issues/25. Just do it as a separate PR, because the feature requested in this issue and map caching aren't related.

For caching, I would go with the default sync.Map implementation or just a native map[..]... protected by a sync.RWMutex{}. A single global lock with no further optimization is okay, because the lock is only taken for writing when a new type definition is used with scany for the first time. Since the number of types in an application is finite, I expect most of them to be cached by the time the application starts serving incoming requests. After that, all operations on the cache map are read-only, so sync.RWMutex{} doesn't prevent them from running in parallel. Correct me if I am wrong here.
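
A minimal sketch of the RWMutex-protected variant (hypothetical names, not scany's actual API):

package example

import (
    "reflect"
    "sync"
)

// columnToFieldIndex maps a column name to the index path of the struct field it fills.
type columnToFieldIndex map[string][]int

var (
    cacheMu sync.RWMutex
    cache   = map[reflect.Type]columnToFieldIndex{}
)

// cachedColumnMap returns the column map for t, building and storing it only the
// first time the type is seen. Reads take the shared lock, so concurrent lookups
// for already-cached types don't block each other.
func cachedColumnMap(t reflect.Type, build func(reflect.Type) columnToFieldIndex) columnToFieldIndex {
    cacheMu.RLock()
    m, ok := cache[t]
    cacheMu.RUnlock()
    if ok {
        return m
    }

    m = build(t) // compute outside the write lock; a rare duplicate build is harmless

    cacheMu.Lock()
    cache[t] = m
    cacheMu.Unlock()
    return m
}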