Open jeff-99 opened 1 year ago
Hi Jeff, I tried to reproduce the behaviour, but in my case Cloner was able to read the VDS successfully, which included getting the parent table as a dependency:
"vds": [
  {
    "accessControlList": {},
    "entityType": "dataset",
    "fields": [...],
    "id": "3c070243-5e52-43f4-b448-b656b314872d",
    "owner": {...},
    "path": [
      "Staging",
      "TOS",
      "Container",
      "Container"
    ],
    "sql": "WITH CONTAINER AS ( SELECT * FROM sys.nodes )\nSELECT *\nFROM CONTAINER\nWHERE name = 'node'",
    "sqlContext": [
      "Staging",
      "TOS",
      "Container"
    ],
    "type": "VIRTUAL_DATASET"
  }
]
Can you be more specific about the circumstances in which this error happened (e.g. during read or write) and provide a stack trace?
I would suggest verifying the sqlContext under which the query was written and always using the full dataset path in the SQL to avoid any confusion. If the table name (Container) appears twice in the dataset's path, a bare reference to it could resolve to different objects at different locations, depending on what the sqlContext was when the SQL was written.
For example, anything.container.container and anything.container would both match a bare reference to container.
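To illustrate the ambiguity, here is a minimal Python sketch of context-relative name lookup. It is a simplified model, not Dremio's or Cloner's actual resolution logic: the function `resolve_reference`, the `catalog` list, the case-insensitive matching, and the "context first, then absolute" order are all assumptions made for illustration.

```python
def resolve_reference(name, sql_context, catalog):
    """Resolve a possibly relative dataset name to a full catalog path.

    Hypothetical model: try the name relative to sql_context first,
    then as an absolute path. Matching is assumed case-insensitive.
    Quoted identifiers containing dots are not handled.
    """
    parts = name.split(".")
    candidates = [list(sql_context) + parts, parts]
    # Index the catalog by lower-cased path for case-insensitive lookup.
    lowered = {tuple(p.lower() for p in path): path for path in catalog}
    for candidate in candidates:
        key = tuple(p.lower() for p in candidate)
        if key in lowered:
            return lowered[key]
    return None


# Paths taken from the VDS shown in this thread.
catalog = [
    ("Staging", "TOS", "Container"),               # folder
    ("Staging", "TOS", "Container", "Container"),  # the VDS itself
]

# The same bare name resolves differently depending on the context.
inside_folder = resolve_reference("CONTAINER", ("Staging", "TOS", "Container"), catalog)
one_level_up = resolve_reference("CONTAINER", ("Staging", "TOS"), catalog)
```

Under these assumptions, a bare `CONTAINER` evaluated with the sqlContext `Staging.TOS.Container` resolves to the VDS itself, which is one way a dependency resolver could end up treating the VDS as its own parent. A fully qualified name gives the same result regardless of context.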
It would be a good feature for Cloner to automatically rewrite each dataset name in the SQL as its fully qualified path and to nullify the sqlContext in all cloned VDS. This would help avoid name conflicts during a clone. I think it would need to detect the relative location of each dataset to do this.
In one of our systems we have the following VDS, which causes an infinite loop during dependency resolution.
The VDS' name is
Staging.TOS.Container.Container
and the query roughly looks like this:
This gave a Python recursion depth exception while processing the VDS. Changing the reference to the following solved the issue:
The initial query is perfectly valid, so IMO it should not cause an issue when syncing the script to source control.
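For reference, a recursion depth exception of this kind can usually be avoided by tracking which datasets have already been visited. The following is a minimal sketch of that idea, not Cloner's actual code: `collect_dependencies`, `get_parents`, and the `parents` graph are hypothetical names, and the graph assumes the VDS was resolved as its own parent.

```python
def collect_dependencies(start, get_parents):
    """Return every dataset reachable from `start` via parent links.

    Uses an explicit stack and a visited set, so self-references and
    longer cycles terminate instead of recursing without bound.
    """
    visited = set()
    stack = [start]
    while stack:
        path = stack.pop()
        if path in visited:
            continue  # already processed; this is what breaks the cycle
        visited.add(path)
        stack.extend(get_parents(path))
    # A dataset is not reported as its own dependency.
    visited.discard(start)
    return visited


# Hypothetical dependency graph with a self-reference on the VDS.
parents = {
    "Staging.TOS.Container.Container": [
        "Staging.TOS.Container.Container",
        "sys.nodes",
    ],
    "sys.nodes": [],
}

deps = collect_dependencies(
    "Staging.TOS.Container.Container",
    lambda path: parents.get(path, []),
)
```

The iterative form is a deliberate choice here: it does not depend on Python's recursion limit, so even a very deep but acyclic dependency chain would still be processed.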