mdibaiee closed this issue 8 months ago
@nithinkdb, I think the server version needs to be 14.2 or 14.1.x (the next maintenance build). Can you clarify?
@yunbodeng-db how can I check the server version, and how can I change it?
For the time being we have switched to using the Databricks Go SDK and its FilesAPI interface to upload our files: https://pkg.go.dev/github.com/databricks/databricks-sdk-go
https://pkg.go.dev/github.com/databricks/databricks-sdk-go@v0.24.0/service/files#FilesAPI.Upload
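For context, a minimal sketch of that approach, assuming a workspace client configured via environment variables and a placeholder local file and volume path (mirroring the PUT example later in this thread); see the FilesAPI docs linked above for the authoritative signature:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/databricks/databricks-sdk-go"
	"github.com/databricks/databricks-sdk-go/service/files"
)

func main() {
	ctx := context.Background()

	// Picks up DATABRICKS_HOST / DATABRICKS_TOKEN (or other configured
	// auth) from the environment.
	w, err := databricks.NewWorkspaceClient()
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("/path/to/local/file.csv") // placeholder local file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Upload the file contents to a UC Volume path (placeholder).
	err = w.Files.Upload(ctx, files.UploadRequest{
		FilePath: "/Volumes/catalog/schema/flow_staging/file.csv",
		Contents: f,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```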
Is your workspace on Azure or AWS? I think it should work on AWS. I am checking on the status for Azure workspaces.
@yunbodeng-db ours is on AWS
13.2+ should support it. You can check out our website.
See https://docs.databricks.com/en/_extras/documents/best-practices-ingestion-partner-volumes.pdf. I think you need to request that the workspace be added to a whitelist.
In a notebook, or via the driver, run the query "select current_version()". I asked the team; it looks like you need to talk to your rep to get your account whitelisted even if you are on AWS.
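As a sketch of the driver route, something like the following should work, assuming a placeholder DSN (token, hostname, and HTTP path are all assumptions to substitute with your warehouse's values):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/databricks/databricks-sql-go" // registers the "databricks" driver
)

func main() {
	// Placeholder DSN.
	db, err := sql.Open("databricks",
		"token:dapiXXXX@myworkspace.cloud.databricks.com:443/sql/1.0/warehouses/abc123")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	// current_version() returns a struct; selecting a single field keeps
	// the scan to a plain string. The dbr_version field name is an
	// assumption based on the Databricks SQL docs.
	err = db.QueryRowContext(context.Background(),
		"select current_version().dbr_version").Scan(&version)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("server version:", version)
}
```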
@yunbodeng-db since we want to run these queries on our customers' workspaces, we ideally don't want to use features that are behind whitelists or that need to be manually enabled. So far our usage of the Files API for uploading files is working fine, so we are not going to use the PUT interface.
@mdibaiee this issue should be resolved with today's GA release of UC Volumes. Can you verify?
@zuckerberg-db Hi, we tried switching to PUT today and have found that PUT queries intermittently just hang forever, never resolving.

We were initially able to upload some files using PUT successfully, as seen here:

But on subsequent runs of the same code, we now see:

Note that the query started at 14:20 and the time of the screenshot is 14:28, so the query has been running for 7-8 minutes with no results.
Can you work with your Databricks rep to file a support report? We would be asking for workspace info and the complete statement IDs. It's unlikely this is a driver issue since the queries were stuck on the server, but we can help find the root cause.
Update on this: the root cause of the hangs was found to be a bug in Databricks file uploads, and a fix is planned for the next release. Until then, the workaround we use is providing the OVERWRITE=TRUE option in the PUT statement, which bypasses the code path that has the bug.
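For illustration, a minimal sketch of the workaround, assuming this driver's driverctx staging helper and placeholder DSN and paths. Whether the option is spelled OVERWRITE or OVERWRITE=TRUE is an assumption to verify against your server version:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/databricks/databricks-sql-go"
	"github.com/databricks/databricks-sql-go/driverctx"
)

func main() {
	db, err := sql.Open("databricks",
		"token:dapiXXXX@myworkspace.cloud.databricks.com:443/sql/1.0/warehouses/abc123")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The staging context must allowlist the local directory containing
	// the file being uploaded; the path is a placeholder.
	ctx := driverctx.NewContextWithStagingInfo(context.Background(),
		[]string{"/path/to/local"})

	// OVERWRITE skips the buggy code path that caused the hangs.
	_, err = db.ExecContext(ctx,
		`PUT '/path/to/local/file.csv' INTO '/Volumes/catalog/schema/flow_staging/file.csv' OVERWRITE`)
	if err != nil {
		log.Fatal(err)
	}
}
```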
Hello, we are using this driver to write a connector for Estuary Flow. The method we want to use for landing data in Databricks is staging files on a volume in a catalog, through SQL commands. In this case, we are trying to upload a local file written by the connector using
PUT '/path/to/local/file.csv' INTO '/Volumes/catalog/schema/flow_staging/file.csv'
However, this exec command fails with the following two errors:
My understanding is that this error stems from the initial sending of the PUT command as-is to Databricks. Since the local file is not known to Databricks, I do expect some sort of error here, but I'm not sure this error is directly explained by my understanding of it. We also have this other error:
This is the error that stems from the driver attempting to stage the file, and I believe it is the actual culprit. What is puzzling me is why the driver is not able to fetch metadata about the first request it sent, in order to verify whether that request was indeed a file staging request. The relevant code for this error is here: https://github.com/databricks/databricks-sql-go/blob/714e2643455127e45df6e93ec8c8df903e40794f/connection.go#L555-L562

I'm starting to think this might be a bug in either the driver or the interaction between the driver and the server: the driver wants to fetch metadata to see if the previous request was for staging a file, but the server does not recognise the queryId as a valid one and claims it does not exist.
Note that I have tested this directly with examples/staging.go and I get the same error, so that file can be considered the minimal reproducible example.
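For reference, a condensed sketch of the failing flow, mirroring examples/staging.go in spirit rather than verbatim (placeholder DSN and paths, and assuming the driverctx staging helper; this is the same skeleton as the workaround sketch earlier in the thread, minus OVERWRITE):

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/databricks/databricks-sql-go"
	"github.com/databricks/databricks-sql-go/driverctx"
)

func main() {
	db, err := sql.Open("databricks",
		"token:dapiXXXX@myworkspace.cloud.databricks.com:443/sql/1.0/warehouses/abc123")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Allowlist the local directory so the driver is permitted to read
	// the file referenced by the PUT statement.
	ctx := driverctx.NewContextWithStagingInfo(context.Background(),
		[]string{"/path/to/local"})

	// This is the statement that fails as described above.
	if _, err := db.ExecContext(ctx,
		`PUT '/path/to/local/file.csv' INTO '/Volumes/catalog/schema/flow_staging/file.csv'`); err != nil {
		log.Fatal(err)
	}
}
```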
Any help in understanding and resolving this issue is appreciated!