Azure / azure-sqldb-spark

This project provides a client library that allows Azure SQL DB or SQL Server to act as an input source or output sink for Spark jobs.
MIT License

Question: How to KEEPIDENTITY in bulkinsert? #48

Closed: ghost closed this issue 4 years ago

ghost commented 5 years ago

Is there a way to keep the identity values from the source data during a bulk insert? At the moment, SQL Server assigns the identity values itself during bulk insert rather than taking them from the source data, but I would like to insert the source values. Is this possible?

My source data identity values are like 1,2,3...150,151,2000

empz commented 5 years ago

@KarthikYan I'm so sorry to hijack your issue, but I really need to know how to let SQL Server assign the identity using this library. I cannot find the answer anywhere and I tried everything I could think of.

If I have a table like:

Id INT IDENTITY(1,1),
Name VARCHAR(50),
CreatedAt DATETIME2

And then I have a dataframe with the following schema:

root
 |-- Name: string (nullable = false)
 |-- CreatedAt: timestamp (nullable = false)

How should I set the BulkCopyMetadata?

val bulkCopyMetadata = new BulkCopyMetadata
bulkCopyMetadata.addColumnMetadata(1, "Name", java.sql.Types.VARCHAR, 50, 0)
bulkCopyMetadata.addColumnMetadata(2, "CreatedAt", java.sql.Types.TIMESTAMP, 0, 0)

What does the first parameter refer to? Is it the column index in the table or in the dataframe?

I keep getting com.microsoft.sqlserver.jdbc.SQLServerException: Source and destination schemas do not match. no matter what I try.

ghost commented 5 years ago

@empz Did you try creating a dataframe that includes the Id (identity) column?

furlong46 commented 5 years ago

What happens when you add an entry to the config with BulkCopyKeepIdentity -> "true"?

david-nguyen commented 5 years ago

Has anyone figured this out? Leaving the column out of the dataframe raises the schema-mismatch error, adding bulkCopyKeepIdentity doesn't appear to affect the insert, and including the Id column overrides the identity values in the table.

david-nguyen commented 4 years ago

Alright, got this working:

Dataframe: contains all columns, including the identity column with value null
BulkCopyMetadata: contains metadata for all columns, including the identity column
Config: "bulkCopyKeepIdentity" -> "false"
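The recipe above can be sketched roughly as follows; the server, database, and table names, the credentials, and the construction of the null Id column are illustrative assumptions, not details from this thread:

```scala
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.IntegerType
import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Add a null Id column so the dataframe schema matches the destination table.
val dfWithId = df
  .withColumn("Id", lit(null).cast(IntegerType))
  .select("Id", "Name", "CreatedAt")

// Register metadata for every destination column, identity column included.
val bulkCopyMetadata = new BulkCopyMetadata
bulkCopyMetadata.addColumnMetadata(1, "Id", java.sql.Types.INTEGER, 0, 0)
bulkCopyMetadata.addColumnMetadata(2, "Name", java.sql.Types.VARCHAR, 50, 0)
bulkCopyMetadata.addColumnMetadata(3, "CreatedAt", java.sql.Types.TIMESTAMP, 0, 0)

// With bulkCopyKeepIdentity = false, SQL Server generates the identity values
// and ignores the null Id values in the dataframe.
val config = Config(Map(
  "url"                  -> "myserver.database.windows.net", // assumed
  "databaseName"         -> "MyDatabase",                    // assumed
  "dbTable"              -> "dbo.MyTable",                   // assumed
  "user"                 -> "username",                      // assumed
  "password"             -> "password",                      // assumed
  "bulkCopyKeepIdentity" -> "false"
))

dfWithId.bulkCopyToSqlDB(config, bulkCopyMetadata)
```

The point of the null Id column is only to make the source and destination schemas line up, which avoids the "Source and destination schemas do not match" exception; SQL Server still assigns the actual values.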

ghost commented 4 years ago

Used "bulkCopyKeepIdentity" -> "true" to keep the identity values; it was a lot simpler than I originally thought.
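For the original question (preserving source identity values such as 1, 2, 3, ..., 2000), a minimal sketch looks like the following. Here the dataframe is assumed to already contain the real Id values read from the source system; connection details are placeholders:

```scala
import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Metadata for all columns, including the identity column.
val bulkCopyMetadata = new BulkCopyMetadata
bulkCopyMetadata.addColumnMetadata(1, "Id", java.sql.Types.INTEGER, 0, 0)
bulkCopyMetadata.addColumnMetadata(2, "Name", java.sql.Types.VARCHAR, 50, 0)
bulkCopyMetadata.addColumnMetadata(3, "CreatedAt", java.sql.Types.TIMESTAMP, 0, 0)

// bulkCopyKeepIdentity = true tells the bulk copy to preserve the Id values
// from the dataframe instead of letting SQL Server generate new ones.
val config = Config(Map(
  "url"                  -> "myserver.database.windows.net", // assumed
  "databaseName"         -> "MyDatabase",                    // assumed
  "dbTable"              -> "dbo.MyTable",                   // assumed
  "user"                 -> "username",                      // assumed
  "password"             -> "password",                      // assumed
  "bulkCopyKeepIdentity" -> "true"
))

// sourceDf contains Id, Name, and CreatedAt with the source identity values.
sourceDf.bulkCopyToSqlDB(config, bulkCopyMetadata)
```

This is the bulk-copy analogue of the KEEPIDENTITY hint in T-SQL BULK INSERT: the source values are written into the identity column as-is.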