dataplat / dbatools

🚀 SQL Server automation and instance migrations have never been safer, faster or freer
https://dbatools.io
MIT License

DbaCsv needs support for GUID target columns #9433

Closed petervandivier closed 1 month ago

petervandivier commented 3 months ago

Verified issue does not already exist?

I have searched and found no existing issue

What error did you receive?

WARNING: [13:39:27][Import-DbaCsv] Failure | Invalid cast from 'System.String' to 'System.Guid'.


No obvious quick fix jumps out at me reading through the SqlBulkCopy docs at this time.

Possibly related: #8409

Steps to Reproduce

It appears that the SqlBulkCopy object as currently defined in Import-DbaCsv does not support inserting valid GUID strings into a uniqueidentifier target column.

The below repro errors out:

Invoke-DbaQuery `
    -SqlInstance localhost `
    -Database tempdb `
    -Query "create table foo ([Guid] uniqueidentifier);"

New-Guid | Export-Csv ./foo.csv 

Import-DbaCsv `
    -SqlInstance localhost `
    -Database tempdb `
    -Table foo `
    -Path ./foo.csv `
    -SingleColumn

I tried to step through the source code to see if there was something obvious I was missing about this functionality. Extracting just the relevant code into a minimal repro suggests this may be an underlying issue with SqlBulkCopy itself.

SqlBulkCopy repro initialisation

```powershell
$server = Connect-DbaInstance -SqlInstance localhost -Database tempdb
$sqlconn = $server.ConnectionContext.SqlConnectionObject
if ($sqlconn.State -ne 'Open') { $sqlconn.Open() }

# Import-DbaCsv.ps1#L540
# [int]$bulkCopyOptions = ([Microsoft.Data.SqlClient.SqlBulkCopyOptions]::Default)
# $bulkcopy = New-Object Microsoft.Data.SqlClient.SqlBulkCopy(
#     $sqlconn,
#     $bulkCopyOptions,
#     $transaction # $null
# )
#
# New-Object: Cannot find an overload for "SqlBulkCopy" and the argument count: 3
$bulkcopy = New-Object Microsoft.Data.SqlClient.SqlBulkCopy($sqlconn)

$bulkcopy.DestinationTableName = 'foo'
$bulkcopy.BulkCopyTimeout = 0
$bulkCopy.BatchSize = 100
$bulkCopy.NotifyAfter = 30000
$bulkCopy.EnableStreaming = $true

$stream = [System.IO.File]::OpenRead("foo.csv")
$textReader = New-Object System.IO.StreamReader(
    $stream,
    [System.Text.Encoding]::"UTF8"
)

$FirstRowHeader = $true
$Delimiter = ","
$Quote = '"'
$Escape = '"'
$Comment = '#'
$TrimmingOption = "None"
$BufferSize = 4096

$reader = New-Object LumenWorks.Framework.IO.Csv.CsvReader(
    $textReader,
    $FirstRowHeader,
    $Delimiter,
    $Quote,
    $Escape,
    $Comment,
    [LumenWorks.Framework.IO.Csv.ValueTrimmingOptions]::$TrimmingOption,
    $BufferSize,
    $NullValue # $null
)
```

After running the above initialisation, executing the below errors out.

$bulkCopy.WriteToServer($reader)

MethodInvocationException: Exception calling "WriteToServer" with "1" argument(s): "The given value '8a8b0b97-bda2-4d1a-a978-5a8c1caa5cb2' of type String from the data source cannot be converted to type uniqueidentifier for Column 0 [Guid]."

Please confirm that you are running the most recent version of dbatools

v2.1.15 - May 18, 2024

Other details or mentions

Slack channel chat context: https://sqlcommunity.slack.com/archives/C1M2WEASG/p1721846945371649

What PowerShell host was used when producing this error

VS Code (integrated terminal)

PowerShell Host Version

Name                           Value
----                           -----
PSVersion                      7.4.3
PSEdition                      Core
GitCommitId                    7.4.3
OS                             Microsoft Windows 10.0.22631
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

SQL Server Edition and Build number

Microsoft SQL Server 2022 (RTM-GDR) (KB5040936) - 16.0.1121.4 (X64) 
    Jul  2 2024 00:22:34 
    Copyright (C) 2022 Microsoft Corporation
    Developer Edition (64-bit) on Windows 10 Enterprise 10.0 <X64> (Build 22631: ) (Hypervisor)

.NET Framework Version

.NET 8.0.6

andreasjordan commented 3 months ago

If you think it's an issue with SqlBulkCopy, why don't you open an issue against SqlBulkCopy? As you have shown, there is nothing dbatools can do here.

petervandivier commented 3 months ago

I don't believe that I've demonstrated there is nothing to be done. A cursory glance at the SqlBulkCopy source code seems to imply that GUID support exists (it would be a little shocking if it didn't).
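As a quick sanity check that SqlBulkCopy itself handles GUIDs, one could feed it a source column typed as [guid] rather than [string], e.g. via a DataTable. This is an untested sketch, reusing the localhost/tempdb/foo objects from the repro above:

```powershell
# Sketch (assumption: the `foo` table and localhost instance from the repro exist).
$server  = Connect-DbaInstance -SqlInstance localhost -Database tempdb
$sqlconn = $server.ConnectionContext.SqlConnectionObject
if ($sqlconn.State -ne 'Open') { $sqlconn.Open() }

$table = New-Object System.Data.DataTable
$null  = $table.Columns.Add('Guid', [guid])   # typed column, not [string]
$null  = $table.Rows.Add([guid]::NewGuid())

$bulkcopy = New-Object Microsoft.Data.SqlClient.SqlBulkCopy($sqlconn)
$bulkcopy.DestinationTableName = 'foo'
# If SqlBulkCopy supports GUIDs, this should not raise the cast error,
# since the source value is already a System.Guid.
$bulkcopy.WriteToServer($table)
```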

While I am unable at this time to parse out where the root cause exists, I'll be happy to open up an issue against either the SqlBulkCopy source or docs pages as and when the time seems ripe to do so.

Even assuming SqlBulkCopy must be modified at source to support GUID target types, I imagine the lack of support demonstrated in this issue might warrant the addition of a unit test on resolution.

andreasjordan commented 3 months ago

My last comment was maybe too short.

I don't see a quick fix. And I don't have time to do deeper investigations. So if you have the time and a solution, I would be happy to include it in the dbatools source code.

But maybe in the future, some other contributor can take the time and come up with a solution.

niphlod commented 3 months ago

If SqlBulkCopy didn't support GUIDs, there would be zillions of users ranting. I think the culprit here is that we use an external lib (Lumen) to read the CSV, and that doesn't output a GUID column, rather a varchar one.

petervandivier commented 3 months ago

Idk, there appears to exist a GUID unit test in the LumenWorks source :thinking:

namespace LumenWorks.Framework.Tests.Unit.IO.Csv
{
    [TestFixture]
    public class ColumnTests
    {
        [Test]
        public void ConvertGuid()
        {
            var expected = Guid.NewGuid();
            var column = new Column { Name = "A", Type = typeof(Guid) };
            var candidate = column.Convert(expected.ToString());
            Assert.IsInstanceOf<Guid>(candidate);
            Assert.That(expected == (Guid)candidate);
        }
    }
}

niphlod commented 3 months ago

And what tells Lumen to use a GUID column when reading? It doesn't do type heuristics AFAIK. So we know that Lumen supports GUIDs, but IMHO we're not telling it via Import-DbaCsv (again, AFAIK, we support playing with column mapping via NAME, nothing via TYPE) that a GUID column is needed when reading the CSV.
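Building on the Column API shown in the ConvertGuid unit test above, telling Lumen the type might look something like this sketch. It is untested and assumes the CsvReader fork bundled with dbatools exposes the same Columns collection as its test suite:

```powershell
# Sketch (assumption: the bundled LumenWorks fork exposes a Columns
# collection of Column objects, as the ConvertGuid unit test implies).
# $reader is the CsvReader from the repro initialisation above.
$column = New-Object LumenWorks.Framework.IO.Csv.Column
$column.Name = 'Guid'
$column.Type = [guid]
$reader.Columns.Add($column)

# With the column typed, the reader should emit a System.Guid value,
# which SqlBulkCopy can then map onto the uniqueidentifier target.
$bulkCopy.WriteToServer($reader)
```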

niphlod commented 3 months ago

BTW, definitely related to https://github.com/dataplat/dbatools/issues/8409 as noted in the issue. It's the exact same deal: in that case we're not telling Lumen to output a bit, in this one we're not telling it to output a guid.

niphlod commented 3 months ago

So, ideally, fixable. I'll work on this and report back if there are further limitations.

petervandivier commented 3 months ago

we know that lumen supports guid, but IMHO we're not telling via Import-DbaCsv

This condition pops if-and-only-if we're inserting into a pre-existing table. At first thought, a schema check would allow mapping of this sort.

Since it has only been reported twice in two years, this seems like a fairly niche condition. Perhaps it's low-occurrence enough that it's prudent to save the schema check for a catch {} block that runs only when one of the known errors pops, so we avoid the overhead for runs where it doesn't matter.
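The proposed schema check could be sketched roughly as below: query the target table's column types, then type the reader's columns for any type Lumen can convert. This is untested and illustrative only, reusing the repro's localhost/tempdb/foo objects; the type map is deliberately minimal:

```powershell
# Sketch of a schema check (assumptions: the repro's objects exist, and
# $reader is the LumenWorks CsvReader from the earlier initialisation).
$typeMap = @{
    'uniqueidentifier' = [guid]
    'bit'              = [bool]   # the #8409 case
}
$schema = Invoke-DbaQuery -SqlInstance localhost -Database tempdb -Query @"
select COLUMN_NAME, DATA_TYPE
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'foo'
"@
foreach ($col in $schema) {
    if ($typeMap.ContainsKey($col.DATA_TYPE)) {
        # Tell the CsvReader to emit a typed value for this column.
        $column = New-Object LumenWorks.Framework.IO.Csv.Column
        $column.Name = $col.COLUMN_NAME
        $column.Type = $typeMap[$col.DATA_TYPE]
        $reader.Columns.Add($column)
    }
}
```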

niphlod commented 3 months ago

When the table gets autocreated, all columns are nvarchar(max), so it's easy :D

niphlod commented 3 months ago

I did not forget about this. Something that seemed like a 50 LOC update is turning into a 500 LOC one. I need to play a bit with Lumen's API directly in C# (fortunately there are examples to follow) to figure out how to make it work in PS.

niphlod commented 1 month ago

@rezanid, @petervandivier, would you give #9479 a look, and maybe try that code as well?