GoogleCloudPlatform / google-cloud-powershell

PowerShell cmdlets for the Google Cloud Platform
http://googlecloudplatform.github.io/google-cloud-powershell/
Apache License 2.0
135 stars 61 forks source link

Only CSV imports may specify a field delimiter error when attempting to load AVRO files #590

Closed trbates closed 6 years ago

trbates commented 6 years ago

Trying to load Avro files into a BigQuery table with this command:

Start-BqJob -Destination $dest_table -WriteMode "WriteTruncate" -Load -Type "AVRO" -SourceUris $storageUris

Get the following error:

Start-BqJob : Google.Apis.Requests.RequestError
Only CSV imports may specify a field delimiter. [400]
Errors [
    Message[Only CSV imports may specify a field delimiter.] Location[ - ] Reason[invalid] Domain[global]
]
At line:1 char:8
+ $job = Start-BqJob -Destination $dest_table -WriteMode WriteTruncate  ...
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Start-BqJob], GoogleApiException
    + FullyQualifiedErrorId : Google.GoogleApiException,Google.PowerShell.BigQuery.StartBqJob
quoctruong commented 6 years ago

@trbates Does it work if you do Start-BqJob -Destination $dest_table -WriteMode "WriteTruncate" -Load -Type "AVRO" -SourceUris $storageUris -FieldDelimiter ""?

trbates commented 6 years ago

@quoctruong it does not. same error message. Looks like it might be coming back from the Google Api that way.

If set FieldDelimiter to null it moves on to the next property.

Start-BqJob -Destination $dest_table -WriteMode WriteTruncate -Load -Type AVRO -SourceUris $s torageUris -FieldDelimiter $null

Start-BqJob : Google.Apis.Requests.RequestError
Only CSV imports may specify a quote character. [400]
Errors [
    Message[Only CSV imports may specify a quote character.] Location[ - ] Reason[invalid] Domain[global]
]
At line:1 char:8
+ $job = Start-BqJob -Destination $dest_table -WriteMode WriteTruncate  ...
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Start-BqJob], GoogleApiException
    + FullyQualifiedErrorId : Google.GoogleApiException,Google.PowerShell.BigQuery.StartBqJob

further if I try to set a value for Quote

Start-BqJob -Destination $dest_table -WriteMode WriteTruncate -Load -Type AVRO -SourceUris $s torageUris -FieldDelimiter $null -Quote $null

yields:

Start-BqJob : Google.Apis.Requests.RequestError
Only CSV imports may specify leading rows to skip. [400]
Errors [
    Message[Only CSV imports may specify leading rows to skip.] Location[ - ] Reason[invalid] Domain[global]
]
At line:1 char:8
+ $job = Start-BqJob -Destination $dest_table -WriteMode WriteTruncate  ...
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Start-BqJob], GoogleApiException
    + FullyQualifiedErrorId : Google.GoogleApiException,Google.PowerShell.BigQuery.StartBqJob

Looks like this section is always setting the properties.

https://github.com/GoogleCloudPlatform/google-cloud-powershell/blob/74c9972c548c35f2f557f65609abfe5a086c13d7/Google.PowerShell/BigQuery/BqJob.cs#L661-L675

This is usually all I have to set when using the .NET apis directly.

                JobConfiguration cfg = new JobConfiguration()
                {
                    Load = new JobConfigurationLoad()
                    {
                        AllowQuotedNewlines = false,
                        DestinationTable = tableRef,
                        SourceFormat = "AVRO",
                        SourceUris = storageUris,
                        WriteDisposition = "WRITE_TRUNCATE"
                    }
                };
quoctruong commented 6 years ago

Thank you for trying those commands out. So the bug is that we should not set a default values to those properties when the format is AVRO. @trbates, do you think they should still have default values when the format is NOT AVRO?

trbates commented 6 years ago

@quoctruong could the rest api documentation can be used as a guide?

https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load

Maybe only pass the optional parameters if exist?

quoctruong commented 6 years ago

Thanks for the input. I’ll fix the bug this week.

quoctruong commented 6 years ago

@trbates the fix is now available in version 1.0.1.4 of the module.