Snowflake-Labs / Excelerator

This is an Excel Addin for Windows that reads and writes data to Snowflake
Apache License 2.0
78 stars 22 forks source link

One option missing in Excelerator For Snowflake #13

Open z05794 opened 3 years ago

z05794 commented 3 years ago

Please advise why I don't have the last option -- "Auto-generate data types when not specified for new columns or a new table " Stored Procedures needed". Thank you!

ssegal100 commented 3 years ago

@z05794 Due to some complications we removed this feature. Is it something that would be important to you? The other way of loading new columns would be to explicitly set the datatype. I'd like to get a understanding of the value it would add to help determine if we should work at enabling it. Thanks! Steve

TonyGaul commented 3 years ago

Hi, Is it complex to add this functionality back in? For me it is a nice to have, which saves me profiling the dataset. I have a some large files and it is not obvious what the ideal datatype is without profiling... which I might use Power Query to detect types. Are you able to suggest data types and populate the first row drop downs? Thanks.

ssegal100 commented 3 years ago

@TonyGaul Sorry for delay. Yes, it's quite complex because the entire upload process is different when we use that feature. I like your suggestion of profiling within Excel. Let me look into it and I'll let you know. Thanks.

TonyGaul commented 3 years ago

@ssegal100 Thanks. Even if it is something as simple as macro code showing basic stats for each column on a new blank row e.g. MIN value, MAX value, MAX length, MAX decimal places. I can then choose an appropriate data type. I had a dataset with 100k+ rows and thought a couple of fields were whole numbers, but after the import failed, I discovered a couple of random text values in the columns. I think basic stats would give me more confidence in setting the data types.

TonyGaul commented 3 years ago

@ssegal100 I did give this some thought and attempted something which leverages Excel functions and your VBA. Just an idea anyway and probably should be beautified, maybe have its own button, perhaps less restrictive data types added (like Text instead of Varchar(n) ) & tested.

So I've modified 2 sub routines and added 1 function. The data type logic & profiling will run when the Define Data Types button is pressed: Sub AddDataTypeDropDowns() Sub uploadData() Function CellType()


Sub AddDataTypeDropDowns() Dim rRange As range Dim t: t = Null ' Application.ScreenUpdating = False Set dataWorksheet = getDataWorksheet() ' need to activate this because this Cells(giStartingRowForUpload, 1), will get the value of the active cell dataWorksheet.Activate If dataWorksheet.Cells(2, 1).value = "Column profiling start." Then dataWorksheet.Rows(1).Resize(7).Delete End If Set UsedRng = dataWorksheet.UsedRange LastColIndex = UsedRng.columns.Count 'UsedRng.Rows.Count ' If there isn't data then bail If LastColIndex > 0 And Cells(1, 1) <> "" Then 'Check if the first cell has a dropdown already. If it does than it means that we should update not insert the row On Error Resume Next t = dataWorksheet.Cells(giStartingRowForUpload, 1).Validation.Type On Error GoTo 0 'Before we start, clear all filters On Error Resume Next dataWorksheet.ShowAllData On Error GoTo 0 If Not IsNull(t) Then 'There is no dropdown so Insert 'dataWorksheet.Rows(giStartingRowForUpload).Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove dataWorksheet.Rows(1).Delete End If ' add profiling rows dataWorksheet.Rows(1).Resize(7).Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove ' Set range row numbers FirstRowIndex = UsedRng.row + 1 ' First row of data skip header LastRowIndex = UsedRng.row + UsedRng.Rows.Count - 1 ' Last row of data ' Add the formula arrays dataWorksheet.Cells(3, 1).FormulaArray = "=INDEX(A$" & FirstRowIndex & ":A$" & LastRowIndex & ",MATCH(MAX(COUNTIF(A$" & FirstRowIndex & ":A$" & LastRowIndex & ","">""& A$" & FirstRowIndex & ":A$" & LastRowIndex & ")),COUNTIF(A$" & FirstRowIndex & ":A$" & LastRowIndex & ","">""& A$" & FirstRowIndex & ":A$" & LastRowIndex & "),0))" 'MIN Value dataWorksheet.Cells(4, 1).FormulaArray = "=INDEX(A$" & FirstRowIndex & ":A$" & LastRowIndex & ",MATCH(MAX(COUNTIF(A$" & FirstRowIndex & ":A$" & LastRowIndex & ",""<""& A$" & FirstRowIndex & ":A$" & LastRowIndex & ")),COUNTIF(A$" & FirstRowIndex & ":A$" & LastRowIndex & ",""<""& A$" & FirstRowIndex & ":A$" & LastRowIndex & "),0))" 'MAX Value dataWorksheet.Cells(5, 1).FormulaArray = "=MAX(LEN(A$" & FirstRowIndex & ":A$" & LastRowIndex & "))" 'MAX Length dataWorksheet.Cells(6, 1).FormulaArray = "=MAX(LEN(A$" & FirstRowIndex & ":A$" & LastRowIndex & ")-FIND(""."",A$" & FirstRowIndex & ":A$" & LastRowIndex & "&"".""),0)" 'MAX Decimals dataWorksheet.range(Cells(3, 1), Cells(6, LastColIndex)).FillRight ' Copy formula across for all columns ' Label the rows With dataWorksheet.Cells(2, 1) .Formula = "Column profiling start." .Font.Color = vbRed End With With dataWorksheet.Cells(3, LastColIndex + 1) .Formula = "MIN Value" .Font.Color = vbRed End With With dataWorksheet.Cells(4, LastColIndex + 1) .Formula = "MAX Value" .Font.Color = vbRed End With With dataWorksheet.Cells(5, LastColIndex + 1) .Formula = "MAX Length" .Font.Color = vbRed End With With dataWorksheet.Cells(6, LastColIndex + 1) .Formula = "MAX Decimals" .Font.Color = vbRed End With With dataWorksheet.Cells(7, 1) .Formula = "Column profiling end." .Font.Color = vbRed End With ' Set the number format of the MIN/MAX values, otherwise date values are formatted as serial numbers For i = 1 To LastColIndex dataWorksheet.range(Cells(3, i), Cells(4, i)).NumberFormat = dataWorksheet.range(Cells(FirstRowIndex, i), Cells(FirstRowIndex, i)).NumberFormat dataWorksheet.range(Cells(5, i), Cells(6, i)).NumberFormat = "General" sCellType = CellType(dataWorksheet.range(Cells(3, i), Cells(4, i))) If sCellType = "Text" Then If Cells(5, i).value < 255 Then dataWorksheet.Cells(1, i).value = "Varchar(255)" Else dataWorksheet.Cells(1, i).value = "Text" End If ElseIf sCellType = "Value" Then If Cells(6, i).value = 0 Then dataWorksheet.Cells(1, i).value = "Integer" Else dataWorksheet.Cells(1, i).value = "Double" ' "Number(" & Cells(5, i).value & "," & Cells(6, i).value & ")" End If ElseIf sCellType = "Date" Then If Cells(5, i).value <= 10 Then dataWorksheet.Cells(1, i).value = "Date" Else dataWorksheet.Cells(1, i).value = "Timestamp" End If ElseIf sCellType = "Blank" Or sCellType = "Error" Then dataWorksheet.Cells(1, i).value = "Varchar" Else dataWorksheet.Cells(1, i).value = sCellType End If Next i

    Set rRange = dataWorksheet.range(dataWorksheet.Cells(giStartingRowForUpload, 1), dataWorksheet.Cells(giStartingRowForUpload, LastColIndex))

    With rRange.Validation
        .Delete
        .Add Type:=xlValidateList, AlertStyle:=xlValidAlertStop, Operator:= _
  xlBetween, Formula1:=sgDatatypes
        .IgnoreBlank = True
        .InCellDropdown = True
        .InputTitle = ""
        .ErrorTitle = ""
        .InputMessage = ""
        .ErrorMessage = ""
        .ShowInput = True
        .ShowError = False
    End With
End If

Application.ScreenUpdating = True

End Sub

Function CellType(Rng) Application.Volatile Set Rng = Rng.range("A1") Select Case True Case IsEmpty(Rng) CellType = "Blank" Case WorksheetFunction.IsText(Rng) CellType = "Text" Case WorksheetFunction.IsLogical(Rng) CellType = "Boolean" Case WorksheetFunction.IsErr(Rng) CellType = "Error" Case IsDate(Rng) CellType = "Date" Case InStr(1, Rng.Text, ":") <> 0 CellType = "Time" Case IsNumeric(Rng) CellType = "Value" End Select End Function


in Sub uploadData I just added a 3 lines in this section to remove the new profiling rows before uploading

'************ Checking for Data Type row ************
Dim firstCellValue As String
firstCellValue = dataWorksheet.Cells(giStartingRowForUpload, 1).value
' Check if first row has the data types.
iHeaderRow = giStartingRowForUpload
Dim arrDatatypes() As String
arrDatatypes = Split(sgDatatypes, ",")
If IsInArray(firstCellValue, arrDatatypes) Or firstCellValue = "" Or InStr(firstCellValue, "(") Then
    iHeaderRow = iHeaderRow + 1
    bDataTypeRowExists = True
' ---------------------------------------------------------------------------------------------------------------------
' Remove data profiling rows 
    If dataWorksheet.Cells(2, 1).value = "Column profiling start." Then
        dataWorksheet.Rows(2).Resize(6).Delete
    End If
' ---------------------------------------------------------------------------------------------------------------------
End If

'************ Checking to make sure the first column name has a value ************
ssegal100 commented 3 years ago

@TonyGaul Thank you! I will dig into it.

TonyGaul commented 3 years ago

Thanks! I have probably overcomplicated things, but am sure you'll find a good solution.

From: Steven Segal @.> Sent: Thursday, 27 May 2021 10:32 PM To: Snowflake-Labs/Excelerator @.> Cc: TonyGaul @.>; Mention @.> Subject: Re: [Snowflake-Labs/Excelerator] One option missing in Excelerator For Snowflake (#13)

@TonyGaulhttps://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FTonyGaul&data=04%7C01%7C%7C3d280663ca854757ebd308d9210b7997%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637577155433190201%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=pjKxDe0fSvxKyAmJxYVXMONUCy1aolrplyt8qklCHsU%3D&reserved=0 Thank you! I will dig into it.

- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSnowflake-Labs%2FExcelerator%2Fissues%2F13%23issuecomment-849593149&data=04%7C01%7C%7C3d280663ca854757ebd308d9210b7997%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637577155433200156%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=DxLBK9iQD0pErl%2BOlhshFppxd6h0p1aw1StFUINk2Hg%3D&reserved=0, or unsubscribehttps://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FADNEDTXQPUCYGEOLXBB3Z2TTPY3VJANCNFSM4WXXLEVA&data=04%7C01%7C%7C3d280663ca854757ebd308d9210b7997%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637577155433200156%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nwvUKV3sl5cTEfMPxI6k%2B%2BUyaxOTxa1P0%2FTFhmATVJQ%3D&reserved=0.