Snowflake-Labs / sfquickstarts

Follow along with our tutorials to get you up and running with the Snowflake Data Cloud.
Apache License 2.0
326 stars, 624 forks

Possible improvements for Snowpipe Streaming and Dynamic Tables for Real-Time Ingestion (CDC Use Case) #959

Open sfc-gh-nneff opened 10 months ago

sfc-gh-nneff commented 10 months ago

Describe the suggestions

A couple of minor items:

  1. AUTO_SUSPEND on the virtual warehouse is set to 5 seconds. Is there a particular reason for this? I'm not sure the 5-second timeout accomplishes anything, and it might confuse people new to Snowpipe Streaming. (To my knowledge, the warehouse is used only for refreshing the Dynamic Tables and querying the data.)

  2. The Snowpipe Streaming client is named CLIENT. Perhaps VHOL_CDC_CLIENT1 would be a better identifier? The demo doesn't look at Snowpipe Streaming billing (that's fine), but anyone who does will find VHOL_CDC_CLIENT1 much easier to recognize than CLIENT in the output of snowpipe_streaming_client_history.

URLs / Resources Needing Modification

*** For the CLIENT name: Java code in the zip file

// Lines 57-58 of the Java source in the zip file
try (SnowflakeStreamingIngestClient client =
    SnowflakeStreamingIngestClientFactory.builder("CLIENT").setProperties(props).build()) {

*** For the Warehouse AUTO_SUSPEND property

sfc-gh-smaser commented 10 months ago
  1. The CREATE WAREHOUSE command's AUTO_SUSPEND parameter defaults to 10 minutes (600 seconds). We could leave it at the default if that simplifies things for users, but people may then wonder why it is 10 minutes when the Dynamic Table lag in these examples is 1 minute. And because we are sending a continuous trickle of data, the warehouse would essentially stay on the whole time anyway. Is there a value you think would be less confusing?
sfc-gh-nneff commented 9 months ago

Virtual warehouses are billed per second, with a minimum of 60 seconds each time a warehouse starts or resumes.
https://docs.snowflake.com/en/user-guide/warehouses-considerations#how-are-credits-charged-for-warehouses

Setting the warehouse's AUTO_SUSPEND to 5 seconds doesn't really help from a credit-usage perspective: our dynamic tables are scheduled to refresh every minute, and each resume is billed for at least 60 seconds, so suspending between refreshes saves essentially nothing.

It also raises the question, "Has Snowflake changed the billing policy for virtual warehouses?"

Also, IIRC the same warehouse is used to query the tables in the demo, and it's odd and distracting to have the warehouse suspend after 5 seconds while querying and exploring the data (which also drops its local cache).

I suggest 60 seconds (barring any other information).
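To make the credit argument concrete, here is a minimal, simplified model of per-second billing with a 60-second minimum per resume. It assumes (hypothetically) that each 1-minute refresh keeps the warehouse busy for 10 seconds, that auto-suspend fires exactly after the idle timeout (real timing is less precise), and that nothing else queries the warehouse. The class and method names are illustrative only; this is not Snowflake's billing implementation.

```java
/**
 * Hypothetical sketch: counts billed warehouse-seconds for a warehouse that
 * runs one refresh every intervalSec seconds and auto-suspends after
 * autoSuspendSec idle seconds. Idealized model, not Snowflake's actual logic.
 */
class WarehouseBillingSketch {

    // Per-second billing, with a 60-second minimum each time the warehouse resumes.
    static final long MIN_BILLED_SECONDS = 60;

    /** Assumes busySec < intervalSec and exact auto-suspend timing. */
    static long billedSeconds(int refreshes, long intervalSec, long busySec, long autoSuspendSec) {
        long billed = 0;
        long resumedAt = -1;  // -1 means the warehouse is currently suspended
        long suspendAt = -1;  // when the warehouse would suspend if left idle

        for (int i = 0; i < refreshes; i++) {
            long start = i * intervalSec;
            if (resumedAt >= 0 && start > suspendAt) {
                // The warehouse suspended before this refresh: close the billing period.
                billed += Math.max(MIN_BILLED_SECONDS, suspendAt - resumedAt);
                resumedAt = -1;
            }
            if (resumedAt < 0) {
                resumedAt = start;  // a (re)start begins a new period with a 60 s minimum
            }
            // Idle timer restarts once this refresh finishes.
            suspendAt = start + busySec + autoSuspendSec;
        }
        if (resumedAt >= 0) {
            // No more work: bill the final period up to the last suspend.
            billed += Math.max(MIN_BILLED_SECONDS, suspendAt - resumedAt);
        }
        return billed;
    }

    public static void main(String[] args) {
        // One hour of 1-minute refreshes, each assumed to take 10 seconds.
        for (long autoSuspend : new long[] {5, 60, 600}) {
            System.out.printf("AUTO_SUSPEND = %3d s -> billed %d s%n",
                    autoSuspend, billedSeconds(60, 60, 10, autoSuspend));
        }
    }
}
```

Under these assumptions, one hour of refreshes bills 3600 s with AUTO_SUSPEND = 5, 3610 s with 60, and 4150 s with 600: the 5-second setting saves essentially nothing because every resume pays the 60-second minimum, while 600 seconds mainly adds an idle tail after the last refresh.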