Azure / azure-kusto-go

Azure Data Explorer (Kusto) SDK for Go
MIT License

Required delay between management and ingest commands #29

Closed rnitzan closed 4 years ago

rnitzan commented 4 years ago

Hi

We are writing a client that ingests data into ADX using this Go library. We created an ADX cluster and database in the Azure portal, and the client runs the following commands: first it creates the table and its ingestion settings, then it ingests data and queries the ingested row count.

  1. kusto.Client.Mgmt(..., ".create table [Table Name] (col1: type1, ....)")

  2. kusto.Client.Mgmt(..., ".add table [Table Name] ingestors ('...;...') '...'")

  3. kusto.Client.Mgmt(..., ".create-or-alter table [Table Name] ingestion json mapping [mapping name] '[' ' { "column" : "col1", "datatype" : "type1", "Properties":{"Path":"$.col1"}} ', '...' ']'")

  4. kusto.Client.Mgmt(..., ".alter table [Table Name] policy streamingingestion '{ "NumberOfRowStores": 1 }'")

  5. ingest.New(kusto.Client, [Database Name], [Table Name]).Stream(ctx, data, ingest.JSON, "[mapping name]")

  6. kusto.Client.Query(..., "[Table Name] | count")

  7. kusto.Client.Mgmt(..., ".drop table [Table Name] ifexists")

When I run all of these commands in sequence, one after another, they all return successfully (no errors), but the count query returns 0, and the new table also shows as empty in the ADX portal.

Only if I wait 5 minutes after the management commands (steps 1 to 4) before ingesting the data is the new table non-empty, and the count query returns a number > 0.

Why is this long delay needed between table creation and ingestion, and can it be reduced on the client or server side?

Also, why does the ingest method not return an error when there is no delay, even though no rows were actually inserted into the table?

Thanks, Raanan

vladikbr commented 4 years ago

@rnitzan, can you confirm the problem? Are you getting an error? Is it "data does not show up" vs. "data shows up after 5 minutes"?

There is an internal batching mechanism that will by default delay the incoming data by 5 minutes. You can read about how to control it here: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/batchingpolicy and here: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/batching-policy

Please let us know if this is your case.
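For readers following the batching-policy pointer above, here is a minimal Go sketch of building an `.alter table ... policy ingestionbatching` command string that could be passed to `kusto.Client.Mgmt`. The helper name `alterBatchingCmd`, the table name, and the policy values are illustrative assumptions, not settings taken from this thread, and whether this policy affects the streaming case reported here is not established.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// batchingPolicy holds the three limits of an ingestion batching policy.
// The values used in main are placeholders, not recommendations.
type batchingPolicy struct {
	MaximumBatchingTimeSpan string `json:"MaximumBatchingTimeSpan"`
	MaximumNumberOfItems    int    `json:"MaximumNumberOfItems"`
	MaximumRawDataSizeMB    int    `json:"MaximumRawDataSizeMB"`
}

// alterBatchingCmd (hypothetical helper) renders the management command text.
// It does not escape or validate the table name.
func alterBatchingCmd(table string, p batchingPolicy) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(".alter table %s policy ingestionbatching '%s'", table, b), nil
}

func main() {
	cmd, err := alterBatchingCmd("MyTable", batchingPolicy{
		MaximumBatchingTimeSpan: "00:00:30",
		MaximumNumberOfItems:    500,
		MaximumRawDataSizeMB:    1024,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cmd)
}
```

The resulting string would be sent the same way as the other management commands in steps 1 to 4.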

rnitzan commented 4 years ago

@vladikbr

It works only with this sequence:

  1. create the table and ingestion settings
  2. wait 5 minutes
  3. ingest data
  4. the data shows up and I can query it

rnitzan commented 4 years ago

@vladikbr I didn't define a batching policy. Here are the table details:

"TableName": ...,
"DatabaseName": ...,
"Folder": ,
"DocString": ,
"TotalExtents": 0,
"TotalExtentSize": 0,
"TotalOriginalSize": 0,
"TotalRowCount": 0,
"HotExtents": 0,
"HotExtentSize": 0,
"HotOriginalSize": 0,
"HotRowCount": 0,
"AuthorizedPrincipals": [ ... ],
"RetentionPolicy": { "SoftDeletePeriod": "7.00:00:00", "Recoverability": "Enabled" },
"CachingPolicy": { "DataHotSpan": "1.00:00:00", "IndexHotSpan": "1.00:00:00", "ColumnOverrides": [] },
"ShardingPolicy": { "MaxRowCount": 750000, "MaxExtentSizeInMb": 1024, "MaxOriginalSizeInMb": 2048, "UseShardEngine": false },
"MergePolicy": { "RowCountUpperBoundForMerge": 0, "MaxExtentsToMerge": 100, "LoopPeriod": "01:00:00", "MaxRangeInHours": 8, "AllowRebuild": true, "AllowMerge": true },
"StreamingIngestionPolicy": { "IsEnabled": true, "HintAllocatedRate": null, "NumberOfRowStores": 1, "SealIntervalLimit": null, "SealThresholdBytes": null },
"IngestionBatchingPolicy": null,
"MinExtentsCreationTime": ,
"MaxExtentsCreationTime": ,
"RowOrderPolicy": null,

rnitzan commented 4 years ago

@vladikbr

Another thing I noticed: I simulated creating 5 tables and their ingestion settings (without errors) and then ingesting right away (without the 5-minute delay). Now the ingest fails with a table-does-not-exist error.

It seems to take some time for the table metadata to become available on the server side.

Insert error: Op(OpIngestStream): Kind(KHTTPError): streaming ingest issue(400 Bad Request): { "error": { "code": "BadRequest_EntityNotFound", "message": "Request is invalid and cannot be executed.", "@type": "Kusto.Data.Exceptions.EntityNotFoundException", "@message": "Entity ID 'TableN_2' of kind 'Table' was not found.", "@context": { "timestamp": "2020-05-19T09:50:03.0822828Z", "serviceAlias": "", "machineName": "KEngine000000", "processName": "Kusto.WinSvc.Svc", "processId": 7532, "threadId": 1272, "appDomainName": "Kusto.WinSvc.Svc.exe", "clientRequestId": "KGC.execute;...", "activityId": "...", "subActivityId": "...", "activityType": "PO.OWIN.CallContext", "parentActivityId": "...1", "activityStack": "(Activity stack: CRID=KGC.execute...)" }, "@permanent": true } }
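A client could tell this "table not visible yet" failure apart from other ingest errors by inspecting the JSON payload embedded in the error text. The following Go sketch assumes only the fields visible in the error above; `kustoError` and `isEntityNotFound` are hypothetical names, not part of azure-kusto-go, and the SDK may expose a more direct way to read the error code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// kustoError mirrors only the fields of the error payload shown in this thread.
type kustoError struct {
	Error struct {
		Code      string `json:"code"`
		Detail    string `json:"@message"`
		Permanent bool   `json:"@permanent"`
	} `json:"error"`
}

// isEntityNotFound reports whether errText carries a JSON payload whose
// error code is BadRequest_EntityNotFound.
func isEntityNotFound(errText string) bool {
	// The SDK error text prefixes the JSON body, so start at the first '{'.
	i := strings.Index(errText, "{")
	if i < 0 {
		return false
	}
	var e kustoError
	// Decoder reads the first JSON value and ignores anything after it.
	if err := json.NewDecoder(strings.NewReader(errText[i:])).Decode(&e); err != nil {
		return false
	}
	return e.Error.Code == "BadRequest_EntityNotFound"
}

func main() {
	text := `streaming ingest issue(400 Bad Request): { "error": { "code": "BadRequest_EntityNotFound", "@message": "Entity ID 'TableN_2' of kind 'Table' was not found.", "@permanent": true } }`
	fmt.Println(isEntityNotFound(text))
}
```

Note that the payload marks the error as `"@permanent": true`, so treating it as retryable is a judgment call based on the behavior described in this thread.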

vladikbr commented 4 years ago

We are investigating why you did not get an error. There is a documented delay of up to 5 minutes between defining tables and streaming ingestion policies and those settings taking effect: https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-streaming#limitations

rnitzan commented 4 years ago

@vladikbr Found a solution. The ingest without the 5-minute delay didn't fail because the ingestion settings of the previously dropped table were still cached. After dropping a table, you need to run ".clear database cache streamingingestion schema" to remove the dropped table's ingestion settings from the ADX cache, so that a new table with the same name does not use those stale settings.
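The cleanup described above could be sketched in Go as follows. `mgmtFunc` is a hypothetical adapter that a caller would implement around `kusto.Client.Mgmt`; it is not an SDK type, and the table name is not escaped or validated here.

```go
package main

import "fmt"

// mgmtFunc is a hypothetical adapter that runs one management command,
// e.g. by wrapping kusto.Client.Mgmt for a fixed database.
type mgmtFunc func(command string) error

// dropTableAndClearCache drops the table and then clears the cached
// streaming ingestion schema, in that order, stopping at the first error.
func dropTableAndClearCache(run mgmtFunc, table string) error {
	cmds := []string{
		fmt.Sprintf(".drop table %s ifexists", table),
		".clear database cache streamingingestion schema",
	}
	for _, c := range cmds {
		if err := run(c); err != nil {
			return fmt.Errorf("%q failed: %w", c, err)
		}
	}
	return nil
}

func main() {
	// A stand-in runner that only prints the commands it would execute.
	printOnly := func(command string) error {
		fmt.Println(command)
		return nil
	}
	if err := dropTableAndClearCache(printOnly, "MyTable"); err != nil {
		fmt.Println("cleanup failed:", err)
	}
}
```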