apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

problem ingesting delimited file to druid #15489

Open tayunyang opened 9 months ago

tayunyang commented 9 months ago

Hi

I would like to ingest a delimited file into Druid. However, nothing is ingested because my data is not parsed, and I am not sure what is incorrect.

My tsv file looks like this:

1, 0
2.8736, 8.29
7.10, 8.83

My task spec is below:

{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "test_append",
      "timestampSpec" : { "column" : "timestamp", "format" : "iso" },
      "dimensionsSpec" : {
        "dimensions" : [
          { "name" : "requested", "type" : "long" },
          { "name" : "bfee", "type" : "long" }
        ]
      }
    },
    "ioConfig" : {
      "type" : "index",
      "inputSource" : {
        "type" : "local",
        "baseDir" : "/tmp/test",
        "filter" : "test_append.csv"
      },
      "inputFormat" : {
        "type" : "tsv",
        "columns" : [ "requested", "bfee" ]
      },
      "appendToExisting" : true,
      "dropExisting" : false
    },
    "tuningConfig" : {
      "type" : "index_parallel",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}

The log shows that no data was parsed, so nothing could be published:

2023-12-04T21:40:43,048 INFO [[index_test_append_bnilolmo_2023-12-04T21:40:38.315Z]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Preparing to push (stats): processed rows: [0], sinks: [0], fireHydrants (across sinks): [0]
2023-12-04T21:40:43,049 INFO [[index_test_append_bnilolmo_2023-12-04T21:40:38.315Z]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Push complete...
2023-12-04T21:40:43,057 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Nothing to publish, skipping publish step.
2023-12-04T21:40:43,058 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Processed[0] events, unparseable[4], thrownAway[0].

SamWheating commented 9 months ago

Based on the input file name and contents, it looks like you're passing in a .csv (comma-separated) file. However, under inputFormat you've specified the type as tsv (tab-separated), which is likely causing the parse exceptions.
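To illustrate the mismatch outside of Druid (a standalone Python sketch, not Druid code, using the sample rows from the issue): splitting a comma-separated row on tabs yields a single field, so the declared two-column schema can never be populated, which is exactly a parse failure per row.

```python
import csv
import io

# Sample rows from the issue (comma-separated, as in test_append.csv)
data = "1, 0\n2.8736, 8.29\n7.10, 8.83\n"

# Parsing as TSV: each row collapses into one field, so the
# "requested"/"bfee" columns can never be filled in.
tsv_rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
print(tsv_rows[0])  # ['1, 0'] -- one field per row

# Parsing as CSV: two fields per row, matching the declared columns.
csv_rows = list(csv.reader(io.StringIO(data), delimiter=",", skipinitialspace=True))
print(csv_rows[0])  # ['1', '0']
```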

Also - you can set logParseExceptions: true under tuningConfig in order to include the parsing exceptions in your ingestion logs.
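Putting both suggestions together, a sketch of the relevant parts of the spec (assuming the file really is comma-separated; the column names and other values are carried over unchanged from the spec above):

```json
"inputFormat" : {
  "type" : "csv",
  "columns" : [ "requested", "bfee" ]
},
"tuningConfig" : {
  "type" : "index_parallel",
  "maxRowsPerSegment" : 5000000,
  "maxRowsInMemory" : 25000,
  "logParseExceptions" : true
}
```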

github-actions[bot] commented 15 hours ago

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.