Closed amotl closed 3 years ago
@wetterfrosch suggested:
From doing my own work on ingesting DWD data into InfluxDB the other day, I remember that it doesn't like to see more than 20k lines of line protocol at once; at least that's what the documentation said last year.
Back then, I saved all lines into one file and split them into chunks of 20k lines each, using the awesome Unix command `split`. Then, I submitted them to InfluxDB consecutively.
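The chunking workflow described above can be sketched like this (file names, measurement name, and the database URL are hypothetical; the `curl` submission loop is left commented out since it needs a running InfluxDB):

```shell
# Generate some sample line protocol (placeholder data, for illustration).
seq 1 50000 | awk '{print "weather,station=demo value=" $1 " " $1}' > lines.lp

# Split into chunks of at most 20000 lines: chunk_aa, chunk_ab, chunk_ac.
split -l 20000 lines.lp chunk_

ls chunk_* | wc -l   # prints 3 (20000 + 20000 + 10000 lines)

# Submit each chunk consecutively to InfluxDB's /write endpoint
# (hypothetical host and database name):
# for f in chunk_*; do
#   curl -s -XPOST "http://localhost:8086/write?db=dwd" --data-binary @"$f"
# done
```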
I can confirm the command outlined above yields more than 20k data points.

```shell
wetterdienst dwd readings \
    --parameter=air_temperature --resolution=hourly --period=recent \
    --latitude=52.5 --longitude=13.4 --distance=200 --tidy | jq length
1742052
```
Even when not using `--tidy`, the number of data points is still 871026, and exporting them to InfluxDB still yields the `Request Entity Too Large` error. Trying this needs the fix coming from #237.
I can't confirm that the limit is based on the number of data points; at least the limit is not 20k. When truncating the Pandas DataFrame using `df = df[:270000]`, the write operation still succeeds. When going beyond that by truncating to 272k data points using `df = df[:272000]`, the write operation croaks again, yielding the `Request Entity Too Large` error. So, maybe this is actually based on some size limit for the HTTP request body?
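The body-size hypothesis fits a rough back-of-the-envelope check (the average bytes per line-protocol record is an assumption, not a measured value): the passing and failing point counts land within a fraction of a MiB of each other, which is what a fixed byte limit on the request body would look like.

```python
# Assumed average size of one line-protocol record, in bytes.
AVG_LINE_BYTES = 60

ok_points = 270_000    # write still succeeded at this count
fail_points = 272_000  # write failed at this count

ok_mib = ok_points * AVG_LINE_BYTES / 1024**2
fail_mib = fail_points * AVG_LINE_BYTES / 1024**2

# Both sit around 15-16 MiB, only ~0.1 MiB apart, so a byte-based
# body limit between them would explain the sharp boundary.
print(round(ok_mib, 2), round(fail_mib, 2))
```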
Fortunately, the `DataFrameClient`'s `write_points()` method offers a `batch_size` parameter [1]. When setting this to, e.g., `batch_size=20000`, the whole operation of writing 1.7 million data points succeeds within ~50 seconds. Using `batch_size=100000` takes roughly the same amount of time.
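A minimal sketch of what `batch_size` does, assuming a local InfluxDB and hypothetical database/measurement names (the actual `write_points()` call is commented out since it needs a live server); with `batch_size=20000` the client issues one HTTP request per chunk of 20000 rows, equivalent to slicing the frame manually:

```python
import pandas as pd

# Hypothetical client setup (requires the influxdb package and a server):
#   from influxdb import DataFrameClient
#   client = DataFrameClient(host="localhost", port=8086, database="dwd")
#   client.write_points(df, "weather", batch_size=20000)

# Placeholder frame with a DatetimeIndex, as write_points() expects.
idx = pd.date_range("2020-01-01", periods=50_000, freq="min")
df = pd.DataFrame({"value": range(50_000)}, index=idx)

# What batching amounts to: one request body per 20000-row slice.
batch_size = 20_000
batches = [df[i:i + batch_size] for i in range(0, len(df), batch_size)]
print([len(b) for b in batches])  # [20000, 20000, 10000]
```

Each slice then stays well under whatever request-body limit the server enforces, which is why the 1.7-million-point write goes through.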
Describe the bug
When trying to export some more data to InfluxDB, Wetterdienst croaks. Thanks for reporting this, @wetterfrosch!
To reproduce
Full traceback