earthobservations / wetterdienst

Open weather data for humans.
https://wetterdienst.readthedocs.io/
MIT License
349 stars 54 forks source link

InfluxDB authentication and InfluxDB 2.x compatibility #456

Closed jscmidt closed 3 years ago

jscmidt commented 3 years ago

Hi all, I was wondering if it is possible to use the data-export with a authentification-enabled influx installation (See here). Your documentation here didn’t mention that scenario. If it isn’t possible yet, would it be a high effort to implement it?

gutzbenj commented 3 years ago

Dear @JSAnyone ,

i will take a look hopefully tomorrow or latest on the weekend and add another argument to to_target() to pass along authentification credentials. We are currently using the DataFrameClient from [1] to acces InfluxDB 1.X. However you were pointing at InfluxDB 2.X documentation. Are you aware of this or are you also using InfluxDB 1.X?

Cheers, Benjamin

[1] https://influxdb-python.readthedocs.io/en/latest/api-documentation.html#dataframeclient

jscmidt commented 3 years ago

Hi,

i will take a look hopefully tomorrow or latest on the weekend and add another argument to to_target() to pass along authentification credentials.

That sounds great! Actually, the link should point to the 1.8 documentation. Maybe you're automatically redirected to 2.x?

amotl commented 3 years ago

Dear @JSAnyone,

thank you for writing in.

@gutzbenj: Indeed, it would be sweet to support all variants of connecting to both InfluxDB 1.x and InfluxDB 2.x. On this matter, I propose those URL specifications for the --target option in the context of InfluxDB, if technically feasible.

For InfluxDB 1.x, the authentication credentials are propagated as username and password while for InfluxDB 2.x, those would be organization and token. When aiming for an SSL connection, you would add a suffix s to the protocol, similar to https.

# Connect to InfluxDB 1.x on localhost, without authentication.
influxdb1://localhost/?database=dwd&table=weather

# Connect to InfluxDB 1.x on remote host, with ssl and authentication.
influxdb1s://username:password@influxdb.example.org/?database=dwd&table=weather

# Connect to InfluxDB 2.x on localhost.
influxdb2://a05test6edtest2d:wpNtestfeNUveYitDLk8Ld47vrSVUTKB_vEaEwWC7qXj_ZqvOwYCRhQTB4EDty3uLFMXWP2C195gtestt4XGFQ==@localhost/?database=dwd&table=weather

# Connect to InfluxDB Cloud (also 2.x).
influxdb2s://a05test6edtest2d:wpNtestfeNUveYitDLk8Ld47vrSVUTKB_vEaEwWC7qXj_ZqvOwYCRhQTB4EDty3uLFMXWP2C195gtestt4XGFQ==@eu-central-1-1.aws.cloud2.influxdata.com/?database=dwd&table=weather

What do you think?

With kind regards, Andreas.

jscmidt commented 3 years ago

Sounds good! I will test it as soon as the PR is merged.

gutzbenj commented 3 years ago

While checking on this, we are currently facing some problems concerning including NULL in the pushed values. @JSAnyone do you have a clue how to write nullable values to influxdb?

jscmidt commented 3 years ago

I think Null-values are/have been simply not allowed in InfluxDB. See here: But maybe this changed with influxdb 2.x

gutzbenj commented 3 years ago

So for tabular data we'd loose a lot of data especially for very old timestamps, where variables such as temperature and precipitation may have been measured but not more modern variables which rely on newer instruments. Would it make sense to fill those values with -999/-999.0 as a replacement?

jscmidt commented 3 years ago

I think filling them with any other value would be really confusing. Why don't you just ignore null values? So that your publish-script checks for every value if it is null, and then only publishes the values that aren't null? Since there's coming a timestamp with every value I think it wouldn't be any problem if not existing values are just missing.

gutzbenj commented 3 years ago

Ok, will try this. So far we were using the DataframeClient which helps for exporting pandas DataFrames, however this leaves no choice else then to convert everything to strings. Instead using the InfluxdbClient and manually writing json should overcome this issue.

The following should be working:

https://gist.github.com/gutzbenj/fcb729f6c74d5d8491b3e83620914082

amotl commented 3 years ago

Hi Benjamin,

as you suggested, I just added 4dca3bb9 to #460.

With kind regards, Andreas.

amotl commented 3 years ago

Hi @JSAnyone,

the implementation of the connection specification as outlined at https://github.com/earthobservations/wetterdienst/issues/456#issuecomment-855081525 has been implemented with 7d1460d1f2f0a612cc43344ba68f99c18cddff15 (InfluxDB 1.x, with optional authentication and SSL support) and 4bf45a9b9bc11805ecad06a31bcd47ad8bdf14c5 (InfluxDB 2.x support).

With kind regdards, Andreas.

jscmidt commented 3 years ago

Hi all, Thanks for implementig this! However, I get an error: influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: field type conflict: input field \"value\" on measurement \"weather\" is type integer, already exists as type float dropped=4521"} I think that weatherdienst wants to write some values as Intand some as float, but this isn't allowed in influxdb. All values have to be the same type.

jscmidt commented 3 years ago

Edit: The above error appears when trying to store forecast data. I used the following code:

#!/bin/python3
from wetterdienst.provider.dwd.observation import DwdObservationRequest, DwdObservationDataset, DwdObservationPeriod, DwdObservationResolution
from wetterdienst.provider.dwd.forecast import DwdMosmixRequest, DwdMosmixType

print("Requesting forecast Data...")
request = DwdMosmixRequest(
    parameter="large",
    mosmix_type=DwdMosmixType.LARGE
).filter_by_station_id(station_id=['P327', '01001'])
results = next(request.values.query())
results.to_target("influxdb://user:pass@url/?database=telegraf&table=weather")

The full error:

Traceback (most recent call last):
  File "z:/Hobby/Raspberry_Pi/dwdwetter.py", line 12, in <module>
    results.to_target("influxdb://user:pass@url/?database=telegraf&table=weather")
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\wetterdienst\core\scalar\export.py", line 440, in to_target
    batch_size=50000,
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\influxdb\client.py", line 600, in write_points
    consistency=consistency)
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\influxdb\client.py", line 685, in _write_points
    protocol=protocol
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\influxdb\client.py", line 419, in write
    headers=headers
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\influxdb\client.py", line 378, in request
    raise InfluxDBClientError(err_msg, response.status_code)
influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: field type conflict: input field \"value\" on measurement \"weather\" is type integer, already exists as type float dropped=4521"}

I also get an error when using observation data. Code:

#!/bin/python3
from wetterdienst.provider.dwd.observation import DwdObservationRequest, DwdObservationDataset, DwdObservationPeriod, DwdObservationResolution
from wetterdienst.provider.dwd.forecast import DwdMosmixRequest, DwdMosmixType

print("Requesting observation Data...")
request = DwdObservationRequest(
    #parameter=[DwdObservationDataset.PRESSURE, DwdObservationDataset.TEMPERATURE_AIR, DwdObservationDataset.WIND, DwdObservationDataset.PRECIPITATION],
    parameter=[DwdObservationDataset.TEMPERATURE_AIR, DwdObservationDataset.WIND, DwdObservationDataset.PRECIPITATION],
    resolution=DwdObservationResolution.MINUTE_10,
    #start_date="2021-05-01",
    #end_date="2021-05-02",
    period=DwdObservationPeriod.NOW,
    tidy=True,
    humanize=True,
    si_units=False,
).filter_by_station_id(station_id=3668)
results = request.values.all()
results.to_target("influxdb://user:pass@url/?database=telegraf&table=weather")

The error I get here:

Requesting observation Data...
Traceback (most recent call last):
  File "z:/Hobby/Raspberry_Pi/dwdwetter.py", line 19, in <module>
    results.to_target("influxdb://user:pass@url/?database=telegraf&table=weather")
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\wetterdienst\core\scalar\export.py", line 440, in to_target
    batch_size=50000,
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\influxdb\client.py", line 600, in write_points
    consistency=consistency)
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\influxdb\client.py", line 685, in _write_points
    protocol=protocol
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\influxdb\client.py", line 419, in write
    headers=headers
  File "C:\Users\Julian\AppData\Roaming\Python\Python37\site-packages\influxdb\client.py", line 378, in request
    raise InfluxDBClientError(err_msg, response.status_code)
influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: unable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623143400000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623144000000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623144600000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623145200000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623145800000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623146400000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623147000000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623147600000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623148200000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623148800000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623149400000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623150000000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623150600000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623151200000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623151800000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623152400000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623153000000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623153600000000000': invalid field format\nunable to parse 'weather,dataset=precipitation,parameter=precipitation_height,station_id=03668 1623154200000000000': invalid field format dropped=0"}

Am I doing something wrong?

amotl commented 3 years ago

Dear Julian,

thank you for providing concise repros reflecting the problem you are observing. We will look into that.

Please note, while @gutzbenj loves the tidy format and it is actually nice because it "annotates" data points with quality information instead of arbitrarily mixing value and quality columns, it is not well suited for InfluxDB data export because of the type anomaly you have been observing:

input field "value" on measurement "weather" is type integer, already exists as type float

So, please turn that option off when running the InfluxDB export from custom code. On top of that, I fear that the forecast domain currently does not feature tabular (aka. non-tidy) data processing at all. Bummer - we will also have to step up here.

With kind regards, Andreas.

jscmidt commented 3 years ago

Hi, Thanks again for the fast answer! With turning off the „tidy“-option the data-export to my influxdb-database works great! Making this also possible for forecast-data of course would be the icing on the cake.

gutzbenj commented 3 years ago

Thanks for your feedback @JSAnyone ! Unfortunately I won't be able to fix the issue until in two weeks because of holiday in the next week.

jscmidt commented 3 years ago

It’s not urgent, enjoy your holiday!

jscmidt commented 3 years ago

Thanks, it's working!