earthobservations / wetterdienst

Open weather data for humans.
https://wetterdienst.readthedocs.io/
MIT License
362 stars, 55 forks

.filter_by_rank() not working (with DwdRoadRequest) #1353

Open SB-511 opened 2 months ago

SB-511 commented 2 months ago

Describe the bug

After applying .filter_by_rank() to a DwdRoadRequest, it still returns all stations. Side note: .filter_by_distance() still works fine.

To Reproduce


import datetime
from zoneinfo import ZoneInfo
from wetterdienst.provider.dwd.road.api import DwdRoadRequest, DwdRoadResolution

LOCATION = (49, 8.4)
NOW = datetime.datetime.now(ZoneInfo("UTC")).replace(tzinfo=None)

# Check the available parameters -> don't miss new ones!
road_parameters_dict = DwdRoadRequest.discover(
    resolution=DwdRoadResolution.MINUTE_10
)
road_params = list(road_parameters_dict["minute_10"].keys())

request = DwdRoadRequest(
    parameter=road_params,
    start_date=NOW - datetime.timedelta(minutes=60),
    end_date=NOW,
)

stations = request.filter_by_rank(latlon=LOCATION, rank=5)

print(stations.df)

Output:

shape: (1_653, 15)
[...]

If you replace .filter_by_rank(LOCATION, 5) with .filter_by_distance(LOCATION, 20) it works fine:

shape: (4, 15)
[...]

Expected behavior

Work as described, i.e. return the n closest stations.

Additional context

I'm not sure, but I thought it was working some releases ago.

SB-511 commented 2 months ago

(A short test with DwdObservationRequest shows the same problem.)

gutzbenj commented 2 months ago

This may be a bit confusing but

stations = request.filter_by_rank(latlon=LOCATION, rank=5)

result = stations.values.all()

print(result.df_stations)

would give you what you need.

We'll probably need to make this clearer. When going through the distance-sorted stations, we don't know in advance whether any of them has the requested values. So what it does is consume K of N stations until RANK stations with values have been found. It then stops, and df_stations is a view on the stations df restricted to the consumed stations WITH values.
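
The consumption logic described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not wetterdienst's actual implementation: the function name take_ranked, the has_values callback, and the station labels are all hypothetical and only stand in for the real data fetching.

```python
from typing import Callable, List, Sequence, Tuple


def take_ranked(
    stations_by_distance: Sequence[str],
    has_values: Callable[[str], bool],
    rank: int,
) -> Tuple[List[str], List[str]]:
    """Walk stations from nearest to farthest until `rank` of them have values.

    Returns (consumed, with_values): every station that was looked at, and
    the subset of those that actually delivered values.
    """
    consumed: List[str] = []
    with_values: List[str] = []
    if rank <= 0:
        return consumed, with_values
    for station in stations_by_distance:
        consumed.append(station)
        if has_values(station):
            with_values.append(station)
            # Stop as soon as enough stations with values were found.
            if len(with_values) == rank:
                break
    return consumed, with_values


# Hypothetical example: five stations sorted by distance, two without data.
stations = ["A", "B", "C", "D", "E"]
available = {"A", "C", "D"}
consumed, kept = take_ranked(stations, lambda s: s in available, rank=2)
print(consumed)  # ['A', 'B', 'C']
print(kept)      # ['A', 'C']
```

In this sketch, K corresponds to len(consumed), and kept plays the role of the view that df_stations gives on the stations with values.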

SB-511 commented 1 month ago

Hey @gutzbenj, thank you for the clarification!

Is there a way to access the actual values of these 5 closest stations?

gutzbenj commented 1 month ago

So you'd either have to set ts_skip_empty to false or lower the ts_skip_threshold to something more pessimistic like 0.75. See https://wetterdienst.readthedocs.io/en/latest/usage/settings.html#settings for reference.

SB-511 commented 1 month ago

Oh okay, so there is no way of accessing the values of these stations directly as .filter_by_rank() is searching for the closest stations, not the closest stations with data?

gutzbenj commented 1 month ago

It is doing exactly that: looking for stations with data, in accordance with the settings. But if you set ts_skip_empty=False, it would just hand you the data of the X closest stations.
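
To make the effect of the two settings more concrete, here is a small sketch of how such a skip decision could work. It assumes that ts_skip_threshold is compared against the share of expected values a station actually delivers; the helper is_skipped, its parameters, and the 0.95 default used here are illustrative assumptions, not the library's code.

```python
def is_skipped(
    n_present: int,
    n_expected: int,
    skip_empty: bool = True,   # stands in for ts_skip_empty
    threshold: float = 0.95,   # stands in for ts_skip_threshold (assumed default)
) -> bool:
    """Return True if a station would be treated as empty and skipped."""
    if not skip_empty:
        # With skipping disabled, every station is kept, even without values.
        return False
    if n_expected == 0:
        return True
    # Skip when the share of delivered values falls below the threshold.
    return n_present / n_expected < threshold


# Hypothetical case: a station delivers 5 of 6 expected 10-minute values.
print(is_skipped(5, 6))                    # True  (5/6 is below 0.95)
print(is_skipped(5, 6, threshold=0.75))    # False (5/6 is above 0.75)
print(is_skipped(0, 6, skip_empty=False))  # False (skipping disabled)
```

Under these assumptions, lowering the threshold keeps stations with a few gaps, while disabling the skip keeps the closest stations regardless of how much data they have.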