Closed: scherbinek closed this issue 1 year ago.
Dear @scherbinek,
thank you for the excellent report. @larsrinn recently reported a similar thing at #704: the cache control environment variables WD_CACHE_DISABLE and WD_CACHE_DIR, introduced with version 0.18.0, would not be honored correctly.
However, when looking for them in the current state of the code base, I cannot find either of them. It looks like 9c7cee5940 got lost somehow? Do you have any clue about it, @gutzbenj?
With kind regards, Andreas.
Oh, the code is there, but because the prefix WD_ is handled on a separate line of code, I had not been able to spot it.
Oh, and I also spotted this one. I am not sure whether use_listings_cache=True is "always on" here, even when running with the cache disabled?
Edit: I've addressed this with GH-828, but I think this is only a cosmetic issue, and not responsible for any functional flaw.
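To illustrate the concern about an always-on listings cache, here is a minimal sketch of a cache that honors a disable flag. This is an assumption about the intended behavior, not fsspec's or Wetterdienst's actual code; the class and flag names merely mirror the discussion above.

```python
# Minimal sketch (assumption, not the library's actual implementation):
# a directory-listings cache that respects a use_listings_cache flag
# instead of being unconditionally enabled.
class DirCache:
    def __init__(self, use_listings_cache=True):
        self.use_listings_cache = use_listings_cache
        self._cache = {}

    def __setitem__(self, key, value):
        # When caching is disabled, writes are silently dropped.
        if self.use_listings_cache:
            self._cache[key] = value

    def __contains__(self, key):
        return key in self._cache


cache = DirCache(use_listings_cache=False)
cache["https://example.org/dir/"] = ["file.txt"]
print("https://example.org/dir/" in cache)  # a disabled cache stores nothing
```

With `use_listings_cache=False`, the lookup reports that nothing was stored, which is the behavior one would expect when running with the cache disabled.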
I've exercised your scenario using the following program, using Wetterdienst 0.50.0, on both macOS and within a Docker container.
```python
#
# Synopsis:
#
# docker run --rm -it python:3.10-bullseye bash
# pip install wetterdienst
# python example-827.py
#
import logging

from wetterdienst import Settings
from wetterdienst.provider.dwd.observation import DwdObservationRequest

logger = logging.getLogger(__name__)


def process():
    Settings.cache_disable = True
    r1 = DwdObservationRequest(
        parameter=['climate_summary'],
        resolution='daily',
        period='recent',
    ).all()
    print(r1)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    process()
```
Using Settings.cache_disable = True to turn off caching works perfectly well for me [^1]. I am able to confirm that no directory has been created at either /Users/amo/Library/Caches/wetterdienst (macOS) or /root/.cache/wetterdienst (Linux/Docker) after running that program.
Maybe you can share more details about your Docker environment, as driven by Airflow? Are any special parameters or options being used? Which versions of Wetterdienst and Docker are you running?
[^1]: So does environ['WD_CACHE_DISABLE'] = 'True'.
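For completeness, the absence check can be sketched with the standard library alone. The two paths mirror the cache locations mentioned above; the check itself is a hypothetical verification step, not part of Wetterdienst.

```python
from pathlib import Path

# Hypothetical verification step: after running the example program with the
# cache disabled, neither of the default cache locations should exist.
candidates = [
    Path.home() / "Library" / "Caches" / "wetterdienst",  # macOS default
    Path.home() / ".cache" / "wetterdienst",              # Linux default
]
for path in candidates:
    print(path, "exists:", path.exists())
```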
Maybe it was really just an upstream error / fluke?
wetterdienst.exceptions.MetaFileNotFound: No meta file was found amongst the files at https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/daily/kl/recent/.
The Airflow log refers to wetterdienst.util.fsspec_monkeypatch - INFO - Dircache located at /root/.cache/wetterdienst, which doesn't exist as a folder (and creating the wetterdienst folder didn't solve it).
That log message was misleading, it will be fixed with GH-828. Thank you.
Hi @amotl
Thank you for your detailed analysis and description. I tested your Docker setup including example-827.py as well, and can also confirm a working scenario. It even works with my server setup locally... but throws the mentioned error on my server. Testing it step by step, locally and on my server, led me to the actual error.
The error seemed so simple that I curled the website on my server, and everything was fine. But I didn't try to curl the requested website https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/daily/kl/recent/ inside my Docker setup on the server.
```
airflow@653d5258586b:/opt/airflow/dags$ curl https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/daily/kl/recent/
curl: (77) error setting certificate verify locations:
  CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
```
For using a secure connection, I use SSL certificates on my server and mounted them into the Docker container as well; something I commented out locally, as it runs on localhost.
volumes:
But as I only mount the server's /usr/share/pki/trust/anchors, without any ca-certificates for verifying the SSL certificates of other websites, I can't request the website with default SSL verification. Thus I only have to mount my server's certificates to a folder other than /etc/ssl/certs, as mounting there overwrites all ca-certificates of the Docker container.
**- /usr/share/pki/trust/anchors:/usr/share/pki/trust/anchors**
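Put into a docker-compose context, the working setup might look like the following sketch. The service name is an assumption; only the anchors mount mirrors the line above.

```yaml
services:
  airflow:  # hypothetical service name
    volumes:
      # Mount the host certificates to the same path inside the container,
      # instead of /etc/ssl/certs, so the image's own ca-certificates
      # bundle stays intact.
      - /usr/share/pki/trust/anchors:/usr/share/pki/trust/anchors
```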
And: it works. It was a tricky one, as I never thought the URL couldn't be verified and would therefore trigger the initially mentioned error. Additionally, it is not the first URL I request; I also request and use the Genesis API of the Statistisches Bundesamt without problems.
Hopefully this issue can serve as a hint for setups like mine. Sorry for any inconvenience; it feels self-inflicted, and I like dealing with SSL and certificates even less. The issue can be closed, unless you have any follow-up questions.
Regards, Marcel
Hi @scherbinek,
thank you for your response, I am happy it works for you now. However, I will reopen this issue, because I would like to investigate if we should include the certifi package as a dependency, and if this would have improved the situation in your case.
With kind regards, Andreas.
I think we can close this. certifi is already an indirect dependency of ours (probably through fsspec/requests), and the issue can't be resolved by installing certifi, but rather by linking it to system-installed certificates.
> The issue can't be resolved by installing certifi but rather by linking it to system-installed certificates.
I was about to agree, but wasn't fully convinced [^1], so I just looked up the topic in the corresponding urllib3 and aiohttp documentation. It looks like there is an option to make urllib3 use the certificates from the certifi package, and it is well documented.
> Unless otherwise specified urllib3 will try to load the default system certificate stores. The most reliable cross-platform method is to use the certifi package which provides Mozilla's root certificate bundle.
>
> Once you have certificates, you can create a PoolManager that verifies certificates when making requests:
>
> ```python
> >>> import certifi
> >>> import urllib3
> >>> http = urllib3.PoolManager(
> ...     cert_reqs='CERT_REQUIRED',
> ...     ca_certs=certifi.where()
> ... )
> ```

-- https://urllib3.readthedocs.io/en/stable/user-guide.html#certificate-verification
[^1]: I mean, what would be the point of providing the certificates per Python package then, if you can't make Python actually use it?
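As a side note, the verification settings live in the standard library's ssl module either way; a minimal stdlib-only sketch (certifi, when installed, would merely supply the `cafile` path):

```python
import ssl

# A default SSL context verifies peer certificates out of the box; passing
# cafile=certifi.where() (when certifi is installed) would pin verification
# to Mozilla's bundle instead of the system store.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```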
It looks like aiohttp does not document how to use certificates from certifi. The "Evaluate aiohttp with certifi" repository has a corresponding example program; its gist is:
```python
import asyncio
import ssl

import aiohttp
import certifi

async def main():
    sslcontext = ssl.create_default_context(cafile=certifi.where())
    async with aiohttp.ClientSession() as session:
        response = await session.get("https://www.hrw.org/", ssl=sslcontext)

asyncio.run(main())
```
Do you think we should carry that information forward to both the aiohttp and the fsspec projects, to improve their documentation and their internals?
Sure! But my honest opinion is: I've only seen this error once, on a managed machine at work, and if you get this error, probably nothing else works either.
Usually, if you install Python (and maybe requests afterwards), everything should work out of the box; if not, neither we nor aiohttp would be able to provide any help, and the user would rather have to make sure that the certificates on the machine are installed correctly.
Closing this, as it is not related to anything on our end.
Hey!
I got the same error as described in https://github.com/earthobservations/wetterdienst/issues/678
To Reproduce: Nothing special. Just a simple request, which works locally on my computer.
Additional context: The script works perfectly fine on the local computer, but crashes with the above-mentioned error on a server instance, within a Docker container of Apache Airflow. I already switched off the cache to avoid any issues, but wetterdienst.info() refers to a location at /home/airflow/.cache/wetterdienst which doesn't exist as a folder (and creating the wetterdienst folder didn't solve it). The Airflow log refers to wetterdienst.util.fsspec_monkeypatch - INFO - Dircache located at /root/.cache/wetterdienst, which doesn't exist as a folder either.
It seems that fsspec tries to resolve a cache directory for parsing the metadata file from the URL, but receives an empty list of files, which leads to the error without even trying to request the content of the URL. The dircache at /root/.cache/ seems to be misleading, as it shouldn't be started as root. So my best guess is some authorization issue in a Linux-based context, related to the fsspec_monkeypatch cache.
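The suspected failure mode can be illustrated with a small sketch. This is an assumption about the mechanics, not Wetterdienst's or fsspec's actual code: if the listing request fails and the failure is swallowed, an empty file list gets cached, and the later metadata lookup reports a missing meta file instead of the underlying network or SSL error.

```python
# Sketch of the suspected failure mode (hypothetical, for illustration only).
dircache = {}

def list_files(url):
    """Simulate a directory listing whose SSL failure is swallowed."""
    try:
        raise OSError("certificate verify failed")  # simulated SSL error
    except OSError:
        files = []  # the real cause is lost; an empty listing is cached
    dircache[url] = files
    return files

files = list_files("https://opendata.dwd.de/.../kl/recent/")
if not files:
    # The caller only sees the empty listing, hence a misleading error.
    print("MetaFileNotFound: no meta file was found")
```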
I'll give it a further try tomorrow: I'll debug the issue and share my results. But I am thankful for any hints. Initially, I tried to search for an environment variable to overwrite the fsspec cache.
Regards, Marcel