Closed frievb closed 1 year ago
@frievb thanks a lot for reporting the issue and documenting the solution. I added an additional warning to the caching section of the documentation to document this to other users as well, see https://github.com/fluves/pywaterinfo/pull/53
@stijnvanhoey approximately the last month me and my colleagues are having trouble with getting data from Waterinfo via pywaterinfo. Sometimes we succeed in getting data, often not. We cannot see any logic in when we succeed or when we do not. It seems to be a problem related to the cache kept. The problem occurs just after connection with the database is made. F.e. while running following code:
from pywaterinfo import Waterinfo
import logging
loglevel= "DEBUG"
numeric_level = getattr(logging, loglevel.upper(), None)
logging.basicConfig(level=numeric_level, format='%(asctime)s %(name)s %(filename)s %(levelname)s: %(message)s')
logger = logging.getLogger(__name__)
vmm = Waterinfo('vmm')
vmm.clear_cache()
The code keeps running on the line vmm = Waterinfo('vmm') The resulting log:
2023-03-31 09:42:41,605 __main__ from pywaterinfo import Waterinfo.py INFO: 2023-03-31 09:42:41.605363
2023-03-31 09:42:41,605 requests_cache.backends __init__.py DEBUG: Initializing backend: sqlite pywaterinfo_cache.sqlite
2023-03-31 09:42:41,605 requests_cache.backends.base base.py DEBUG: Initialized SQLiteDict with serializer: SerializerPipeline(name=pickle, n_stages=2)
2023-03-31 09:42:41,621 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to C:\Users\_USER_\AppData\Local\Temp\pywaterinfo_cache.sqlite:responses
2023-03-31 09:42:41,629 requests_cache.backends.base base.py DEBUG: Initialized SQLiteDict with serializer: None
2023-03-31 09:42:41,630 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to C:\Users\_USER_\AppData\Local\Temp\pywaterinfo_cache.sqlite:redirects
2023-03-31 09:42:41,637 urllib3.connectionpool connectionpool.py DEBUG: Starting new HTTP connection (1): download.waterinfo.be:80
2023-03-31 09:42:41,719 urllib3.connectionpool connectionpool.py DEBUG: [http://download.waterinfo.be:80](https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdownload.waterinfo.be%2F&data=05%7C01%7C%7Cd0042556002b4402d25108db31e59cd0%7C9e2777ed82374ab992782c144d6f6da3%7C0%7C0%7C638158634490507352%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=4uKWYCXQyyg2cYd4oBi6ixMIL9Tf9nyKoXHEmlz%2BDtU%3D&reserved=0) "POST /kiwis-auth/token HTTP/1.1" 201 549
2023-03-31 09:42:41,721 root waterinfo.py INFO: Current token expires on 2023-04-01 09:42:41.721972
2023-03-31 09:42:41,723 requests_cache.policy.actions actions.py DEBUG: Cache directives from request headers: CacheDirectives()
2023-03-31 09:42:41,724 requests_cache.policy.actions actions.py DEBUG: Pre-read cache checks: Passed
2023-03-31 09:42:41,731 requests_cache.policy.actions actions.py DEBUG: Post-read cache actions: CacheActions(expire_after=datetime.timedelta(days=7), send_request=True)
2023-03-31 09:42:41,736 urllib3.connectionpool connectionpool.py DEBUG: Starting new HTTPS connection (1): download.waterinfo.be:443
2023-03-31 09:42:41,809 urllib3.connectionpool connectionpool.py DEBUG: [https://download.waterinfo.be:443](https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdownload.waterinfo.be%2F&data=05%7C01%7C%7Cd0042556002b4402d25108db31e59cd0%7C9e2777ed82374ab992782c144d6f6da3%7C0%7C0%7C638158634490507352%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=tG1wwAR0Fo5AgnXVQWbmRAYMln0tobcwcdD220pL4Zs%3D&reserved=0) "GET /tsmdownload/KiWIS/KiWIS?request=getRequestInfo&service=kisters&type=QueryServices&format=json&datasource=1&timezone=UTC HTTP/1.1" 304 0
2023-03-31 09:42:41,810 requests_cache.policy.actions actions.py DEBUG: Pre-write cache checks: disabled status
2023-03-31 09:42:41,810 requests_cache.policy.actions actions.py DEBUG: Response for URL [https://download.waterinfo.be/tsmdownload/KiWIS/KiWIS?request=getRequestInfo&service=kisters&type=QueryServices&format=json&datasource=1&timezone=UTC](https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdownload.waterinfo.be%2Ftsmdownload%2FKiWIS%2FKiWIS%3Frequest%3DgetRequestInfo%26service%3Dkisters%26type%3DQueryServices%26format%3Djson%26datasource%3D1%26timezone%3DUTC&data=05%7C01%7C%7Cd0042556002b4402d25108db31e59cd0%7C9e2777ed82374ab992782c144d6f6da3%7C0%7C0%7C638158634490507352%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2BGUGxrtbJRIUQUUoHZqefnoRDaneWdRRg25Ep6yzQTA%3D&reserved=0) has not been modified
2023-03-31 09:42:41,871 root waterinfo.py INFO: Request [https://download.waterinfo.be/tsmdownload/KiWIS/KiWIS?datasource=1&format=json&request=getRequestInfo&service=kisters&timezone=UTC&type=QueryServices](https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdownload.waterinfo.be%2Ftsmdownload%2FKiWIS%2FKiWIS%3Fdatasource%3D1%26format%3Djson%26request%3DgetRequestInfo%26service%3Dkisters%26timezone%3DUTC%26type%3DQueryServices&data=05%7C01%7C%7Cd0042556002b4402d25108db31e59cd0%7C9e2777ed82374ab992782c144d6f6da3%7C0%7C0%7C638158634490507352%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=jZ2iWeknu6OOWFEke40zbG8RxqCiXK6RDcA8wyE3RyQ%3D&reserved=0) reused from cache.
2023-03-31 09:42:41,873 requests_cache.backends.base base.py INFO: Resetting expiration with: 7 days, 0:00:00
Sometimes, this also occurred while running code just after clearing the cache (with _vmm.clearcache()). So maybe there is something else going on as well?
Thanks for reporting. We can certainly improve the caching logic, so let's try to understand what is happening with the cache and provide a better implementation. A first option could be that with the current fixed-period the cache becomes too large to be manageable. Could you please check:
Once that file is removed, the issue seems to be solved
As the size is not the issue but the file existence itself, maybe some interference with other processes?
Furthermore, could you provide the versions you use for the packages? When using pip/virtualenv you can use !pip list
or with conda !conda list
. Of main interest is pywaterinfo itself and requests
, requests-cache
.
With versions:
Package Version
------------------ ---------
...
pywaterinfo 0.6.0
requests 2.28.2
requests-cache 1.0.1
...
urllib3 1.26.15
my output is a bit different using your example code:
➜ python
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pywaterinfo import Waterinfo
>>> loglevel= "DEBUG"
>>> numeric_level = getattr(logging, loglevel.upper(), None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'logging' is not defined
>>> import logging
>>> numeric_level = getattr(logging, loglevel.upper(), None)
>>> logging.basicConfig(level=numeric_level, format='%(asctime)s %(name)s %(filename)s %(levelname)s: %(message)s')
>>> vmm = Waterinfo('vmm')
2023-03-31 17:48:44,018 requests_cache.backends __init__.py DEBUG: Initializing backend: sqlite pywaterinfo_cache.sqlite
2023-03-31 17:48:44,018 requests_cache.backends.base base.py DEBUG: Initialized SQLiteDict with serializer: SerializerPipeline(name=pickle, n_stages=2)
2023-03-31 17:48:44,020 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to /tmp/pywaterinfo_cache.sqlite:responses
2023-03-31 17:48:44,040 requests_cache.backends.base base.py DEBUG: Initialized SQLiteDict with serializer: None
2023-03-31 17:48:44,041 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to /tmp/pywaterinfo_cache.sqlite:redirects
2023-03-31 17:48:44,059 requests_cache.policy.actions actions.py DEBUG: Cache directives from request headers: CacheDirectives()
2023-03-31 17:48:44,059 requests_cache.policy.actions actions.py DEBUG: Pre-read cache checks: Passed
2023-03-31 17:48:44,060 requests_cache.policy.actions actions.py DEBUG: Post-read cache actions: CacheActions(expire_after=datetime.timedelta(days=7), send_request=True)
2023-03-31 17:48:44,063 urllib3.connectionpool connectionpool.py DEBUG: Starting new HTTPS connection (1): download.waterinfo.be:443
2023-03-31 17:48:44,190 urllib3.connectionpool connectionpool.py DEBUG: https://download.waterinfo.be:443 "GET /tsmdownload/KiWIS/KiWIS?request=getRequestInfo&service=kisters&type=QueryServices&format=json&datasource=1&timezone=UTC HTTP/1.1" 200 12904
2023-03-31 17:48:44,194 requests_cache.policy.actions actions.py DEBUG: Pre-write cache checks: Passed
2023-03-31 17:48:44,208 root waterinfo.py INFO: Successful waterinfo API request with call https://download.waterinfo.be/tsmdownload/KiWIS/KiWIS?request=getRequestInfo&service=kisters&type=QueryServices&format=json&datasource=1&timezone=UTC
2023-03-31 17:48:44,214 requests_cache.backends.base base.py INFO: Resetting expiration with: 7 days, 0:00:00
2023-03-31 17:48:44,229 requests_cache.backends.base base.py DEBUG: Deleting up to 0 responses
>>> vmm.clear_cache()
2023-03-31 17:48:49,045 requests_cache.backends.base base.py INFO: Clearing all items from the cache
2023-03-31 17:48:49,059 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to /tmp/pywaterinfo_cache.sqlite:responses
2023-03-31 17:48:49,073 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to /tmp/pywaterinfo_cache.sqlite:redirects
The versions of the packages:
Name Version
...
pywaterinfo 0.6.0
pyyaml 6.0
qt 5.12.9
rasterio 1.2.10
requests 2.28.1
requests-cache 1.0.0
...
The output of a successful run looks slightly different than a failed run, but looks still slightly different than your output:
2023-04-03 08:50:04,268 requests_cache.backends __init__.py DEBUG: Initializing backend: sqlite pywaterinfo_cache.sqlite
2023-04-03 08:50:04,269 requests_cache.backends.base base.py DEBUG: Initialized SQLiteDict with serializer: SerializerPipeline(name=pickle, n_stages=2)
2023-04-03 08:50:04,273 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to C:\Users\USER\AppData\Local\Temp\pywaterinfo_cache.sqlite:responses
2023-04-03 08:50:04,318 requests_cache.backends.base base.py DEBUG: Initialized SQLiteDict with serializer: None
2023-04-03 08:50:04,322 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to C:\Users\USER\AppData\Local\Temp\pywaterinfo_cache.sqlite:redirects
2023-04-03 08:50:04,364 urllib3.connectionpool connectionpool.py DEBUG: Starting new HTTP connection (1): download.waterinfo.be:80
2023-04-03 08:50:04,439 urllib3.connectionpool connectionpool.py DEBUG: http://download.waterinfo.be:80 "POST /kiwis-auth/token HTTP/1.1" 201 549
2023-04-03 08:50:04,445 root waterinfo.py INFO: Current token expires on 2023-04-04 08:50:04.445426
2023-04-03 08:50:04,454 requests_cache.policy.actions actions.py DEBUG: Cache directives from request headers: CacheDirectives()
2023-04-03 08:50:04,454 requests_cache.policy.actions actions.py DEBUG: Pre-read cache checks: Passed
2023-04-03 08:50:04,456 requests_cache.policy.actions actions.py DEBUG: Post-read cache actions: CacheActions(expire_after=datetime.timedelta(days=7), send_request=True)
2023-04-03 08:50:04,463 urllib3.connectionpool connectionpool.py DEBUG: Starting new HTTPS connection (1): download.waterinfo.be:443
2023-04-03 08:50:04,644 urllib3.connectionpool connectionpool.py DEBUG: https://download.waterinfo.be:443 "GET /tsmdownload/KiWIS/KiWIS?request=getRequestInfo&service=kisters&type=QueryServices&format=json&datasource=1&timezone=UTC HTTP/1.1" 200 12904
2023-04-03 08:50:04,646 requests_cache.policy.actions actions.py DEBUG: Pre-write cache checks: Passed
2023-04-03 08:50:04,667 root waterinfo.py INFO: Successful waterinfo API request with call https://download.waterinfo.be/tsmdownload/KiWIS/KiWIS?request=getRequestInfo&service=kisters&type=QueryServices&format=json&datasource=1&timezone=UTC
2023-04-03 08:50:04,672 requests_cache.backends.base base.py INFO: Resetting expiration with: 7 days, 0:00:00
2023-04-03 08:50:04,702 requests_cache.backends.base base.py DEBUG: Deleting up to 0 responses
2023-04-03 08:50:04,704 requests_cache.backends.base base.py INFO: Clearing all items from the cache
2023-04-03 08:50:04,717 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to C:\Users\USER\AppData\Local\Temp\pywaterinfo_cache.sqlite:responses
2023-04-03 08:50:04,762 requests_cache.backends.sqlite sqlite.py DEBUG: Opening connection to C:\Users\USER\AppData\Local\Temp\pywaterinfo_cache.sqlite:redirects
I also ran into database locking issues when running simultaneous WaterInfo instances because SQLite does not support concurrent writing. So when two WaterInfo instances try to store a response in the cache, one of the instances will run into a filelock on the SQLite database, this locks seems to not be released properly. So the SQLite cache remains locked and the next time while trying to instantiate a new WaterInfo object, pywaterinfo is unable to connect to the cache in the constructor and keeps retrying but it is never able to obtain the lock. Acces to the SQLite cache is probably triggered at this line.
(back from holiday period)
Although I'm not able to reproduce the specific error (edited but it seems the behavior is also now effecting the tox-based setup when running the unit tests) of @frievb in a controlled setup, the usage of concurrent writing to the sqlite dbase has some limitations. I propose to do the following adjustments:
pip install pywaterinfo[cache]
will include the cache-dependencies.@Beramos and @frievb I'll ping you in a PR I'll make asap.
@frievb and @Beramos I checked this further and the stale behavior issue happens when I use requests-cache > 1.x.x, but not with older versions of requests-cache.
Could you have a check if this works also for you (as this would be a workaround in the meanwhile):
pip install requests-cache=0.9.8
in the environment where pywaterinfo was installedWe"ll make sure it is compatible with the new version as well, but would be good to trace back the issue.
@frievb and @Beramos I checked this further and the stale behavior issue happens when I use requests-cache > 1.x.x, but not with older versions of requests-cache.
Could you have a check if this works also for you (as this would be a workaround in the meanwhile):
- install older version (<1.0.0) of requests-cache, e;g.
pip install requests-cache=0.9.8
in the environment where pywaterinfo was installed- rerun your code and check it the stale behavior is still happening?
We"ll make sure it is compatible with the new version as well, but would be good to trace back the issue.
I can confirm that downgrading request-cache to 0.9.8 resolves my problem with consecutive unit tests (a Waterinfo object is instantiated in each tests) locking the sqlite database.
@Beramos and @frievb see https://github.com/fluves/pywaterinfo/pull/61 for an updated version which should support the latest version of requests-cache as well. Would be good if you could test the version of that PR (clone the repo > switch to SVH-cache-refactor
branch, pip install -e .[cache]
> run code to check behavior) and provide feedback on the PR.
@stijnvanhoey, I've tested this and non-caching works and when installing requests_cachen it seems that the locking issues have been resolved.
@stijnvanhoey, I also tested both the workaround by downgrading requests-cache and the updated version (when caching was not activated). No more problems occurred in both cases. Thanks for the solving the issue!
tackled by #61, closing this issue
Description
I was using pywaterinfo intensely to load a lot of data and was reloading this data multiple times a day to keep track of the last incoming measurements of some measurement stations. Suddenly the initialization time went up from less than a second to 30 - 60 minutes. Likewise, data loading times went up extremely .
Solution
The big amount of data requests probably caused such an increase of data stored in the pywaterinfo cache memory that it slowed down processing speeds. This was easily solved by clearing the cache: