InternetHealthReport / internet-yellow-pages

A knowledge graph for the Internet
https://iyp.iijlab.net
GNU General Public License v3.0

Add expire_after parameter to CachedSession #115

m-appel closed 9 months ago

m-appel commented 9 months ago

Description

The cache of the PeeringDB crawler now expires after six days. Therefore, the weekly dump will always fetch fresh data.
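
A minimal sketch of the idea, assuming the crawler builds its session with the `requests_cache` library. The cache name `tmp/peeringdb_cache`, the helper names, and the example date are hypothetical and not taken from this PR. It shows why a six-day expiry is enough for a weekly dump: by the time the next dump runs, every cached entry is already stale.

```python
from datetime import datetime, timedelta

# Six days is deliberately shorter than the seven-day dump interval.
CACHE_DURATION = timedelta(days=6)
DUMP_INTERVAL = timedelta(weeks=1)


def make_session(cache_name='tmp/peeringdb_cache'):
    """Build a cached HTTP session (cache_name is a hypothetical path)."""
    # Imported lazily so the arithmetic below runs without requests-cache installed.
    import requests_cache
    return requests_cache.CachedSession(cache_name, expire_after=CACHE_DURATION)


def is_stale(cached_at, now, ttl=CACHE_DURATION):
    """Return True if an entry cached at `cached_at` has expired by `now`."""
    return now - cached_at >= ttl


# Arbitrary example date for the previous dump.
last_dump = datetime(2024, 1, 1)
next_dump = last_dump + DUMP_INTERVAL
```

With these values, `is_stale(last_dump, next_dump)` is true, so the next weekly run would refetch from PeeringDB instead of reusing last week's responses.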

Motivation and Context

Fixes #114.

m-appel commented 9 months ago

While I'd argue that it's not a magic number, since the assignment to the days= parameter makes its meaning pretty clear, we can put it in the configuration, maybe as a generic option (i.e., independent of the PeeringDB crawlers), since we might use CachedSession for other crawlers in the future.

How about config['cache']['duration_in_days']? We could also add the directory there:

{
    "cache": {
        "directory": "tmp/",
        "duration_in_days": 6
    }
}

romain-fontugne commented 9 months ago

Yes, that sounds good to me. If we increase the frequency of dumps, we can just modify the configuration file without touching the code.

m-appel commented 9 months ago

We now read the directory and duration from the config. (I think at some point we should also do something about how we handle the config file :D) I've set the default behavior (i.e., when run without a config file) to "Do not cache".
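
A rough sketch of how the config lookup with a "do not cache" default could look. This is not the code merged in this PR: the function names, the `peeringdb` cache name, and the `tmp/` fallback directory are assumptions for illustration. Only the `cache.directory` and `cache.duration_in_days` keys come from the discussion above.

```python
import json
import os
from datetime import timedelta


def cache_settings(config):
    """Return (directory, expire_after), or None when caching is disabled."""
    cache = config.get('cache')
    if not cache or 'duration_in_days' not in cache:
        # No cache section in the config: do not cache.
        return None
    # Falling back to 'tmp/' is an assumption, not documented behavior.
    directory = cache.get('directory', 'tmp/')
    return directory, timedelta(days=cache['duration_in_days'])


def make_session(config, name='peeringdb'):
    """Return a cached session if configured, else a plain requests session."""
    # Third-party imports are lazy so cache_settings() works without them.
    import requests
    settings = cache_settings(config)
    if settings is None:
        return requests.Session()
    import requests_cache
    directory, expire_after = settings
    return requests_cache.CachedSession(os.path.join(directory, name),
                                        expire_after=expire_after)


# The config fragment proposed earlier in this thread.
example = json.loads('{"cache": {"directory": "tmp/", "duration_in_days": 6}}')
```

Keeping the duration in the config means a change in dump frequency only needs a new `duration_in_days` value, and an empty config falls back to an uncached session.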