CLARIAH / grlc

grlc builds Web APIs using shared SPARQL queries
http://grlc.io
MIT License

SARS-CoV-2-Queries API not working #477

Open c-martinez opened 1 week ago

c-martinez commented 1 week ago

The API at https://grlc.io/api-git/egonw/SARS-CoV-2-Queries/subdir/grlc is not working. Source queries: https://github.com/egonw/SARS-CoV-2-Queries/tree/master/grlc

Reported by @egonw

c-martinez commented 1 week ago

On first inspection, the repo itself seems fine: it just takes a long time to load, and the request eventually times out. Increasing the gunicorn timeout "fixes" the problem, but that feels like masking the underlying cause rather than addressing it.
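For reference, the workaround amounts to something like the following. This is only a sketch, assuming gunicorn is configured through a `gunicorn.conf.py` file; the value of 120 seconds is an arbitrary illustration and not the setting used on grlc.io.

```python
# gunicorn.conf.py (assumed file name; gunicorn config files are plain Python)

# Seconds a worker may stay silent before gunicorn kills and restarts it.
# gunicorn's default is 30; 120 is an illustrative value, not a recommendation.
timeout = 120
```

Raising the value only buys time: the page load still performs the same slow work on every request.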

c-martinez commented 1 day ago

It looks like the bottleneck is fetching the files in the repo. The total size is small, but because https://github.com/egonw/SARS-CoV-2-Queries/tree/master/grlc contains 50+ files, grlc issues 50+ requests to GitHub (via PyGithub). That takes a while, and the browser's request to grlc.io times out in the meantime.

One way to optimise this is to cache the GitHub requests (using requests-cache). The first call to https://grlc.io/api-git/egonw/SARS-CoV-2-Queries/subdir/grlc will still time out, but a second call will load the API properly. We could keep the repo cached for a reasonable amount of time (30 days?), and any subsequent calls within that period would not time out.
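To make the idea concrete, here is a rough, standard-library-only sketch of time-based caching in front of a fetch function. It is not grlc code and does not use requests-cache itself; `TTLCache` and `fake_github_fetch` are hypothetical names, and the fake fetch stands in for one real GitHub request.

```python
import time
from typing import Callable, Dict, Tuple

# 30 days, the retention period floated above (an assumption, not a decision).
CACHE_TTL_SECONDS = 30 * 24 * 60 * 60


class TTLCache:
    """Cache the results of a fetch function, keyed by URL, for a fixed time."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.upstream_calls = 0  # how many times we actually hit the upstream

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        # Serve from the cache while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise fetch from upstream and remember when we did so.
        self.upstream_calls += 1
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body


def fake_github_fetch(url: str) -> str:
    """Hypothetical stand-in for a single (slow) GitHub request."""
    return f"contents of {url}"


if __name__ == "__main__":
    cache = TTLCache(fake_github_fetch)
    paths = [f"grlc/query{i}.rq" for i in range(50)]  # made-up file names
    for path in paths:
        cache.get(path)  # first page load: every file goes upstream
    for path in paths:
        cache.get(path)  # second page load: served entirely from the cache
    print(cache.upstream_calls)
```

With requests-cache the same effect would presumably come from `requests_cache.install_cache(...)` with `expire_after` set to 30 days. Whether the requests issued through PyGithub are picked up by that cache is something to verify.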

What do you think @egonw? Would that work for you?