Closed by nickynicolson 5 months ago.
You might be better off with the CSV export routine for all datasets: https://techdocs.gbif.org/en/openapi/v1/registry#/Datasets/searchDatasetsExport
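As a rough sketch of that suggestion, the export could be fetched in a single request instead of paging. The path `/v1/dataset/search/export` and the `format` parameter are my reading of the linked API docs and should be checked there; the function name `export_datasets` and the default output file name are made up for illustration.

```shell
# Hypothetical helper: download the dataset search export in one request.
# Assumption: the export lives at /v1/dataset/search/export and accepts
# format=CSV or format=TSV (verify against the linked API documentation).
export_datasets() {
  out_file=${1:-datasets.csv}   # where to write the export
  format=${2:-CSV}              # CSV or TSV
  curl -Ss --fail -o "$out_file" \
    "https://api.gbif.org/v1/dataset/search/export?format=$format"
}
```

Calling `export_datasets datasets.csv` would then write the whole listing to `datasets.csv`, with no offsets involved.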
It's currently fine:
for o in `seq 0 1000 92199`; do echo -n $o ' ' && time curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null; done
0 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.04s system 2% cpu 6.641 total
1000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.02s system 2% cpu 6.375 total
2000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 1% cpu 6.628 total
3000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.03s system 2% cpu 6.660 total
4000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.01s system 1% cpu 6.505 total
5000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.03s system 2% cpu 6.414 total
6000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.04s system 0% cpu 13.951 total
7000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.237 total
8000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 2% cpu 5.933 total
9000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.01s system 1% cpu 6.092 total
10000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 2% cpu 6.038 total
11000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 2% cpu 6.704 total
12000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.05s system 2% cpu 6.826 total
13000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 2% cpu 6.416 total
14000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 2% cpu 6.207 total
15000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 2% cpu 6.463 total
16000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 0% cpu 13.268 total
17000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.03s system 1% cpu 6.818 total
18000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 1% cpu 6.701 total
19000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.611 total
20000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 2% cpu 6.636 total
21000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.04s system 1% cpu 7.097 total
22000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.01s system 1% cpu 6.517 total
23000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.03s system 2% cpu 6.735 total
24000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 1% cpu 6.420 total
25000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.04s system 1% cpu 14.944 total
26000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.04s system 1% cpu 6.783 total
27000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.01s system 1% cpu 6.766 total
28000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.426 total
29000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.626 total
30000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 1% cpu 6.672 total
31000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 1% cpu 6.645 total
32000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 1% cpu 9.194 total
33000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.769 total
34000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.03s system 1% cpu 10.996 total
35000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 1% cpu 6.696 total
36000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.588 total
37000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.03s system 1% cpu 6.877 total
38000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.02s system 2% cpu 6.648 total
39000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.17s user 0.05s system 2% cpu 8.362 total
40000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.00s system 1% cpu 6.384 total
41000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 2% cpu 6.628 total
42000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 1% cpu 6.457 total
43000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 0% cpu 14.244 total
44000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 1% cpu 7.449 total
45000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 1% cpu 6.220 total
46000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.02s system 2% cpu 6.310 total
47000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 1% cpu 7.631 total
48000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 2% cpu 6.442 total
49000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.02s system 2% cpu 6.486 total
50000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 2% cpu 6.544 total
51000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 1% cpu 6.849 total
52000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.15s user 0.03s system 1% cpu 14.946 total
53000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 7.081 total
54000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.04s system 1% cpu 8.237 total
55000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 7.450 total
56000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 1% cpu 6.685 total
57000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.02s system 2% cpu 6.606 total
58000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.378 total
59000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.08s user 0.02s system 1% cpu 6.226 total
60000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 2% cpu 6.165 total
61000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.04s system 1% cpu 10.039 total
62000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 7.063 total
63000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.40s user 0.14s system 4% cpu 13.401 total
64000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 1% cpu 6.881 total
65000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.08s user 0.03s system 1% cpu 6.454 total
66000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.02s system 1% cpu 6.452 total
67000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 1% cpu 6.663 total
68000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.446 total
69000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.03s system 0% cpu 14.656 total
70000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.01s system 1% cpu 6.750 total
71000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.409 total
72000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.02s system 1% cpu 6.173 total
73000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 5.992 total
74000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 2% cpu 6.153 total
75000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.01s system 1% cpu 6.474 total
76000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.02s system 1% cpu 6.228 total
77000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 1% cpu 6.116 total
78000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.02s system 0% cpu 13.136 total
79000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.02s system 1% cpu 7.507 total
80000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 1% cpu 6.752 total
81000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.01s system 1% cpu 6.215 total
82000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 1% cpu 6.682 total
83000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.26s user 0.09s system 3% cpu 10.529 total
84000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.03s system 1% cpu 6.195 total
85000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.01s system 1% cpu 6.590 total
86000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 2% cpu 6.128 total
87000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.12s user 0.02s system 0% cpu 13.680 total
88000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 2% cpu 5.970 total
89000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.11s user 0.02s system 2% cpu 5.887 total
90000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.09s user 0.02s system 2% cpu 5.550 total
91000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.10s user 0.03s system 2% cpu 6.027 total
92000 curl -Ss 'https://api.gbif.org/v1/dataset?limit=1000&offset='$o > /dev/null 0.08s user 0.01s system 5% cpu 1.796 total
but maybe we should change our monitoring for this query to look at a high offset, rather than the first page.
> but maybe we should change our monitoring for this query to look at a high offset, rather than the first page.
Yes, that sounds sensible
Done (in a private repository).
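The actual monitoring change is in a private repository, so the following is only a hypothetical sketch of what a high-offset check might look like. The function name `check_high_offset`, the default offset of 90000 and the 10-second threshold are all assumptions chosen to match the timings above, not the real monitoring configuration.

```shell
# Hypothetical check: time one page at a high offset and compare it
# against a threshold in seconds. Not the real monitoring code.
check_high_offset() {
  offset=${1:-90000}     # assumed "high" offset, near the end of the listing
  threshold=${2:-10}     # assumed acceptable response time in seconds
  # -w '%{time_total}' makes curl print the total request time
  elapsed=$(curl -Ss -o /dev/null -w '%{time_total}' \
    "https://api.gbif.org/v1/dataset?limit=1000&offset=$offset") || return 2
  # awk handles the floating-point comparison
  if awk -v t="$elapsed" -v max="$threshold" 'BEGIN { exit !(t <= max) }'; then
    echo "OK offset=$offset time=${elapsed}s"
  else
    echo "SLOW offset=$offset time=${elapsed}s"
    return 1
  fi
}
```

With a 10-second threshold, the roughly 6 s pages in the run above would pass and the occasional 13 to 15 s pages would be flagged.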
I need to access metadata for all datasets using the registry API, so I have to paginate through the data, at most 1000 records at a time. High offsets seem to lead to degraded performance:
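For context, the pagination described here can be sketched as a loop that stops on the `endOfRecords` flag in the paged response instead of a hard-coded total. This is an illustration, not the code used in the report: the function name `fetch_all_datasets` and the page cap are made up, and the string match assumes the API returns compact JSON (no whitespace after the colon).

```shell
# Hypothetical sketch: print every page of /v1/dataset, one JSON page per line.
# Assumption: the response is compact JSON containing "endOfRecords":true
# on the last page.
fetch_all_datasets() {
  limit=${1:-1000}        # page size; 1000 is the maximum mentioned above
  max_pages=${2:-1000}    # safety cap so the loop cannot run forever
  offset=0
  pages=0
  while [ "$pages" -lt "$max_pages" ]; do
    page=$(curl -Ss --fail \
      "https://api.gbif.org/v1/dataset?limit=$limit&offset=$offset") || return 1
    printf '%s\n' "$page"
    pages=$((pages + 1))
    case $page in
      *'"endOfRecords":true'*) return 0 ;;   # last page reached
    esac
    offset=$((offset + limit))
  done
  echo "stopped after $max_pages pages without seeing endOfRecords" >&2
  return 1
}
```

Running `fetch_all_datasets > pages.ndjson` would write one page per line, which a JSON tool can then split into individual records.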