getodk / central

ODK Central is a server that is easy to use, very fast, and stuffed with features that make data collection easier. Contribute and make the world a better place! ✨🗄✨
https://docs.getodk.org/central-intro/
Apache License 2.0

Entity CSV response from client form list request may not have the correct caching headers #662

Open lognaturel opened 1 month ago

lognaturel commented 1 month ago

Problem description

@seadowg ended up in a state where Enketo preview and a Collect App User were seeing different numbers of entities. He looked at his server logs and noticed that the entity lists were not actually being requested. It turned out he was getting cached lists returned from Cloudflare. Because Enketo preview and Collect App User links access the same resource through different URLs, they were being cached separately.
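To illustrate how two URLs for the same resource can drift apart, here is a minimal sketch of a cache keyed only by URL that never revalidates. It is a toy model, not Cloudflare's actual behavior; the class `UrlKeyedCache`, the `origin` function, and both URL paths are hypothetical names made up for this example.

```python
class UrlKeyedCache:
    """Toy cache: stores one response per URL and never revalidates."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}

    def get(self, url):
        # Only the first request for a given URL reaches the origin.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


entities = ["tree-1"]


def origin(url):
    # Assumption: both hypothetical URLs resolve to the same entity list.
    return list(entities)


cache = UrlKeyedCache(origin)
preview_url = "/hypothetical/preview/trees.csv"
app_user_url = "/hypothetical/app-user/trees.csv"

first_preview = cache.get(preview_url)   # cached while the list has 1 entity
entities.append("tree-2")                # the upstream resource changes
app_user_view = cache.get(app_user_url)  # first request for this URL: 2 entities
stale_preview = cache.get(preview_url)   # still the cached 1-entity copy
```

In this model the two clients disagree until the older entry is evicted or revalidated, which matches the symptom described above.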

Steps to reproduce the problem

Set up a DigitalOcean server and add Cloudflare with default settings. @seadowg may add more details to this!

Expected behavior

Cached Entity Lists should only be returned when unchanged. Any change to the upstream resource should bust all caching layers.
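In HTTP terms, one common way to get this behavior is conditional revalidation with an `ETag`. The sketch below is an assumption about how that could look, not a description of what Central does: `make_etag` and `respond` are hypothetical helpers, and the ETag is simply a hash of the CSV body. `Cache-Control: no-cache` tells caches to revalidate with the origin before reusing a stored response.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    # Hypothetical scheme: a quoted, truncated hash of the response body.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        # Unchanged: the cached copy may be reused, no body is sent.
        return 304, headers, b""
    # Changed or first request: send the full body with the new ETag.
    return 200, headers, body


csv_v1 = b"name,label\n1,a\n"
csv_v2 = csv_v1 + b"2,b\n"

status_first, headers_first, body_first = respond(csv_v1, None)
status_same, _, body_same = respond(csv_v1, headers_first["ETag"])
status_changed, _, body_changed = respond(csv_v2, headers_first["ETag"])
```

With this approach a cached entity list is served only after the origin confirms it is unchanged, and any edit to the list produces a new ETag and a full response.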

Central version shown in version.txt

versions:
42d83f19b30d638aae871243bce2caa0d8c6095d (v2024.1.0-1-g42d83f1)
 3fb0c22b1cbdc3a6004963afcc3847a82c09307d client (v2024.1.0)
 b4754cf52bfa64b1ca841bc9ccb64a38726398e8 server (v2024.1.0)

Other notes (if any)

Form definitions aren't being cached in this way. A possible useful next step would be to compare headers for form XML vs entity list CSVs.

If there's nothing obviously wrong, let's deprioritize until we get more reports of issues in this area.
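The header comparison suggested above could be sketched as follows. This is only an illustration: `cache_headers` and `diff_cache_headers` are hypothetical helpers, and the two header sets are made-up sample values, not headers captured from a real Central server. `cf-cache-status` is the response header Cloudflare adds to report how its cache handled a request.

```python
CACHE_HEADERS = (
    "cache-control", "etag", "last-modified",
    "expires", "vary", "age", "cf-cache-status",
)


def cache_headers(headers):
    """Pick out caching-related headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered.get(name) for name in CACHE_HEADERS}


def diff_cache_headers(a, b):
    """Map each differing caching header to its (a, b) pair of values."""
    ha, hb = cache_headers(a), cache_headers(b)
    return {n: (ha[n], hb[n]) for n in CACHE_HEADERS if ha[n] != hb[n]}


# Made-up sample responses, for illustration only.
form_xml_headers = {
    "Content-Type": "text/xml",
    "Cache-Control": "no-cache",
    "CF-Cache-Status": "DYNAMIC",
}
entity_csv_headers = {
    "Content-Type": "text/csv",
    "CF-Cache-Status": "HIT",
}

differences = diff_cache_headers(form_xml_headers, entity_csv_headers)
```

Real header sets could be passed in the same way, for example from the `headers` attribute of a response returned by `urllib.request.urlopen`.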

seadowg commented 1 month ago

To add some more context on the Cloudflare setup: I use Cloudflare as the DNS provider for the domain where I have Central running. Central is running on a DigitalOcean Droplet (in the documented Docker Compose setup), and I have an A record in Cloudflare pointing to the Droplet's public IP address. I imagine the source of the problem here is that I used Cloudflare's "Proxied" feature on the DNS record. This means:

When you proxy specific DNS records through Cloudflare - specifically A, AAAA, or CNAME records — DNS queries for these will resolve to Cloudflare Anycast IPs instead of their original DNS target. This means that all requests intended for proxied hostnames will go to Cloudflare first and then be forwarded to your origin server.

This behavior allows Cloudflare to optimize, cache, and protect all requests to your application, as well as protect your origin server from DDoS attacks.