(I've read #3, but I think that this is orthogonal to that idea.)
The explainer currently suggests that browsers fake a download if they already have a language pack that they would ordinarily have to retrieve. I imagine that a simple approach would be to remember how long it took to download a language pack when it was really retrieved and then wait about the same amount of time when a site first asks to use that pack.
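Something like the following is what I have in mind. This is only a sketch; the pack-store helpers (`havePackLocally`, `reallyDownloadPack`, `estimateDownloadTime`) are stand-ins for browser internals, not a real API.

```ts
// Sketch of the "replay the recorded duration" idea.
const downloadDurations = new Map<string, number>(); // language tag -> ms

async function ensurePack(lang: string): Promise<void> {
  if (!havePackLocally(lang)) {
    const start = performance.now();
    await reallyDownloadPack(lang);
    downloadDurations.set(lang, performance.now() - start);
    return;
  }
  // Pack is already installed: fake a download by waiting roughly as long as
  // the real download took (or an estimate if it was never really retrieved).
  const delay = downloadDurations.get(lang) ?? estimateDownloadTime(lang);
  await new Promise((resolve) => setTimeout(resolve, delay));
}

// Hypothetical helpers standing in for browser internals.
declare function havePackLocally(lang: string): boolean;
declare function reallyDownloadPack(lang: string): Promise<void>;
declare function estimateDownloadTime(lang: string): number;
```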
This almost works, but there are some fairly obvious side channels.
A download consumes network resources. This means that a site can either inflate the time taken to obtain a pack by adding its own network usage, or detect a fake download by observing that network throughput doesn't change when it requests a new pack. Some amount of user fingerprinting is already possible through observations of network capacity, because the network is a shared resource that we don't protect very well (mostly because it is generally considered scarce).
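Roughly, the detection side looks like this. Again a sketch: `requestLanguagePack()` and the measurement URL are stand-ins, not a real API, and the threshold is arbitrary.

```ts
// Measure download throughput for `ms` milliseconds against a large resource.
async function measureThroughput(ms: number): Promise<number> {
  const start = performance.now();
  let bytes = 0;
  const controller = new AbortController();
  setTimeout(() => controller.abort(), ms);
  try {
    const response = await fetch('/large-test-resource', { signal: controller.signal });
    const reader = response.body!.getReader();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      bytes += value.length;
    }
  } catch {
    // Aborted after `ms` milliseconds.
  }
  return bytes / (performance.now() - start); // bytes per millisecond
}

// Stand-in for whatever API triggers the (possibly fake) pack download.
declare function requestLanguagePack(lang: string): Promise<void>;

async function probePack(lang: string): Promise<boolean> {
  const baseline = await measureThroughput(2000);
  const [during] = await Promise.all([
    measureThroughput(2000),
    requestLanguagePack(lang),
  ]);
  // If throughput doesn't drop while the pack is "downloading", the download
  // was probably fake, i.e. the pack was already installed.
  return during > 0.8 * baseline;
}
```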
This might give sites the ability to generate a signal that other sites can read through increases in the time that obtaining a pack appears to take. The capacity of that channel is multiplied by the number of available packs (hence #3, I suppose).
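Put together with the probe above, the cross-site channel might look something like this (again, `requestLanguagePack()` is a stand-in and the language list is arbitrary).

```ts
// One bit per language: the writer warms the packs whose bit is 1, the reader
// checks each pack with probePack() from the sketch above.
const channelLangs = ['fr', 'de', 'es', 'ja', 'pt', 'it', 'ko', 'nl'];

// Writer: request the packs whose bit is 1 so they become locally installed.
async function writeBits(bits: boolean[]): Promise<void> {
  for (let i = 0; i < channelLangs.length; i++) {
    if (bits[i]) await requestLanguagePack(channelLangs[i]);
  }
}

// Reader: a faked download (no throughput drop) means the pack is already
// installed, which reads as a 1.
async function readBits(): Promise<boolean[]> {
  const bits: boolean[] = [];
  for (const lang of channelLangs) {
    bits.push(await probePack(lang));
  }
  return bits;
}
```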
The only mitigation I can think of here is to have the retrieval of language packs use the network exclusively. That's pretty disruptive though.