Some background comments:

It is good to see that a document on this topic is being prepared, especially as this kind of activity is receiving increasing attention. Ethics review is becoming more common, and academic conferences on internet measurement increasingly require it as well.
We should consider aligning the best practices here with those of the conferences.
See: https://conferences.sigcomm.org/imc/2023/cfp/
Furthermore, as a member of a computer science ethics committee, I can point out that we recently described our approach to reviewing ethics requests: https://doi.org/10.14722/ethics.2023.237352
There we take a more general view of measurements on the internet, and also consider scanning the internet for security vulnerabilities. That angle seems to be completely missing from this draft, even though it is a very closely related activity.
Both of these resources provide background reading that may help improve this draft in general.
Some more direct comments on the current text (-08):
Section 1.2: there is no definition of “One-/two-ended”.
Section 1.3: describing ‘Measurement Studies’ as ‘attacks’ does not make sense to me. I can understand that you want to call attention to the user impact of a measurement, but then I would describe that in terms of ‘privacy impact’, ‘information leakage’, etc.
Section 2.1.1: the risks described here are conjecture about a hypothetical example. That is not really the most accessible way to describe this.
As a suggestion:
The experiment can carry substantial risk for the user, depending on their local context. Trying to access censored material can be seen as a (network) policy infringement or as breaking the law. Even if the experimenter wants to expose volunteers to this kind of risk, the volunteers must still be fully informed and must voluntarily give consent to run the measurement. And even then, experimenters should seriously consider designing their experiment in another way.
Section 2.1.3: the A/B example is very convoluted in its description. It reads as if the author had a specific example in mind but tried too hard to abstract away from it, or tried to use two examples at the same time.
Section 2.4: There is no conclusion drawn from the fact that it may be possible to infer members of the "do not scan" list.
Section 2.5: replace "it" with "data" in all the section titles.
Section 2.5.1: do you mean only the data that the measurement generated, or also the data generated in response by the subjects?
Suggested change: For performance benchmarking, [RFC2544] requires that any frames ..
Section 2.5.2: we published a survey on masking data several years ago, but I don't think things have changed that much since then: https://doi.org/10.1145/3182660
The most important consideration from our research: masking can provide pseudonymization (i.e. making it harder to immediately infer identity), but there is almost no masking that can provide anonymization (i.e. making it impossible to infer identity).
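To make the distinction concrete, here is a minimal, hypothetical sketch (Python; not taken from the draft or from our survey) of why masking by hashing only pseudonymizes: the IPv4 address space is small enough that the masked values can be reversed by exhaustively hashing candidate addresses.

    # Sketch only: hashing an IPv4 address "masks" it, but because the
    # input space is tiny (2^32 addresses), anyone can rebuild the
    # mapping by brute force, so the trace is pseudonymous, not anonymous.
    import hashlib
    import ipaddress

    def mask_ip(ip: str) -> str:
        # Mask an address by hashing it (one common form of masking).
        return hashlib.sha256(ip.encode()).hexdigest()[:16]

    # A "masked" trace entry as an experimenter might publish it.
    masked = mask_ip("198.51.100.23")

    # An adversary who suspects the trace covers a known prefix simply
    # hashes every candidate address and looks for a match.
    for candidate in ipaddress.ip_network("198.51.100.0/24").hosts():
        if mask_ip(str(candidate)) == masked:
            print("re-identified:", candidate)
            break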