google-research / federated

A collection of Google research projects related to Federated Learning and Federated Analytics.
Apache License 2.0

[distributed-dp] DSkellam goes weird w/ small num_bits #66

Closed. SamuelGong closed this issue 2 years ago.

SamuelGong commented 2 years ago

To whom it may concern,

When using the following interface to calculate the local_stddev for DSkellam,

https://github.com/google-research/federated/blob/76aa6f665248b8a2e5b07cc52a68b4c4b94fd06d/distributed_dp/accounting_utils.py#L570

I obtained different (scale, local_stddev) pairs when I tried different configurations. What I found strange was that when I swept bits from 20 down to 16 while keeping everything else the same, the results:

  1. First changed smoothly while bits $\geq 17$;
  2. Then, all of a sudden, both the scale and the local_stddev changed drastically, implying that they were no longer useful for actual training (e.g., the local_stddev climbed to over a hundred thousand).

For your convenience, I also made a table for the result:

bits           20    19    18    17         16
local_stddev   5.23  5.29  5.52  6.88  162582.83

Please refer to the appended execution results for more details. It seems that something in the optimization process breaks when the number of bits is too small. My questions:

  1. Has anyone encountered a similar issue?
  2. Does this indicate a limitation of DSkellam, e.g., that it does not necessarily fit low-bit quantization? (Though I would not consider 16 bits a typical "low-bit" scenario.)

Thanks for your attention; I am really looking forward to your help!


Detailed execution results.

>>> skellam_params(epsilon=6,
... l2_clip=3,
... bits=20,
... num_clients=16,
... dim=16777216,
... q=1.0,
... steps=150,
... beta=np.exp(-0.5),
... delta=0.01)
(8348.494662007102, 5.233357786360573)

>>> skellam_params(epsilon=6,
... l2_clip=3,
... bits=19,
... num_clients=16,
... dim=16777216,
... q=1.0,
... steps=150,
... beta=np.exp(-0.5),
... delta=0.01)
(4132.065287771359, 5.286782384491639)

>>> skellam_params(epsilon=6,
... l2_clip=3,
... bits=18,
... num_clients=16,
... dim=16777216,
... q=1.0,
... steps=150,
... beta=np.exp(-0.5),
... delta=0.01)
(1979.514945816654, 5.517849242526566)

>>> skellam_params(epsilon=6,
... l2_clip=3,
... bits=17,
... num_clients=16,
... dim=16777216,
... q=1.0,
... steps=150,
... beta=np.exp(-0.5),
... delta=0.01)
(793.6147645649165, 6.881591755490655)

>>> skellam_params(epsilon=6,
... l2_clip=3,
... bits=16,
... num_clients=16,
... dim=16777216,
... q=1.0,
... steps=150,
... beta=np.exp(-0.5),
... delta=0.01)
(0.02190604694235002, 162582.82632490725)
kairouzp commented 2 years ago

Hi Zhifeng Jiang,

Thanks for bringing this to our attention. This is weird. The numbers up until bits = 17 make sense to us. Let us try to reproduce the error you are seeing and figure out what is happening. My guess is that some of the scipy optimizers we use under the hood are failing to converge -- notice how the scale (which should be >> 1) is much smaller than 1. We will try to look into it soon and get back to you. In the meantime, feel free to play with the range of the values for the scipy optimizers (to see if it becomes more stable).

Thanks, peter
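
(To illustrate the failure mode kairouzp guesses at above -- this is not the repo's actual search code, and the objective below is a hypothetical stand-in -- a bounded scipy optimizer can quietly pin its answer to an endpoint of the search range when the true optimum lies outside it, which would match the tiny scale returned at bits=16.)

from scipy import optimize

def objective(scale):
    # Hypothetical stand-in for the real objective; minimized at scale = 500.
    return (scale - 500.0) ** 2

# Optimum inside the bounds: recovered correctly.
ok = optimize.minimize_scalar(objective, bounds=(1e-2, 1e4), method='bounded')
print(ok.x)  # ~500.0

# Optimum outside the bounds: the result is pinned near the upper endpoint,
# and no exception is raised, so the failure is easy to miss.
bad = optimize.minimize_scalar(objective, bounds=(1e-2, 1e2), method='bounded')
print(bad.x)  # ~100.0 (the boundary), not 500.0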


kenziyuliu commented 2 years ago

Hi @SamuelGong,

Thanks for your interest in our work! Just adding to @kairouzp above, the short answer is that our heuristic for balancing quantisation error and modulo clipping error (which is what skellam_params does) determines that the aggregate at the server won't fit into the bitwidth (16) for your last config, so it gives you this boundary behaviour: a very small scale and a meaningless local_stddev. Since skellam_params simply tries to find a scale automatically, you can instead manually specify a scale for bits=16 for SecAgg and get a reasonable local_stddev, like so:

skellam_local_stddev(epsilon=6,
                     scale=350,  # `scale`
                     l2_clip=3,
                     num_clients=16,
                     beta=np.exp(-0.5),
                     dim=16777216,
                     q=1,
                     steps=150,
                     delta=0.01,
                     orders=RDP_ORDERS)
11.43585395098052  # `local_stddev`


(compare the above to your bits=17 output)
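
(If it helps, here is a sketch of how one might sweep a few candidate scales at bits=16; the import path below is an assumption based on the linked file, distributed_dp/accounting_utils.py, so adjust it if the module lives elsewhere.)

import numpy as np
# Assumed import path; skellam_local_stddev and RDP_ORDERS are used as in
# the call above.
from distributed_dp.accounting_utils import skellam_local_stddev, RDP_ORDERS

# Sweep candidate scales: smaller scales leave more headroom in the 16-bit
# range but cost more local noise (a larger local_stddev).
for scale in [100, 200, 350, 500]:
    stddev = skellam_local_stddev(epsilon=6,
                                  scale=scale,
                                  l2_clip=3,
                                  num_clients=16,
                                  beta=np.exp(-0.5),
                                  dim=16777216,
                                  q=1,
                                  steps=150,
                                  delta=0.01,
                                  orders=RDP_ORDERS)
    print(scale, stddev)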

More specifically, the heuristic skellam_params works as follows: for a given bitwidth (say bits=16), it tries to select scale and local_stddev such that we can fit roughly k=3 stddevs of the aggregate signal inside the range [0, 2^bits). This “aggregate signal” contains the sum of (1) the scaled client updates and (2) their locally added (scaled) noise. This means that the k=3 stddevs of the aggregate might not fit if you choose a setting with, say, a large clipping bound or strong privacy (e.g. no client subsampling). We also approximate the stddev of this aggregate in a loose way (see the “Shared scale” line of Algorithm 1 of the paper) due to the loose norm bound on (1), which gets worse with larger dim, meaning that it could be overestimating the bitwidth that the aggregate needs.

To “make more room” for the aggregate (thus allowing a larger scale and a useful local_stddev), you can manually adjust the quantization/modular clipping tradeoff by using a smaller k like so:

skellam_params(epsilon=6,
               l2_clip=3,
               bits=16,
               num_clients=16,
               dim=16777216,
               q=1.0,
               steps=150,
               beta=np.exp(-0.5),
               k=2,                           # note here
               delta=0.01)
(387.69384343074285, 10.56503703173974)

Choosing a smaller l2_clip, dim, steps, q should also work as they all determine the “size” of the aggregate (either in terms of component (1) or (2) above).
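
(To make that tradeoff concrete, here is a rough back-of-the-envelope version of the fit check as I read the description above. The exact expression is the “Shared scale” line of Algorithm 1 of the paper; the formula below, including combining the two terms in quadrature, is my approximation, not the library's code.)

import math

def approx_min_bits(scale, l2_clip, local_stddev, num_clients, dim, k=3):
    # (1) Loose per-coordinate magnitude of the summed, scaled client
    # updates: the sum has L2 norm <= num_clients * scale * l2_clip,
    # spread over dim coordinates.
    signal = num_clients * scale * l2_clip / math.sqrt(dim)
    # (2) Per-coordinate stddev of the summed local noise.
    noise = math.sqrt(num_clients) * scale * local_stddev
    # Combine (in quadrature, for this sketch) and require ~k stddevs of
    # the centered aggregate to fit in [0, 2^bits).
    agg = math.sqrt(signal ** 2 + noise ** 2)
    return math.ceil(math.log2(2 * k * agg))

# Plugging in the (rounded) bits=17 solution from the report prints 17,
# i.e. that aggregate just fits a 17-bit range under this approximation.
print(approx_min_bits(scale=793.61, l2_clip=3, local_stddev=6.88,
                      num_clients=16, dim=16777216))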

If having small bitwidths is important, check out an extension to this work (https://arxiv.org/pdf/2203.03761.pdf) which considers using sparse linear projections.

I hope this helps!

SamuelGong commented 2 years ago

Thanks @kairouzp and @kenziyuliu for the informative replies, which tell me that:

  1. The issue happens because the variance of the aggregated signal cannot be bounded within SecAgg's field size.
  2. Using a reasonably larger num_bits can therefore fix the problem by enlarging SecAgg's field size.
  3. Alternatively, the problem can be worked around by using a smaller k/l2_clip/dim/steps/q so that the variance of the aggregated signal is reduced.
  4. If I want to keep all of these settings fixed, I can instead set the scale manually and use skellam_local_stddev in place of skellam_params, which fails to obtain a meaningful scale.

Appreciate that!