SamsungLabs / FedorAS

FedorAS: Federated Architecture Search under system heterogeneity
Apache License 2.0

Question about FedorAS framework #1

Open fabriziojpiva opened 1 month ago

fabriziojpiva commented 1 month ago

Hi authors,

Thanks for your contribution, you did very interesting work on FedNAS. I have read your paper in full and I have a question about the sampling strategies of your method and how the clients are optimized.

In the paper, you mention in step 1) that for each client, you sample a subspace without exceeding the budget B_comm. In Fig. 2, I assume that those subspaces are shown as solid black lines, while the dashed lines are the parts of the space that are discarded because they exceed the budget B_comm. But then in step 2), where on-device training with budget-aware path sampling is performed, the possible subspaces seem to ignore the previous sampling strategy, because they include paths that were discarded in 1) (see the black arrows I drew) or discard paths that were supposed to be explored (see my red arrows). See the screenshot below for the full picture.

[Screenshot: Fig. 2 from the paper, annotated with black and red arrows]

Because of this, do I understand correctly that the sampling strategies 1) and 2) are just different and independent from each other?

Thanks for your time, I would love to hear this small clarification.

vaenyr commented 3 weeks ago

In this Figure, the green and the yellow cases are two different clients (in Step 2 note the batch index increases from left to right, not from top to bottom). Different clients receive different subsets and sample only the paths that are part of their respective subsets - this is the intended message.

To verify, please take a look at this piece of code that decides what subspace is communicated to a client: https://github.com/SamsungLabs/FedorAS/blob/master/src/server.py#L429, and then here: https://github.com/SamsungLabs/FedorAS/blob/master/src/client.py#L73 a client uses it to create a relevant sampler object. The Sampler assigns infinite cost to the ops that were marked as invalid: https://github.com/SamsungLabs/FedorAS/blob/master/src/models/utils.py#L189 (this is just an implementation detail, it could have been done more efficiently).
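The infinite-cost trick described above can be illustrated with a minimal sketch. This is not the code from `src/models/utils.py`; the names `mask_costs` and `path_cost`, the nested-list layout, and the example numbers are all hypothetical and only show why a path through a non-communicated op can never fit within a finite budget.

```python
import math

def mask_costs(op_costs, valid_mask):
    """Return per-op costs where ops outside the communicated subspace cost infinity.

    op_costs[layer][op] is the cost of an op; valid_mask[layer][op] says whether
    the op was communicated to this client. (Hypothetical layout.)
    """
    return [
        [cost if ok else math.inf for cost, ok in zip(layer_costs, layer_mask)]
        for layer_costs, layer_mask in zip(op_costs, valid_mask)
    ]

def path_cost(masked_costs, path):
    """Sum the cost of one chosen op per layer."""
    return sum(masked_costs[layer][op] for layer, op in enumerate(path))

# Two layers, three candidate ops each; op 0 plays the role of identity (cost 0).
op_costs = [[0.0, 2.0, 5.0], [0.0, 3.0, 4.0]]
valid_mask = [[True, True, False], [True, False, True]]
masked = mask_costs(op_costs, valid_mask)
```

With this masking, any path that touches an invalid op has infinite total cost, so a check such as `path_cost(masked, path) <= budget` rejects it for every finite budget.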

Hope that clarifies your concerns! Apologies if it wasn't clear from reading the paper.

fabriziojpiva commented 3 weeks ago

Hi @vaenyr, thanks for the quick reply. Now it is a bit clearer, but I have one more question. If the green and yellow cases are two different clients, that means that each one is exposed to a different sampling strategy (one considering the common budget B_comm and the other one considering the budget of the tier). Is there a specific criterion that defines which sampling strategy a given client is exposed to? Or is it randomly assigned?

Thanks for the help.

vaenyr commented 3 weeks ago

I'm not sure I understand. In general, there are two budgets: the comms budget, which decides how much data any client can receive/upload (this one is kept the same across all clients in our experiments), and a computational budget, which relates to how "powerful" a client is. We model the latter by grouping models into tiers and assigning a single budget to each tier, resulting in coarse-grained system heterogeneity (it could be generalized to a per-client budget, which would be fine-grained system heterogeneity). Note that this assignment is not something the server does as part of our method, but rather a choice we made to model the environment in which the server operates. In particular, the server does not need to know the computational budget of each client.

Regarding comms budget: as mentioned, all clients use the same one. The server samples a subset that's within the budget.
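As a rough illustration of "samples a subset that's within the budget", here is a minimal sketch. It is not the logic in `src/server.py`: the function name `sample_subspace`, the greedy random-order strategy, and the assumption that index 0 in every layer is a zero-size identity op that is always communicated are all hypothetical.

```python
import random

def sample_subspace(op_sizes, comm_budget, rng):
    """Randomly pick ops to communicate without exceeding comm_budget.

    op_sizes[layer][op] is the size of an op's parameters; op 0 is assumed to
    be identity (size 0) and is always included. Returns (mask, used_budget).
    """
    # Every non-identity op is a candidate; visit candidates in random order.
    candidates = [
        (layer, op)
        for layer, sizes in enumerate(op_sizes)
        for op in range(1, len(sizes))
    ]
    rng.shuffle(candidates)

    mask = [[op == 0 for op in range(len(sizes))] for sizes in op_sizes]
    used = 0
    for layer, op in candidates:
        size = op_sizes[layer][op]
        if used + size <= comm_budget:  # keep the op only if it still fits
            mask[layer][op] = True
            used += size
    return mask, used
```

Calling this once per client with a different random state gives each client a different subspace, while every subspace respects the same comms budget.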

Regarding comp budget: clients are randomly assigned to their tiers at the beginning of each experiment, and this assignment is stored and kept constant for the whole duration of the experiment (including any follow-up experiments etc.). This is to simulate a situation where some data might only be available on low-end devices and some (big) networks might never be able to see that data (see Introduction). After a client is assigned to a tier, it will never run a path that violates its related budget (except for the purpose of testing/measuring things, obviously).
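A fixed random tier assignment of this kind could be sketched as follows. The function `assign_tiers`, the tier names, the budget values, and the use of a seed to make the assignment reproducible are assumptions for illustration, not the repository's implementation (which, per the comment above, stores the assignment).

```python
import random

def assign_tiers(client_ids, tier_budgets, seed):
    """Assign each client to a random tier, reproducibly for a given seed.

    tier_budgets maps a tier name to its computational budget (hypothetical
    values). The returned dict maps client id -> tier name.
    """
    rng = random.Random(seed)  # dedicated RNG so the assignment is repeatable
    tiers = list(tier_budgets)
    return {cid: rng.choice(tiers) for cid in client_ids}

tier_budgets = {"low": 1.0, "mid": 2.0, "high": 4.0}
assignment = assign_tiers(range(10), tier_budgets, seed=0)
```

Reusing the same seed (or saving `assignment` to disk) keeps each client in the same tier across the whole experiment and any follow-ups.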

fabriziojpiva commented 3 weeks ago

Thanks for the reply. I totally understand the two budgets system: the common and the computational. That is very clear. What I don't understand is the following. According to the figure that I placed in my first comment of this issue, these two budgets create two different types of graphs from which the clients can sample:

1) One graph is made considering the common budget: valid paths are shown as solid gray lines, invalid paths as dashed lines, and a sampled example path in green.

2) The other one is made considering the computational budget (or per-tier budget): valid paths are shown as solid gray lines, invalid paths as dashed lines, and a sampled example path in orange.

Considering that your first reply to the issue was "the green and the yellow cases are two different clients", that means that in the figure, client 1 received the graph with valid paths considering the common budget, while client 2 received the graph with valid paths with respect to the computational budget. These two graphs are different, which is quite clear in the figure, because some valid paths for client 1 are invalid for client 2 and vice versa.

My question is, how does FedorAS determine which type of graph a client receives? In other words, given a client C, will C receive the graph with respect to the common budget B_comm or the computational budget B_tier?

vaenyr commented 3 weeks ago

Clients always receive subspaces that are sampled w.r.t. the communication budget (as indicated by the arrow labelled with "1" in the Figure).

Validity of different edges/paths w.r.t. the computational budget is considered individually by each client when sampling a path from the communicated subspace. No explicit graph of valid/invalid operations is created for this part (nor shown in the Figure). One of the reasons behind this is that we make a (soft*) assumption that any communicated operation can be trained by any model - in the worst-case scenario, a valid path would include only this operation and the rest of the layers would be identity (cost=0). Because of that, we cannot rule out certain operations ahead of time when considering the computational budget. In principle, any communicated operation can be sampled by any client.

\* I'm calling it a soft assumption because, even if we tried to relax it, if a path composed of one operation and filled with identities were not valid for certain clients, then this operation would effectively NEVER be trained on those clients, which would have the same effect as if it had not been communicated.
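To make the client-side step concrete, here is a minimal sketch of sampling a path from a communicated subspace under a computational budget. It is not the FedorAS sampler: `sample_path`, the layer-by-layer strategy, and the assumption that op 0 in every layer is a communicated identity with cost 0 are hypothetical, and this simple scheme does not sample valid paths uniformly.

```python
import random

def sample_path(op_costs, mask, comp_budget, rng):
    """Pick one op per layer so that the total cost stays within comp_budget.

    op_costs[layer][op] is the op's compute cost; mask[layer][op] marks ops that
    were communicated. Op 0 is assumed to be identity (cost 0, always in mask),
    so at least one feasible choice exists in every layer.
    """
    remaining = comp_budget
    path = []
    for layer_costs, layer_mask in zip(op_costs, mask):
        # Feasible ops: communicated to this client and affordable right now.
        feasible = [
            op
            for op, (cost, ok) in enumerate(zip(layer_costs, layer_mask))
            if ok and cost <= remaining
        ]
        op = rng.choice(feasible)
        path.append(op)
        remaining -= layer_costs[op]
    return path

op_costs = [[0, 2, 5], [0, 3, 4], [0, 1, 6]]
mask = [[True, True, False], [True, False, True], [True, True, True]]
path = sample_path(op_costs, mask, comp_budget=6, rng=random.Random(0))
```

Two clients holding the same `mask` but different `comp_budget` values draw from the same communicated ops; only the set of affordable paths differs, which matches the point that no separate per-tier graph is needed.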

vaenyr commented 3 weeks ago

Perhaps the source of confusion is related to the "sampling" of a subspace. There's more than one graph that meets the communication budget. Each client receives a random one, but they all are derived from the same comms budget.

So in the Figure, two clients receive two different subspaces, both meeting the comms budget and sampled randomly by the server (without considering the client's computational budgets).