Open bibisbar opened 4 months ago
Thanks for the question. The following code snippet is for illustration; I hope it helps.
def forward(...):  # LLM forward func
    ...
    # Interpret settings; `params` is what CMA-ES optimizes
    ss = 0
    ee = ss + config.num_hops - 2
    # A positive entry selects a hop; its position modulo the layer count is the layer index
    layer_idx = np.argwhere(params[ss:ee] > 0).ravel()
    layer_idx = layer_idx % config.num_hidden_layers
    # Always include layer 0 first and layer 31 last
    layer_idx = [0,] + layer_idx.tolist() + [31,]
    ss = ee
    ee = ss + config.num_hidden_layers**2
    # One scale per (previous layer, current layer) pair, parameterized as 1 + offset
    scales = params[ss:ee].reshape([config.num_hidden_layers, -1])
    scales = np.ones_like(scales) + scales
    # Pass data through layers.
    prev_layer_ix = -1
    for i, layer_ix in enumerate(layer_idx):
        if prev_layer_ix < 0:
            scale = 1
        else:
            scale = scales[prev_layer_ix][layer_ix]
        layer = self.layers[layer_ix]
        # Scale hidden_state and pass it through layer
        prev_layer_ix = layer_ix
    ...
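To make the search space concrete, here is a minimal standalone sketch of just the decoding step, using NumPy only. It is not the actual implementation: the function names `decode_params` and `path_scales` are hypothetical, the model is a toy with 4 layers, and the last layer index is written as `num_hidden_layers - 1` instead of the hard-coded `31` (an assumption that the two are meant to be the same). The point it illustrates is that the vector CMA-ES optimizes has `(num_hops - 2) + num_hidden_layers**2` entries: the first part gates which layers appear in the path, the second part holds the scale offsets.

```python
import numpy as np


def decode_params(params, num_hops, num_hidden_layers):
    """Split a flat parameter vector into a layer path and a scale matrix.

    Hypothetical helper mirroring the snippet above; not the original code.
    """
    params = np.asarray(params, dtype=float)
    n_gate = num_hops - 2
    expected = n_gate + num_hidden_layers ** 2
    if params.size != expected:
        raise ValueError(f"expected {expected} parameters, got {params.size}")

    # Gate part: a positive entry keeps that hop, and its position
    # modulo the layer count gives the layer index to visit.
    gates = params[:n_gate]
    middle = np.argwhere(gates > 0).ravel() % num_hidden_layers
    # Assumption: the final layer is num_hidden_layers - 1 (31 in the snippet).
    layer_idx = [0] + middle.tolist() + [num_hidden_layers - 1]

    # Scale part: offsets around 1.0, indexed as [previous layer][current layer].
    offsets = params[n_gate:].reshape(num_hidden_layers, num_hidden_layers)
    scales = 1.0 + offsets
    return layer_idx, scales


def path_scales(layer_idx, scales):
    """Return the scale applied before each layer along the path."""
    out = []
    prev = -1
    for ix in layer_idx:
        # The first layer has no predecessor, so its input is left unscaled.
        out.append(1.0 if prev < 0 else float(scales[prev][ix]))
        prev = ix
    return out


if __name__ == "__main__":
    # Toy setup: 4 layers, 10 hops -> 8 gate entries + 16 scale offsets.
    gates = [1, -1, 1, -1, -1, 1, -1, -1]
    offsets = np.zeros((4, 4))
    offsets[0, 2] = 0.5  # scale of 1.5 when going from layer 0 to layer 2
    params = np.concatenate([gates, offsets.ravel()])

    layer_idx, scales = decode_params(params, num_hops=10, num_hidden_layers=4)
    print(layer_idx)                       # [0, 0, 2, 1, 3]
    print(path_scales(layer_idx, scales))  # [1.0, 1.0, 1.5, 1.0, 1.0]
```

In this toy run the positive gates sit at positions 0, 2 and 5, which map to layers 0, 2 and 1, so a layer can be skipped, repeated, or visited out of order depending on the sign pattern the optimizer finds.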
I still find it hard to understand how the DFS search space is designed. Could you share an explanation or a demo? Code would be best. Thanks!