Thanks! We used nasbench2_train.py to find the EcoNAS baselines. We also computed the zero-cost metrics in find_measures at each epoch to see how the metrics change over time; however, these results were not presented in the paper. To compute the measures found in the paper, use nasbench2_pred.py.
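For reference, a minimal sketch of what computing the metrics at each epoch could look like. The module path and the find_measures arguments below are assumptions based on this repository's layout, so check nasbench2_pred.py for the exact call:

```python
# Sketch only -- argument names are assumptions; see find_measures in this repo
# (foresight/pruners) for the exact signature.
import torch
from foresight.pruners import predictive  # assumed module path

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def measures_per_epoch(net, train_loader, num_classes, num_epochs, train_one_epoch):
    """Train for num_epochs and record the zero-cost metrics after each epoch."""
    history = []
    for epoch in range(num_epochs):
        train_one_epoch(net, train_loader)          # user-supplied training step
        measures = predictive.find_measures(        # assumed call, as in nasbench2_pred.py
            net, train_loader,
            ('random', 1, num_classes),             # dataload mode, #batches, #classes (assumed)
            device)
        history.append(measures)                    # e.g. {'snip': ..., 'synflow': ..., ...}
    return history
```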
Thank you for your answer. I have another question: why do you use abs when you calculate the Spearman coefficients in nasbench101?
Good observation. You don't need to use abs. It was just convenient when quickly plotting different metrics on the same plot and comparing them, since some are negatively correlated and others are positively correlated.
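To illustrate the point, a small self-contained example: negating a metric flips the sign of its Spearman coefficient, and abs simply maps both cases onto the same scale for plotting.

```python
# Toy illustration: abs() only removes the sign so that negatively-correlated
# metrics can be compared side by side with positively-correlated ones.
from scipy.stats import spearmanr

accuracies = [71.2, 65.4, 80.1, 77.6, 69.0]      # toy final accuracies
metric_pos = [0.30, 0.10, 0.90, 0.70, 0.20]      # metric correlated with accuracy
metric_neg = [-m for m in metric_pos]            # same metric, negated

rho_pos, _ = spearmanr(metric_pos, accuracies)   # +1.0
rho_neg, _ = spearmanr(metric_neg, accuracies)   # -1.0
print(rho_pos, rho_neg, abs(rho_neg))            # abs() makes the two comparable
```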
Thank you for your prompt reply!
1. Are all of the zero-cost proxies (snip, synflow, jacob_cov) mentioned in the paper "the bigger, the better"?
2. I used the code in this repository to reproduce the zero-cost proxy results for ASR (the PyTorch model in the ASR GitHub repo), but I could not achieve the results in the paper. Could you tell me how you did it?
Hi again,
1. Yes, for these three metrics, the bigger the better.
2. For NAS-Bench-ASR we computed the metrics in the standard way, following the template provided for NAS-Bench-101 and NAS-Bench-201. Nothing is different. Can you please share more information on how your reproduction of these results differs? For example, what correlation coefficient do you get?

I can look into rerunning the computation of proxies for NAS-Bench-ASR on my side and releasing an additional file for how we did it. It will probably take a few days for me to get to it though.
Thanks. For synflow, I only changed the function get_layer_metric_array in p_utils and built the network from https://github.com/SamsungLabs/nb-asr/blob/main/nasbench_asr/model/torch/model.py with different arch_desc values. I found that the synflow value is much smaller than for NAS-Bench-201, and there is also a value for configurations that do not constitute a network (because of the last linear layer in the ASR macro-architecture).
The modified get_layer_metric_array (with nn.Conv1d added) is here:

```python
import torch.nn as nn

def get_layer_metric_array(net, metric, mode):
    metric_array = []
    for layer in net.modules():
        # skip layers marked as non-prunable when pruning whole channels
        if mode == 'channel' and hasattr(layer, 'dont_ch_prune'):
            continue
        # nn.Conv1d added so the ASR model's convolutions are included
        if isinstance(layer, (nn.Conv2d, nn.Linear, nn.Conv1d)):
            metric_array.append(metric(layer))
    return metric_array
```
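For completeness, a rough usage sketch of the function above with a toy network. The per-layer metric shown (|weight * grad|) and the plain forward/backward pass are simplifications to show the wiring, not the repo's exact synflow implementation, which linearizes the network and uses an all-ones input:

```python
# Rough usage sketch (not the repo's synflow code): populate gradients, collect the
# per-layer |weight * grad| arrays, and sum them into a single proxy score.
import torch
import torch.nn as nn

toy_net = nn.Sequential(nn.Conv1d(8, 16, 3, padding=1), nn.ReLU(),
                        nn.Flatten(), nn.Linear(16 * 10, 4))

out = toy_net(torch.randn(2, 8, 10))
out.sum().backward()  # populate .grad so the metric below is defined

def synflow_metric(layer):
    # simplified stand-in for the synflow per-layer score
    if layer.weight.grad is not None:
        return torch.abs(layer.weight * layer.weight.grad)
    return torch.zeros_like(layer.weight)

metric_array = get_layer_metric_array(toy_net, synflow_metric, mode='param')
synflow_score = sum(m.sum().item() for m in metric_array)
print(synflow_score)
```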
A few changes may be required; let me reopen this issue and we'll look into it.
Thank you very much for your attention. I would appreciate it if you could provide your code for the ASR task or the zero-cost proxy results for ASR.
Hello, I know Mohamed and Abhinav are working on adding proper support for nb-asr in the code. In the meantime, I've added pickle files containing precomputed metrics to the Google Drive (https://drive.google.com/drive/folders/1fUBaTd05OHrKIRs-x9Fx8Zsk5QqErks8?usp=sharing). These are the files we used to run the NAS experiments, although it's been a while, so please let us know if there are any problems. Hope that helps!
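Since the internal structure of those pickle files is not described in this thread, a cautious way to start is to load one and inspect it before assuming any layout (the filename below is hypothetical):

```python
# Hedged sketch for inspecting the shared files: the pickle layout is not documented
# here, so print the type and a few keys/entries before relying on any field names.
import pickle

with open('nb_asr_precomputed_metrics.pickle', 'rb') as f:   # hypothetical filename
    data = pickle.load(f)

print(type(data))
if isinstance(data, dict):
    some_keys = list(data.keys())[:5]
    print(some_keys)
    if some_keys:
        print(data[some_keys[0]])
```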
Thank you so much, the ASR results you shared are very helpful.
I have another question: why does BP outperform BP + warmup (256) after 200 trained models in the paper (Figure 4d)? What do you think is the reason?
Hi, I haven't checked the results carefully, so take my words with a pinch of salt, but from my experience a difference like that is usually not meaningful. It can very well be just a result of averaging, as suggested by the fact that eventually both methods are pretty close and well within each other's IQR (on the other hand, the difference between the two methods in the range of 0-100 models seems much more significant to me, as the IQRs are much more disjoint).

Alternatively, if we assume that the difference is meaningful, a valid hypothesis would probably be that warming up a predictor makes later parts of the predicted ranking worse while improving the earlier ones. To test it, one could train a predictor and get the overall ranking correlation with and without zero-cost warmup, and compare.

What is more, I would also try to link it to the warmup sample size (we can see that warmup with 512 does significantly better). Since the sample is completely random, it is possible that the average performance of the warmed-up predictor, in the later parts of the ranking, depends more on the quality of the initial samples. Some possible tests here would include warming up predictors with cherry-picked samples (e.g., only bad models, only good models, etc.) and maybe trying to use a similar iterative scheme for warmup as the one we use for accuracy prediction (to maximize the chance of having good models in the warmup pool).
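A minimal sketch of the test suggested above, comparing the overall ranking correlation with and without zero-cost warmup. The Predictor interface (fit/predict) and the data arrays are placeholders rather than this repo's API:

```python
# Placeholder sketch: compare full-ranking Spearman correlation of a predictor
# trained with and without zero-cost warmup. Assumes fit() fine-tunes incrementally
# rather than resetting the model; swap in the actual predictor used in the paper.
from scipy.stats import spearmanr

def ranking_quality(predictor, train_archs, train_accs, all_archs, all_accs,
                    warmup_archs=None, warmup_proxies=None):
    if warmup_archs is not None:
        predictor.fit(warmup_archs, warmup_proxies)   # warm up on cheap proxy targets
    predictor.fit(train_archs, train_accs)            # then fit on true accuracies
    rho, _ = spearmanr(predictor.predict(all_archs), all_accs)
    return rho                                        # correlation over the full ranking
```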
Thank you for your detailed answers. Zero-cost proxies is very nice work; I will continue to follow it.
Closing this issue as the immediate issues seem to be solved. It still remains to provide implementations for NAS-Bench-ASR/NLP, but these are covered by other issues.
Hello, I really enjoyed your paper! I have a question about the code in nasbench2_train.py: why does the function find_measures run after train? Will that affect the outcome?