ShunLu91 / PA-DA

[CVPR '23] PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Questions about probabilistic shape #3

Closed · fly2tortoise closed this issue 1 year ago

fly2tortoise commented 1 year ago

Dear researchers, I'm sorry to disturb you. I still have a couple of small questions about PA-DA that I would like to ask you. Thank you very much!

  1. Why is an array of [0.2, 0.2, 0.2, 0.2, 0.2, 0.2] used to control the sampling probability of each edge in NB-201, rather than an array of shape (5, 6)? Is the probability of each edge the same in the DARTS space? Could you explain the underlying principle?
  2. If the search space has shape (7, 15), how should I set up the PA sampling rules?
  3. PA-DA reaches the SOTA KTau, but I still cannot reproduce the 0.713 result; I only get around 0.69, and I don't know which parameters I am setting incorrectly.

I sincerely look forward to your guidance. Best wishes to you.

ShunLu91 commented 1 year ago

We appreciate your interest in our paper. Let me address your questions below:

  1. When conducting experiments on NB-201, we adopt an array of [0.2, 0.2, 0.2, 0.2, 0.2] with a length of 5 (rather than 6 as in your question) to control the path sampling probability. We adhere to the principle that the length of this array corresponds to the number of candidate operations in the search space. Hence, while the probability of each edge remains uniform within the DARTS space, the probability of each candidate operation on each edge varies and undergoes continual updates during training. Indeed, it would be more appropriate to maintain an array of shape (5, 6) in NB-201, considering that there are 5 candidate operations and 6 edges in each cell. By incorporating an even larger array of shape (5, 6, layers), we could assign customized sampling probabilities to each operation on different edges and in different layers. However, for the purpose of simplicity, we assumed that candidate operations on each edge share a similar importance, and we therefore only utilized a one-dimensional array in our experiments. Despite this simplified setting, we were able to achieve satisfactory results. Your question has truly inspired us, and we encourage you to explore the performance of different array shapes, such as (5,), (5, 6), and (5, 6, layers). A minimal sketch of such a sampler is given after this list.

  2. As mentioned above, you have the flexibility to experiment with different rules when setting up PA in your own search space. This allows you to customize the sampling probabilities according to your specific requirements and objectives, and exploring various rules will give you insight into how they affect the performance and efficiency of the search. The sketch after this list also shows how a per-edge probability matrix could be used for a space of shape (7, 15).

  3. KTau is indeed a sensitive metric that can be influenced by various factors. In our experiments, we ran with three different seeds (0, 1, 2) and obtained an average result of 0.713 (0.697, 0.714, 0.725). It is worth noting that the choice of seed can affect the outcome, and running with different seeds provides additional insight into the stability and robustness of the results. I suggest you explore the effects of different seeds to further evaluate the performance and generalizability; a small example of averaging KTau over seeds is given at the end of this comment.
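For points 1 and 2, here is a minimal sketch (not the repository's exact code; the helper `sample_path` and the shapes are only illustrative assumptions) of how either a shared operation-probability array or a per-edge probability matrix could drive path sampling:

```python
import numpy as np

# NB-201 case: one shared distribution over the 5 candidate operations,
# initialized uniformly as [0.2, 0.2, 0.2, 0.2, 0.2].
nb201_probs = np.full(5, 1.0 / 5)

# General case (e.g., a space with 7 candidate operations and 15 edges):
# one distribution per edge, stored as a (num_ops, num_edges) matrix.
custom_probs = np.full((7, 15), 1.0 / 7)


def sample_path(probs, num_edges=None):
    """Sample one operation index per edge.

    A 1-D `probs` means every edge shares the same operation distribution
    (so `num_edges` must be given); a 2-D (num_ops, num_edges) `probs`
    gives each edge its own distribution.
    """
    if probs.ndim == 1:
        return [int(np.random.choice(len(probs), p=probs)) for _ in range(num_edges)]
    return [int(np.random.choice(probs.shape[0], p=probs[:, e])) for e in range(probs.shape[1])]


print(sample_path(nb201_probs, num_edges=6))   # e.g. [3, 0, 4, 1, 2, 2]
print(sample_path(custom_probs))               # 15 sampled operation indices
```

In the 1-D case the probabilities would be refreshed during supernet training (the PA idea), while the 2-D or 3-D variants simply give each edge or layer its own distribution to update.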

If you have any further questions or need any assistance, please feel free to contact us.
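Regarding point 3, a small self-contained example (with made-up accuracy values, not our actual results) of computing KTau per seed with SciPy and averaging over seeds:

```python
import numpy as np
from scipy.stats import kendalltau

# Ground-truth metrics of a few architectures (illustrative numbers only).
ground_truth = np.array([93.2, 91.5, 94.1, 90.8, 92.7])

# Hypothetical rankings predicted by supernets trained with seeds 0, 1 and 2.
predictions_per_seed = {
    0: np.array([92.0, 90.1, 93.5, 89.9, 91.8]),
    1: np.array([91.5, 90.8, 93.9, 90.0, 92.1]),
    2: np.array([92.3, 89.7, 93.0, 90.5, 91.9]),
}

taus = []
for seed, pred in predictions_per_seed.items():
    tau, _p_value = kendalltau(ground_truth, pred)
    taus.append(tau)
    print(f"seed {seed}: KTau = {tau:.3f}")

print(f"average KTau over seeds: {np.mean(taus):.3f}")
```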

fly2tortoise commented 1 year ago

Thank you very much and I look forward to learning more from you!