DAG counts differently from expected

ultmaster commented 4 years ago

First, thanks for this open source repo!

I tried to reproduce your search space 3 before this code is released. I found 445 different topologies with loose ends (allowing input->output, which I will raise another issue), altogether 445*3^5 = 108135 architectures. But your paper and code only reports 234 different topologies (24066 in all != 234*3^5). Digging deeper, I found two reasons for this:

Some architectures didn't use "all 5 intermediate nodes". For example, one of your 234 is:

[[0 0 1 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 1 0 0]
 [0 0 0 0 1 1 0]
 [0 0 0 0 0 1 1]
 [0 0 0 0 0 0 1]
 [0 0 0 0 0 0 0]]

Node 1 is never used (no input, no output). Of course, your code handles this properly, by ignoring not-used nodes in architecture validation. But I don't think it's consistent with "All 5 intermediate nodes are used in this search space" in paper.

Confusingly, some architectures are never generated. For example this:

[[0 1 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 1 1 1 0]
 [0 0 0 0 1 1 0]
 [0 0 0 0 0 0 1]
 [0 0 0 0 0 0 1]
 [0 0 0 0 0 0 0]]

To visualize, I fill node 1-5 with conv3x3.

Interestingly, generating without loose ends uses a completely different code from with loose ends. The core of this part is:

for parents in parent_combinations_old(adjacency_matrix, node, n_parents=num_parents_left):
    # Make copy of adjacency matrix so that when it returns to this stack
    # it can continue with the unmodified adjacency matrix
    adjacency_matrix_copy = copy.copy(adjacency_matrix)
    for parent in parents:
        adjacency_matrix_copy[parent, node] = 1
        for graph in self._generate_adjacency_matrix(adjacency_matrix=adjacency_matrix_copy, node=parent):
            yield graph

The problem here is: backtracking info is not kept when traversing the second node. In the example above, when visiting node 5, all edges connecting to node 4 is lost. Although some graphs are yielded during traverse of node 4, traverse of node 5 is still based on the graph of processing node 6. Therefore, in a word, this code cannot generate DAG with "branches".

Maybe this is done delibrately because some of the downstream optimizers cannot handle branch structure. But as far as I know, this is not mentioned in either your paper or related literature.

I'll really appreciate it if you are able to share more details.

JulienSiems commented 4 years ago

Thank you for your very detailed issue!

Regarding point 1: I agree with you that the wording in our paper is suboptimal. What we meant to say for search space 3 is: 'All available intermediate nodes (up to 5) are used in this search space'. Let me know if you feel like we could still improve the sentence otherwise we will update the paper accordingly.

Regarding point 2: I am looking into this and will get back to you as soon as possible :)

ultmaster commented 4 years ago

In case a node is never used, what does "no. of parents" of this node mean?

Nevertheless, it's just a literature problem. I'm okay either way.

JulienSiems commented 4 years ago

To have a fixed discretization strategy as in DARTS we select the parents for each intermediate nodes, based e.g. in DARTS on the architectural weights. Therefore, there are no unused nodes, since every node will have parents. However, there are the loose ends as discussed in the paper. Hope this clarifies things. Will get back to you on point 2 of your initial comment.

JulienSiems commented 4 years ago

We are adding the updated version of the search space counting in the new version and are in the process of updating the paper. These are the updated numbers. Thank you again @ultmaster for pointing out the incorrect counting of the graphs w/o loose ends! WhatsApp Image 2020-02-26 at 09 38 32

automl / nasbench-1shot1

DAG counts differently from expected #1