AutodeskAILab / Fusion360GalleryDataset

Data, tools, and documentation of the Fusion 360 Gallery Dataset

Clarifications on the JoinABLe implementation #90

Closed leopoldmaillard closed 2 years ago

leopoldmaillard commented 2 years ago

Hello! Thank you for these cool datasets. I'm trying to reproduce the results of the JoinABLe paper, and I was wondering how to deal with the edge vertices that have the is_degenerate flag set to True in a part JSON file.

Indeed, we do not have curve or length information for these vertices, yet they sometimes appear in the graph connectivity matrix that I will be using in the message-passing network.

For that reason, I don't feel they should be removed from the part graph, but I don't know how to represent them in the input tensor.

Thank you again!

karldd commented 2 years ago

Yes, the degenerate B-Rep edges are included in the graph. I can post the specifics of how we handle them tomorrow. Essentially, they get a degenerate one-hot curve type and the other numeric values are zeroed out.

Also, we do plan to release the code sometime before the CVPR conference. If reproducibility is important, using the 'official' version might be the better route. Pile on here if it would be helpful to have it sooner rather than later.

leopoldmaillard commented 2 years ago

Thank you for your fast and clear answer! I would love to get these details, as well as the official implementation as soon as you are ready to release it 😃

Congrats on the CVPR acceptance

karldd commented 2 years ago

Thanks. Here is the relevant code. Note that this includes all of the input features for both faces and edges, including all of the ones we used in our ablation studies and baselines (e.g. B-Grid).

We first have a mapping from the various discrete types listed in the JSON to category numbers.

    # The map of the entity types
    entity_type_map = {
        "PlaneSurfaceType": 0,
        "CylinderSurfaceType": 1,
        "ConeSurfaceType": 2,
        "SphereSurfaceType": 3,
        "TorusSurfaceType": 4,
        "EllipticalCylinderSurfaceType": 5,
        "EllipticalConeSurfaceType": 6,
        "NurbsSurfaceType": 7,
        "Line3DCurveType": 8,
        "Arc3DCurveType": 9,
        "Circle3DCurveType": 10,
        "Ellipse3DCurveType": 11,
        "EllipticalArc3DCurveType": 12,
        "InfiniteLine3DCurveType": 13,
        "NurbsCurve3DCurveType": 14,
        "Degenerate3DCurveType": 15,  # Special case for degenerate edges
    }

    # The map of edge convexity types
    convexity_type_map = {
        "None": 0,
        "Convex": 1,
        "Concave": 2,
        "Smooth": 3,
        "Non-manifold": 4,
        "Degenerate": 5
    }

Then, if a given node is degenerate, we fill it in with default values like this:


            if "is_degenerate" in node and node["is_degenerate"]:
                # If we have a degenerate edge we give some default values
                node["x"] = torch.zeros((self.grid_size, self.grid_size, self.grid_channels))
                node["entity_types"], _ = self.get_node_entity_type(node)
                node["is_face"] = torch.tensor(0, dtype=torch.long)
                node["area"] = torch.tensor(0, dtype=torch.float)
                node["length"] = torch.tensor(0, dtype=torch.float)
                node["face_reversed"] = torch.tensor(0, dtype=torch.long)
                node["edge_reversed"] = torch.tensor(0, dtype=torch.long)
                node["reversed"] = torch.tensor(0, dtype=torch.long)
                node["convexity"] = self.get_node_convexity(node)
                node["dihedral_angle"] = torch.tensor(0, dtype=torch.long)

The call to get the one-hot encoding of the categorical input features looks like this:

    def get_node_entity_type(self, node):
        """Get the entity type, either surface or curve type for the node"""
        if "surface_type" in node:
            entity_type_string = node["surface_type"]
        elif "curve_type" in node:
            entity_type_string = node["curve_type"]
        elif "is_degenerate" in node and node["is_degenerate"]:
            entity_type_string = "Degenerate3DCurveType"
        else:
            raise Exception("Unknown node entity type")
        entity_type = self.entity_type_map[entity_type_string]
        entity_type_tensor = torch.tensor(entity_type, dtype=torch.long)
        # Convert entity types to a one hot encoding
        # for all 16 types (8 surface, 7 curve, 1 degenerate)
        num_entity_type_classes = len(self.entity_type_map)
        entity_type_one_hot = F.one_hot(entity_type_tensor, num_classes=num_entity_type_classes)
        return entity_type_one_hot, entity_type

In our code we store the graph with all of the input features, and then select which ones to use with arguments passed to the training script. Before the entity_types features are passed to the network, we separate them into B-Rep faces and edges, which are passed to their respective MLPs.


I'll work on getting the code cleaned up and ready to share 👍

leopoldmaillard commented 2 years ago

Thank you for these details on how to handle the input features! I don't fully understand why you use a single 16-dimensional one-hot encoding instead of two distinct one-hot encodings for faces and edges, since they will be passed to distinct MLPs. Is there a specific reason for that?

One last clarification that would be useful concerns the graph connectivity and the mapping between the id and index of a node. My understanding is the following:

- The index key of the entity_one_equivalents nodes is the index of both faces & edges as they appear in the corresponding part JSON file.
- The connectivity of a part is defined in the links key of the JSON. However, the source & target nodes are this time referred to by their id key, which does not necessarily reflect their order of appearance in the JSON.

Am I correct? Thank you again!

karldd commented 2 years ago

> I don't fully understand why you use a single 16-dimensional one-hot encoding instead of two distinct one-hot encodings for faces and edges, since they will be passed to distinct MLPs. Is there a specific reason for that?

Yes, it could be done as two distinct one-hot vectors for faces and edges. Currently we just split the single encoding down the middle before it goes to the distinct MLPs. If we were to rewrite it, we would likely split them, but for legacy reasons it is like this.
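
To make that concrete, here is a rough sketch of the split (illustrative only, with placeholder variable names; the 8/8 boundary follows the ordering of the entity_type_map above):

    def split_entity_types(entity_types):
        # entity_types: (num_nodes, 16) one-hot tensor ordered as in entity_type_map,
        # i.e. the 8 surface types first, then the 7 curve types and the degenerate type.
        # Illustrative sketch only, not the exact code.
        face_one_hot = entity_types[:, :8]   # surface types, consumed by the face MLP
        edge_one_hot = entity_types[:, 8:]   # curve + degenerate types, consumed by the edge MLP
        return face_one_hot, edge_one_hot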

> The index key of the entity_one_equivalents nodes is the index of both faces & edges as they appear in the corresponding part JSON file.

Yes, that is right. The index key in the entity_one data structure references the index of the B-Rep faces and edges in the graph JSON file and the B-Rep .smt file. One thing to note here is that the JSON has a single list of nodes, with faces first and then edges. So if entity_one is an edge, you need to offset by the number of faces, e.g. nodes[num_faces + index]. You can get the number of faces from properties["face_count"].
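
For example, a small hypothetical helper for this lookup could look like the following (the function name is made up, and I'm assuming properties is read from the same graph JSON):

    def get_node_for_entity(graph_json, index, is_edge):
        # Look up a node in the part graph JSON by B-Rep entity index.
        # The nodes list stores faces first and then edges, so edge indices
        # are offset by the face count. Hypothetical helper for illustration.
        if is_edge:
            index += graph_json["properties"]["face_count"]
        return graph_json["nodes"][index]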

> The connectivity of a part is defined in the links key of the JSON. However, the source & target nodes are this time referred to by their id key, which does not necessarily reflect their order of appearance in the JSON.

Yes that is correct. The id here is a unique id that comes from Fusion 360. The id only gets used in this part of the code, and you don't have to deal with it manually if you do something like this:

    import json

    from networkx.readwrite import json_graph
    from torch_geometric.utils import from_networkx

    # json_file is the path to the part graph JSON
    with open(json_file, encoding="utf8") as f:
        json_data = json.load(f)
    # Do some processing here to get the features you want
    # and convert from an undirected to a directed graph as used by PyG
    nxg = json_graph.node_link_graph(json_data)
    g = from_networkx(nxg)

I'll make a note to add some documentation on this topic, it's a bit tricky.

leopoldmaillard commented 2 years ago

Thank you for these comprehensive details! I'm looking forward to the code release and your CVPR oral presentation, best of luck 👍

leopoldmaillard commented 2 years ago

Hello @karldd, I'm back again with a new batch of questions! I'm done with my JoinABLe implementation, and here are some questions about the model / training process.

I wasn't able to learn anything from the data when using the first term of the criterion (cross-entropy between the edge predictions h_uv and the ground-truth edge labels j_uv) with the labels normalized into a probability distribution as described in the paper (the j_uv sum to 1 and a softmax is applied over all h_uv values). However, training went well when I used a binary cross-entropy instead (the j_uv values are either 0 or 1 and a sigmoid is applied to each h_uv value).
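
For reference, here is roughly what I compared, as a minimal sketch with my own names and shapes (not taken from the paper or your code):

    import torch.nn.functional as F

    # logits: (n * m,) raw scores h_uv for every face/edge pair between the two parts
    # labels: (n * m,) ground-truth j_uv, 1 for pairs that form the joint axis, 0 otherwise

    def soft_label_cross_entropy(logits, labels):
        # My reading of the paper: normalize the labels into a probability
        # distribution and apply a softmax over all pairs.
        target = labels.float() / labels.sum()
        return -(target * F.log_softmax(logits, dim=0)).sum()

    def per_pair_bce(logits, labels):
        # The variant that trained well for me: per-pair sigmoid + binary cross-entropy.
        return F.binary_cross_entropy_with_logits(logits, labels.float())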

The second term (symmetric cross-entropy) helps with the training process as described in the paper. However, the same notation j_2D is used in the paper for both the row and column terms, so I assumed j_2D is normalized along axis 0 and axis 1 respectively.

The paper mentions a 79% joint axis prediction accuracy. I was wondering whether this performance is at the level of pairs of topological entities (the h_uv values predict the presence or absence of a connection with 79% accuracy w.r.t. j_uv) or at the assembly level (the highest-valued logit in h_uv corresponds to a connection in j_uv with 79% accuracy), which would be easier.

Another concern I have is that I always achieve better performance on the validation set than on the training set. Does the latter contain some challenging samples, while the validation set from your split is easier?

The number of pairs of topological entities (n x m) can be very high for some joint sets, which causes the CUDA memory to explode, since the gradients of the 3-layer (768 -> 1) MLP (the joint axis prediction module) applied to all (n x m) pairs have to be stored for a single joint set. I was wondering how you handle this memory issue, since I'm currently ignoring samples for which the (n x m) value exceeds a threshold.

I would love to get more details on the architecture, such as the number and dimensions of the MLP layers in the face and edge encoders, the dimension and number of heads of the GATv2 layers, the dimensions of the hidden layers of the joint axis prediction MLP, etc.

Feel free to answer these numerous questions only in part if you are short on time, and thank you again for your work and kind help. Have a nice day! 😄

karldd commented 2 years ago

Hi @leopoldmaillard

I just posted the code for JoinABLe here: https://github.com/AutodeskAILab/JoinABLe. This should help answer most of the implementation questions. Please open an issue in that repo if you run into problems getting things up and running.

> The paper mentions a 79% joint axis prediction accuracy

This accuracy is a top-1 hit rate. If the top prediction is one of the B-Rep faces or edges in the ground-truth labels (including equivalent entities that form the same axis), that is considered a hit for that data sample.
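
In pseudocode, the per-sample check is roughly the following (a simplified sketch, not the exact evaluation code):

    import torch

    def top1_hit(logits, label_matrix):
        # logits and label_matrix are both (n, m); label_matrix marks every
        # ground-truth face/edge pair, including equivalent entities that
        # form the same joint axis.
        top_pair = torch.argmax(logits.flatten())
        return bool(label_matrix.flatten()[top_pair] > 0)

The reported accuracy is then the fraction of samples for which this top-1 check is a hit.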

> better performance on the validation set than on the training set

Yes, that sounds right. Because they are not from the same distribution, the training set can have some more challenging samples. This is a side effect of the clean-up we do to remove samples that may be ambiguous. The mix_test set is from the same distribution as the training set. We publish results for the mix_test distribution in the supplemental material, which should be posted on arXiv next Tuesday.

[image: mix_test results table from the supplemental material]

As you can see, the performance drops for all baselines, but the relative ordering doesn't change.

> was wondering how you handle this memory issue

Yes, we handle it the same way. We pass in a max_node_count argument and then filter out large graphs during training. During inference, we remove the filter and use the CPU to handle the large graphs.
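
Conceptually the filter is just something like this (illustrative sketch; whether the limit applies per graph or to the pair combined is a detail of the dataset code, and only the max_node_count argument name is real):

    def keep_for_training(num_nodes_part1, num_nodes_part2, max_node_count):
        # Skip joint sets whose graphs are too large to fit in GPU memory
        # during training. At inference time the filter is not applied and
        # large graphs are handled on the CPU instead.
        return (num_nodes_part1 + num_nodes_part2) <= max_node_count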

leopoldmaillard commented 2 years ago

Hello @karldd,

Thank you again for your responsiveness and comprehensive answers! I'll have a closer look at the official implementation and will let you know if I have any questions 😃