This PR largely removes the constraint, inherited from the EMPSN repository, that all cells of the same rank must consist of the same number of nodes.
Changes
Introduced optional script arguments that assign a constant rank to each lifter function. Example: --lifters identity ring_lift:2 functional_group:2 means that these three lifter functions will be used to construct a CC, and all cells created by the ring_lift or functional_group lifters will have rank 2. No argument is given for identity, so it defaults to cardinality mode: the rank of each cell created by the identity lifter equals its cardinality. The same behaviour can be requested explicitly with identity:c (c for cardinality). If a cell is created by multiple lifter functions, its final rank is the minimum of the ranks assigned to it by those lifters.
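The rank rules above can be sketched in a few lines. This is a minimal illustration, not the code in this PR: the helper names parse_lifter_arg, assigned_rank and merge_ranks are hypothetical, and cardinality mode is taken literally from the description (rank equals the number of nodes).

```python
from typing import Dict, FrozenSet, Iterable, Tuple, Union

# An integer rank, or "c" for cardinality mode (hypothetical representation).
RankSpec = Union[int, str]


def parse_lifter_arg(arg: str) -> Tuple[str, RankSpec]:
    """Split 'name[:rank]' into (name, rank spec); no suffix means cardinality mode."""
    name, sep, rank = arg.partition(":")
    if not sep or rank == "c":
        return name, "c"
    return name, int(rank)


def assigned_rank(cell: FrozenSet[int], spec: RankSpec) -> int:
    """Rank a single lifter assigns to a cell under its rank spec."""
    return len(cell) if spec == "c" else spec


def merge_ranks(
    assignments: Iterable[Tuple[FrozenSet[int], int]]
) -> Dict[FrozenSet[int], int]:
    """Keep the minimum rank for cells produced by more than one lifter."""
    ranks: Dict[FrozenSet[int], int] = {}
    for cell, rank in assignments:
        ranks[cell] = min(rank, ranks.get(cell, rank))
    return ranks


specs = dict(
    parse_lifter_arg(a) for a in ["identity", "ring_lift:2", "functional_group:2"]
)
ring = frozenset({0, 1, 2})
ranks = merge_ranks([
    (ring, assigned_rank(ring, specs["identity"])),   # cardinality mode
    (ring, assigned_rank(ring, specs["ring_lift"])),  # constant rank
])
```

Here the same three-node cell receives one rank from the identity lifter and another from ring_lift, and the smaller of the two is kept.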
Implemented a custom Collater object that handles the issues arising from removing the cardinality constraint. Namely, we now pad all tensors whose names match either f'x_{i}' or f'inv_{i}_{j}' with torch.nan. These tensors hold the indices of nodes underlying cells or relationships between cells, so torch.nan denotes an absent node. This lets us represent a cell with i nodes and another cell with j nodes, $i \neq j$, in the same tensor. As a consequence, the index tensors are now torch.FloatTensor and contain nan values, so I had to refactor the places in the code where these tensors are used to index into other tensors.
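To make the padding and the resulting refactor concrete, here is a small sketch of one way to build a NaN-padded index tensor and then index with it safely. It is an illustration under assumptions, not the code in this PR: the helper names are hypothetical, one cell per row is assumed, and the mean aggregation is only an example of a downstream use.

```python
import torch


def pad_index_rows(rows, pad_value=torch.nan):
    """Stack variable-length index lists into one float tensor, padded with NaN."""
    width = max(len(r) for r in rows)
    out = torch.full((len(rows), width), pad_value, dtype=torch.float)
    for i, r in enumerate(rows):
        out[i, : len(r)] = torch.tensor(r, dtype=torch.float)
    return out


def mean_over_cell_nodes(node_feats, cell_index):
    """Average node features per cell, ignoring NaN-padded slots."""
    mask = ~torch.isnan(cell_index)  # True where a real node index is stored
    # Replace NaN with a valid dummy index (0) before casting to long.
    safe_index = torch.nan_to_num(cell_index, nan=0.0).long()
    gathered = node_feats[safe_index]         # (cells, width, features)
    gathered = gathered * mask.unsqueeze(-1)  # zero out the padded slots
    return gathered.sum(dim=1) / mask.sum(dim=1, keepdim=True)


node_feats = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
# A 3-node cell and a 4-node cell stored in the same tensor.
x_2 = pad_index_rows([[0, 1, 2], [1, 2, 3, 0]])
cell_feats = mean_over_cell_nodes(node_feats, x_2)
```

The mask is computed before the cast to long because NaN has no integer representation; the dummy index only keeps the gather valid, and its contribution is zeroed out afterwards.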
Renaming Simplicial ➡️ Combinatorial:
Notes
Because pytorch_geometric does not support (and even explicitly prevents) passing custom collate functions to its dataloader class, we no longer use torch_geometric.loader.DataLoader. Instead, we use the familiar torch.utils.data.DataLoader. I subclassed torch_geometric.loader.dataloader.Collater so that it retains all of the original functionality torch_geometric needs to batch properly, while also allowing us to perform pre- and post-collate operations.
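The pre/post-collate pattern can be sketched without torch_geometric. The example below is an assumption-laden stand-in, not the Collater from this PR: it wraps torch.utils.data.default_collate by composition instead of subclassing the torch_geometric Collater, and the class names PrePostCollater and NanPadCollater are hypothetical.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, default_collate


class PrePostCollater:
    """Run hooks before and after a base collate function (stand-in pattern)."""

    def __init__(self, base_collate=default_collate):
        self.base_collate = base_collate

    def pre_collate(self, samples):
        return samples

    def post_collate(self, batch):
        return batch

    def __call__(self, samples):
        samples = self.pre_collate(samples)
        batch = self.base_collate(samples)
        return self.post_collate(batch)


class NanPadCollater(PrePostCollater):
    """Pad 1-D float samples to a common length with NaN so they can be stacked."""

    def pre_collate(self, samples):
        width = max(s.numel() for s in samples)
        return [F.pad(s, (0, width - s.numel()), value=torch.nan) for s in samples]

    def post_collate(self, batch):
        # Also return a mask marking the real (non-padded) entries.
        return batch, ~torch.isnan(batch)


data = [torch.tensor([0.0, 1.0]), torch.tensor([2.0, 3.0, 4.0])]
loader = DataLoader(data, batch_size=2, collate_fn=NanPadCollater())
batch, mask = next(iter(loader))
```

The custom collater is passed through the collate_fn argument of torch.utils.data.DataLoader, which is the hook the plain PyTorch loader exposes for this purpose.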