facebookresearch / PyTorch-BigGraph

Generate embeddings from large-scale graph-structured data.
https://torchbiggraph.readthedocs.io/

Could PyTorch-BigGraph find similarities two edges away? #260

Open ahakanbaba opened 2 years ago

ahakanbaba commented 2 years ago

Consider two very simple graph configurations:

1) Foo to Bar

[Image: the foo → bar graph]

In this simple graph, the foo_1 and foo_2 entities are more similar to each other than either is to the foo_3 entity. Trained with the following config, the model can detect that:

{'background_io': False,
 'batch_size': 1000,
 'bias': False,
 'bucket_order': 'inside_out',
 'checkpoint_path': '/code/spatiotemporal_embeddings_overfit_foo_to_bar/2022-05-25_2022-05-25/pytorch_big_graph/model_checkpoints',
 'checkpoint_preservation_interval': None,
 'comparator': 'dot',
 'dimension': 32,
 'disable_lhs_negs': False,
 'disable_rhs_negs': False,
 'distributed_init_method': None,
 'distributed_tree_init_order': True,
 'dynamic_relations': False,
 'edge_paths': ['/code/spatiotemporal_embeddings_overfit_foo_to_bar/2022-05-25_2022-05-25/pytorch_big_graph/converted_edges/all_edges_raw_edges_2022-05-25_2022-05-25'],
 'entities': {'bar': {'dimension': None,
                      'featurized': False,
                      'num_partitions': 1},
              'foo': {'dimension': None,
                      'featurized': False,
                      'num_partitions': 1}},
 'entity_path': '/code/spatiotemporal_embeddings_overfit_foo_to_bar/2022-05-25_2022-05-25/pytorch_big_graph/entities',
 'eval_fraction': 0.0,
 'eval_num_batch_negs': 1000,
 'eval_num_uniform_negs': 1000,
 'global_emb': False,
 'half_precision': False,
 'hogwild_delay': 2.0,
 'init_path': None,
 'init_scale': 0.001,
 'loss_fn': 'softmax',
 'lr': 0.1,
 'margin': 0.1,
 'max_edges_per_chunk': 1000000000,
 'max_norm': 1.0,
 'num_batch_negs': 50,
 'num_edge_chunks': None,
 'num_epochs': 25,
 'num_gpus': 0,
 'num_groups_for_partition_server': 16,
 'num_machines': 1,
 'num_partition_servers': -1,
 'num_uniform_negs': 25,
 'regularization_coef': 0.0,
 'regularizer': 'N3',
 'relation_lr': None,
 'relations': [{'all_negs': False,
                'lhs': 'foo',
                'name': 'foo_to_bar',
                'operator': 'none',
                'rhs': 'bar',
                'weight': 1.0}],
 'verbose': 1,
 'workers': 24}
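
For context, a resolved config like this normally comes from a small Python config module passed to the torchbiggraph_train command. Below is a minimal sketch of such a module; the file name is hypothetical and the paths are shortened, but the keys mirror the resolved settings above:

# foo_bar_config.py -- hypothetical config module mirroring the
# resolved settings above; paths shortened for readability.

def get_torchbiggraph_config():
    return dict(
        entity_path="pytorch_big_graph/entities",
        edge_paths=["pytorch_big_graph/converted_edges/all_edges"],
        checkpoint_path="pytorch_big_graph/model_checkpoints",
        entities={
            "foo": {"num_partitions": 1},
            "bar": {"num_partitions": 1},
        },
        relations=[
            {"name": "foo_to_bar", "lhs": "foo", "rhs": "bar",
             "operator": "none"},
        ],
        dimension=32,
        comparator="dot",
        loss_fn="softmax",
        lr=0.1,
        num_epochs=25,
    )

Training would then be launched with: torchbiggraph_train foo_bar_config.py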

If I compute the similarity of every entity to every other entity, I get something like the following:

similarities = embeddings.dot(embeddings.transpose())
[Image: heatmap of the all-pairs similarity matrix]

The blue background marks the more similar entity pairs. The model was able to detect that bar_1 and bar_2 are more similar to each other than to bar_3, and likewise that foo_1 and foo_2 are more similar to each other than to foo_3.
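
For reference, here is a minimal sketch of how this similarity matrix can be computed from a finished training run. It assumes PBG's default checkpoint layout (embeddings_<type>_<partition>.v<version>.h5 files containing an "embeddings" dataset, and entity_names_<type>_<partition>.json files under entity_path); the directory names and the checkpoint version are placeholders:

import json
import h5py
import numpy as np

CHECKPOINT_DIR = "pytorch_big_graph/model_checkpoints"  # checkpoint_path
ENTITY_DIR = "pytorch_big_graph/entities"               # entity_path
VERSION = 25  # placeholder: which checkpoint version to read

def load_embeddings(entity_type):
    # One HDF5 file per (entity type, partition); row i of the
    # "embeddings" dataset matches name i in the JSON file.
    path = f"{CHECKPOINT_DIR}/embeddings_{entity_type}_0.v{VERSION}.h5"
    with h5py.File(path, "r") as hf:
        vectors = hf["embeddings"][...]
    with open(f"{ENTITY_DIR}/entity_names_{entity_type}_0.json") as jf:
        names = json.load(jf)
    return names, vectors

names, parts = [], []
for entity_type in ("foo", "bar"):
    n, v = load_embeddings(entity_type)
    names += n
    parts.append(v)
embeddings = np.concatenate(parts)

# All-pairs dot products, matching the 'dot' comparator in the config.
similarities = embeddings.dot(embeddings.T)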


2) Foo to Baz to Bar

[Image: the foo → baz → bar graph]

We just add entities (baz) with a single in-edge and a single out-edge between the foo and bar entities. The similarity signal now has to propagate across two edges.

According to my understanding, PyTorch-BigGraph cannot detect in this configuration that the foo_1 and foo_2 entities are more similar to each other than to the foo_3 entity. The training config is the same as in the previous example:

{'background_io': False,
 'batch_size': 1000,
 'bias': False,
 'bucket_order': 'inside_out',
 'checkpoint_path': '/code/spatiotemporal_embeddings_overfit_foo_to_baz_to_bar/2022-05-25_2022-05-25/pytorch_big_graph/model_checkpoints',
 'checkpoint_preservation_interval': None,
 'comparator': 'dot',
 'dimension': 32,
 'disable_lhs_negs': False,
 'disable_rhs_negs': False,
 'distributed_init_method': None,
 'distributed_tree_init_order': True,
 'dynamic_relations': False,
 'edge_paths': ['/code/spatiotemporal_embeddings_overfit_foo_to_baz_to_bar/2022-05-25_2022-05-25/pytorch_big_graph/converted_edges/all_edges_raw_edges_2022-05-25_2022-05-25'],
 'entities': {'bar': {'dimension': None,
                      'featurized': False,
                      'num_partitions': 1},
              'baz': {'dimension': None,
                      'featurized': False,
                      'num_partitions': 1},
              'foo': {'dimension': None,
                      'featurized': False,
                      'num_partitions': 1}},
 'entity_path': '/code/spatiotemporal_embeddings_overfit_foo_to_baz_to_bar/2022-05-25_2022-05-25/pytorch_big_graph/entities',
 'eval_fraction': 0.0,
 'eval_num_batch_negs': 1000,
 'eval_num_uniform_negs': 1000,
 'global_emb': False,
 'half_precision': False,
 'hogwild_delay': 2.0,
 'init_path': None,
 'init_scale': 0.001,
 'loss_fn': 'softmax',
 'lr': 0.1,
 'margin': 0.1,
 'max_edges_per_chunk': 1000000000,
 'max_norm': 1.0,
 'num_batch_negs': 50,
 'num_edge_chunks': None,
 'num_epochs': 25,
 'num_gpus': 0,
 'num_groups_for_partition_server': 16,
 'num_machines': 1,
 'num_partition_servers': -1,
 'num_uniform_negs': 25,
 'regularization_coef': 0.0,
 'regularizer': 'N3',
 'relation_lr': None,
 'relations': [{'all_negs': False,
                'lhs': 'foo',
                'name': 'foo_to_baz',
                'operator': 'none',
                'rhs': 'baz',
                'weight': 1.0},
               {'all_negs': False,
                'lhs': 'baz',
                'name': 'baz_to_bar',
                'operator': 'none',
                'rhs': 'bar',
                'weight': 1.0}],
 'verbose': 1,
 'workers': 24}
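
As an aside on reproduction: PBG ingests edges from partitioned files, typically converted from a TSV edge list. The snippet below is a hypothetical reconstruction of such an edge list for this graph; the exact wiring in the issue isn't shown, so the pairing is a guess that merely satisfies the single-in/single-out constraint on baz while giving foo_1 and foo_2 overlapping bar neighborhoods:

# Hypothetical edge list: foo_1 and foo_2 reach overlapping bars through
# dedicated baz nodes (in-degree 1, out-degree 1); foo_3 does not.
edges = [
    ("foo_1", "foo_to_baz", "baz_1"), ("baz_1", "baz_to_bar", "bar_1"),
    ("foo_1", "foo_to_baz", "baz_2"), ("baz_2", "baz_to_bar", "bar_2"),
    ("foo_2", "foo_to_baz", "baz_3"), ("baz_3", "baz_to_bar", "bar_1"),
    ("foo_2", "foo_to_baz", "baz_4"), ("baz_4", "baz_to_bar", "bar_2"),
    ("foo_3", "foo_to_baz", "baz_5"), ("baz_5", "baz_to_bar", "bar_3"),
]
with open("edges.tsv", "w") as f:
    for lhs, rel, rhs in edges:
        f.write(f"{lhs}\t{rel}\t{rhs}\n")
# Converted with PBG's bundled importer, e.g.:
#   torchbiggraph_import_from_tsv --lhs-col=0 --rel-col=1 --rhs-col=2 \
#       foo_baz_bar_config.py edges.tsv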

Calculating the similarities in the same fashion does not show the expected higher similarity between bar_1 and bar_2, nor between foo_1 and foo_2.

[Images: all-pairs similarity heatmaps for the second configuration]

I wonder whether this is a fundamental limitation of PyTorch-BigGraph, or whether my training config is wrong for this type of similarity detection.

Any advice is appreciated.

Steps to reproduce

Run the two training examples above.

Observed Results

In the second example, the entity similarities stemming from two edges apart were not detected.

Expected Results

In the second example, I expected the dot product of bar_1 and bar_2 to be noticeably larger than the dot product of bar_1 and bar_3.
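
In code, the expectation amounts to checks like the following (reusing names and similarities from the loading sketch above, run on the second experiment's checkpoints):

idx = {name: i for i, name in enumerate(names)}
# Expected: entities with shared two-hop neighborhoods stay more similar...
assert similarities[idx["bar_1"], idx["bar_2"]] > similarities[idx["bar_1"], idx["bar_3"]]
assert similarities[idx["foo_1"], idx["foo_2"]] > similarities[idx["foo_1"], idx["foo_3"]]
# ...but in the second experiment these assertions fail.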

Relevant Code

The training configs and graph structures are shared above.

adamlerer commented 2 years ago

This is an interesting experiment, thanks for bringing it up. I think the short answer is that you will typically get the desired behavior through low-rank generalization, but your dataset is small enough that the model overfits the solution exactly, and the overfit solution does not produce this similarity.

Consider the (softmax) loss you actually use for some edge e:

L(e) = -log { e^score(e) / sum_e' e^score(e') }

where the sum runs over the positive edge and its sampled negative edges e'.
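
(A minimal numeric illustration of this loss, not from the original comment: pushing the positive score far above the negatives drives the loss to zero.)

import numpy as np

def softmax_loss(pos_score, neg_scores):
    # -log( e^pos / (e^pos + sum_i e^neg_i) )
    return -pos_score + np.log(np.exp(pos_score) + np.sum(np.exp(neg_scores)))

print(softmax_loss(1.0, np.array([0.0, 0.0])))   # ~0.55
print(softmax_loss(10.0, np.array([0.0, 0.0])))  # ~1e-4: near zero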

This is minimized when the score for true edges is much higher than the scores for non-existent edges. Since the graph here is so small, you can achieve that for every edge. You can embed a cycle of length N in N dimensions as follows:

# Each entity activates two adjacent coordinates, shifted by one step
# around the cycle, so neighbors share exactly one coordinate (dot
# product 1) while non-adjacent entities share none (dot product 0):
bar_1 = [1, 1, 0, 0, 0, 0, 0, 0]
baz_1 = [0, 1, 1, 0, 0, 0, 0, 0]
foo_1 = [0, 0, 1, 1, 0, 0, 0, 0]
...
baz_3 = [1, 0, 0, 0, 0, 0, 0, 1]  # wraps around, closing the cycle

If you make the graph's rank much higher than the embedding dimension, then I think (not 100% sure) you will get the desired behavior.
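
A quick numerical check of this construction (my sketch, not part of the original comment): with entity i activating coordinates i and (i+1) mod N, every true edge in the cycle scores 1, while every non-adjacent pair, including same-type entities two hops apart, scores 0, which is exactly why the overfit solution carries no two-hop similarity.

import numpy as np

N = 8  # cycle length = embedding dimension
E = np.zeros((N, N))
for i in range(N):
    E[i, [i, (i + 1) % N]] = 1.0  # matches the bar_1/baz_1/foo_1 vectors above

scores = E @ E.T
for i in range(N):
    true_edge = scores[i, (i + 1) % N]  # adjacent in the cycle
    two_hops = scores[i, (i + 2) % N]   # two steps away
    print(true_edge, two_hops)          # always prints 1.0 0.0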