Mapping of Embeddings to CSV Rows

heindorf commented 4 years ago

In order to evaluate the clustering in terms of type prediction, we need to be able to map an embedding to a row in the original CSV file.

Currently, the embedding results look as follows and it is not clear how to map an embedding to the original row in the CSV file.

Unnamed: 0	0	1	2	3	4	5	6	7	8	...	40	41	42	43	44	45	46	47	48	49
Event_0	0.999795	0.506241	0.999795	0.506241	0.506241	0.999795	0.506241	0.506241	0.999795	...	0.999795	0.999795	0.999795	0.999795	0.506241	0.506241	0.999795	0.999795	0.999795	0.265385
customer_name	0.999795	0.509722	0.999795	0.509722	0.509722	0.999795	0.509722	0.509722	0.999795	...	0.999795	0.999795	0.999795	0.999795	0.509722	0.509722	0.999795	0.999795	0.999795	0.265371
customer_b	0.999795	0.506811	0.999795	0.506811	0.506811	0.999795	0.506811	0.506811	0.999795	...	0.999795	0.999795	0.999795	0.999795	0.506811	0.506811	0.999795	0.999795	0.999795	0.265371

ghost commented 4 years ago

@heindorf Also, I found that each Event has different row labels. See rows 71 and 82 for Events 1 and 2 in 0.1-sh-ai4bd-embeddings.ipynb.

Demirrr commented 3 years ago

The issue is unclear to me.

> In order to evaluate the clustering in terms of type prediction, we need to be able to map an embedding to a row in the original CSV file.

I would claim that this is not needed, as theoretically described here and practically shown in the TypePrediction class and the Pipeline.

That being said, given that a row in the CSV file corresponds to many triples under our conversion scheme, how do you envision a mapping from vector representations back to the original CSV file?

Cheers

heindorf commented 3 years ago

Here is an example to clarify the correspondence between the input CSV file and the computed embeddings.

input.csv

col1	col2
a	b
b	a

Triples in Knowledge Graph

\<Event_0, col1, a> \<Event_0, col2, b> \<Event_1, col1, b> \<Event_1, col2, a>
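To make the conversion concrete, here is a minimal sketch of how rows could be turned into triples. It assumes that the subject is `Event_<i>` with `i` the zero-based position of the data row, that the column name is the predicate and that the cell value is the object. The function name `rows_to_triples` and its parameters are hypothetical and not taken from vectograph.

```python
import csv
import io


def rows_to_triples(csv_text, delimiter="\t", prefix="Event_"):
    """Turn every data row into (subject, predicate, object) triples.

    Assumption: the subject is the prefix plus the zero-based row
    position, the predicate is the column name, the object is the cell.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    triples = []
    for i, row in enumerate(reader):
        subject = f"{prefix}{i}"
        for column, value in row.items():
            triples.append((subject, column, value))
    return triples


# The two-row input.csv from this comment, tab-separated.
example_csv = "col1\tcol2\na\tb\nb\ta\n"
triples = rows_to_triples(example_csv)
for triple in triples:
    print(triple)
```

Under these assumptions each row contributes one triple per column, so a single CSV row is spread over several triples but still shares one subject.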

embeddings.csv

index	embedding
Event_0	\<embedding1>
Event_1	\<embedding2>
col1	\<embedding3>
col2	\<embedding4>
a	\<embedding5>
b	\<embedding6>

"Event_0" corresponds to the first row in input.csv and "Event_1" corresponds to the second row. The order of rows in embeddings.csv has no meaning and only the indices are relevant.
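The lookup described above could be sketched as follows. This is only an illustration under the stated convention that `Event_<i>` refers to the i-th data row (zero-based). The helper `map_embeddings_to_rows` is hypothetical, and it further assumes that no column name or cell value itself has the form `Event_<number>`.

```python
import re

# Index labels of the form Event_<i> are assumed to refer to CSV rows.
EVENT_PATTERN = re.compile(r"^Event_(\d+)$")


def map_embeddings_to_rows(embedding_index, csv_rows):
    """Return {index label: CSV row} for every Event_<i> label."""
    mapping = {}
    for label in embedding_index:
        match = EVENT_PATTERN.match(label)
        if match is None:
            continue  # column names and cell values have no source row
        row_number = int(match.group(1))
        if row_number < len(csv_rows):
            mapping[label] = csv_rows[row_number]
    return mapping


# Rows of input.csv and the index of embeddings.csv, deliberately shuffled
# to show that the order in embeddings.csv does not matter.
csv_rows = [{"col1": "a", "col2": "b"}, {"col1": "b", "col2": "a"}]
embedding_index = ["col1", "Event_1", "a", "Event_0", "col2", "b"]

row_for = map_embeddings_to_rows(embedding_index, csv_rows)
for label, row in row_for.items():
    print(label, row)
```

If the embeddings are loaded with pandas, passing `index_col=0` to `read_csv` should expose these labels as the DataFrame index instead of an `Unnamed: 0` column.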

dice-group / vectograph