Closed heindorf closed 3 years ago
@heindorf Also, I found that each Event has different row labels. Check rows 71 and 82 for Events 1 and 2 in 0.1-sh-ai4bd-embeddings.ipynb
The issue is unclear to me.
In order to evaluate the clustering in terms of type prediction, we need to be able to map an embedding to a row in the original CSV file.
I would claim that this is not needed, as described theoretically here and shown practically in the TypePrediction class and the Pipeline.
That being said, given that a row in the CSV file corresponds to many triples under our conversion scheme, how do you envision a possible mapping from vector representations back to rows of the original CSV file?
Cheers
Example to clarify the correspondence between the input csv file and the computed embeddings.
col1 | col2 |
---|---|
a | b |
b | a |
\<Event_0, col1, a> \<Event_0, col2, b> \<Event_1, col1, b> \<Event_1, col2, a>
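As a minimal sketch of the conversion described above (the project's actual implementation may differ), each CSV row becomes one triple per column, with a synthetic `Event_<row number>` subject:

```python
# Sketch of the assumed row-to-triple conversion: every cell of row i
# yields a triple (Event_i, column_name, cell_value).
import csv
import io

# Inline stand-in for input.csv from the example above.
csv_text = "col1,col2\na,b\nb,a\n"
reader = csv.DictReader(io.StringIO(csv_text))

triples = []
for i, row in enumerate(reader):
    for col, value in row.items():
        triples.append((f"Event_{i}", col, value))

print(triples)
# [('Event_0', 'col1', 'a'), ('Event_0', 'col2', 'b'),
#  ('Event_1', 'col1', 'b'), ('Event_1', 'col2', 'a')]
```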
index | embedding |
---|---|
Event_0 | \<embedding1> |
Event_1 | \<embedding2> |
col1 | \<embedding3> |
col2 | \<embedding4> |
a | \<embedding5> |
b | \<embedding6> |
"Event_0" corresponds to the first row in input.csv
and "Event_1" corresponds to the second row. The order of rows in embeddings.csv
has no meaning and only the indices are relevant.
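The lookup this implies can be sketched as follows (assumed data layout, not the project's actual code): parse the row number out of an `Event_<n>` index and use it to address the original CSV, returning nothing for non-event entities such as column names or cell values.

```python
# Sketch: map an embedding index back to its row in input.csv.
# The row number is encoded in the "Event_<n>" index; entities like
# "col1" or "a" have embeddings but no corresponding row.
rows = [  # stand-in for input.csv, in file order
    {"col1": "a", "col2": "b"},
    {"col1": "b", "col2": "a"},
]
embeddings = {  # stand-in for embeddings.csv; row order is irrelevant
    "Event_1": [0.4, 0.5],
    "Event_0": [0.1, 0.2],
    "col1": [0.9, 0.8],
}

def row_for(index):
    """Return the original CSV row for an event index, else None."""
    if index.startswith("Event_"):
        return rows[int(index.split("_", 1)[1])]
    return None

print(row_for("Event_1"))  # {'col1': 'b', 'col2': 'a'}
print(row_for("col1"))     # None
```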