THUDM / GATNE

Source code and dataset for KDD 2019 paper "Representation Learning for Attributed Multiplex Heterogeneous Network"
MIT License

Cosine similarity between generated vectors is close to 1.0? #42

Closed: ruoyuli1995 closed this issue 3 years ago

ruoyuli1995 commented 4 years ago

Hi, thank you for your excellent work!

I have a question about the cosine similarity between the generated vectors of the nodes.

For the Amazon dataset and one of my own datasets, the threshold (https://github.com/THUDM/GATNE/blob/master/src/main.py#L100) printed during training is always greater than 0.98. The final similarity scores between any two nodes are also close to 1, mostly above 0.95. This suggests that the generated vectors are concentrated in a small region of the embedding space, which confuses me.

I did not change any of the experiment setup for the Amazon dataset.

Do you have any idea what causes this? Thanks in advance!
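The effect described in this report can be reproduced with synthetic data. The sketch below is a minimal illustration, assuming NumPy and made-up embeddings (not GATNE output): when every node vector shares one large common component, pairwise cosine similarities cluster near 1 even though the per-node differences are still there. Subtracting the mean embedding is included only as a hypothetical diagnostic for whether such a shared offset is responsible; nothing in this thread confirms that this is the cause. The helpers `pairwise_cosine` and `off_diagonal` are illustrative names, not part of the GATNE code.

```python
import numpy as np


def pairwise_cosine(x: np.ndarray) -> np.ndarray:
    """Return the matrix of cosine similarities between the rows of x."""
    normed = x / np.linalg.norm(x, axis=1, keepdims=True)
    return normed @ normed.T


def off_diagonal(m: np.ndarray) -> np.ndarray:
    """Flatten the off-diagonal entries, dropping each node's self-similarity."""
    mask = ~np.eye(m.shape[0], dtype=bool)
    return m[mask]


rng = np.random.default_rng(0)
num_nodes, dim = 200, 32

# Hypothetical embeddings: a large vector shared by every node
# plus small, independent per-node variation.
shared = 5.0 * np.ones(dim)
emb = shared + rng.normal(scale=1.0, size=(num_nodes, dim))

# Cosine similarities of the raw vectors versus mean-centered vectors.
raw = off_diagonal(pairwise_cosine(emb))
centered = off_diagonal(pairwise_cosine(emb - emb.mean(axis=0)))

print(f"raw:      min={raw.min():.3f} mean={raw.mean():.3f}")
print(f"centered: min={centered.min():.3f} mean={centered.mean():.3f}")
```

In this toy setup the raw similarities sit in a narrow band near 1, while the centered ones spread around 0, because the shared component dominates every vector's direction.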

cenyk1230 commented 4 years ago

Hi @ruoyuli1995,

Thanks for your interest in our work. I haven't reported the threshold value before, since we mainly care about the relative similarity scores (e.g., AUC). What you've discovered is very interesting, but I don't have an explanation for it right now. Maybe we can analyze this phenomenon and address it later.
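The point that only relative similarity scores matter for a metric like AUC can be checked directly. The sketch below, assuming NumPy and synthetic labels and scores (not GATNE's evaluation code), computes ROC AUC in its pairwise-ranking form (the normalized Mann-Whitney U statistic) and shows that squashing the scores into a narrow band near 1 with a strictly increasing function leaves the AUC unchanged. A fixed absolute cutoff, by contrast, behaves completely differently on the two scales. The helper `roc_auc` is a hypothetical name used only for illustration.

```python
import numpy as np


def roc_auc(y_true: np.ndarray, scores: np.ndarray) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)

# Hypothetical well-spread scores: positives tend to score higher than negatives.
spread = rng.normal(loc=y.astype(float), scale=1.0)

# Squash the same scores into (0.95, 1.0) with a strictly increasing map,
# mimicking the narrow similarity range reported in this issue.
compressed = 0.95 + 0.05 / (1.0 + np.exp(-spread))

auc_spread = roc_auc(y, spread)
auc_compressed = roc_auc(y, compressed)
print(f"AUC (spread):     {auc_spread:.4f}")
print(f"AUC (compressed): {auc_compressed:.4f}")

# A fixed cutoff that splits the spread scores labels every compressed score positive.
print("share above 0.5 (spread):    ", (spread > 0.5).mean())
print("share above 0.5 (compressed):", (compressed > 0.5).mean())
```

Because the ranking is preserved, the AUC is identical, which is consistent with the embeddings still being useful for ranking even when their absolute similarity scores are hard to threshold.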

ruoyuli1995 commented 4 years ago

Yes, visualizing the k-means clustering results shows that the generated vectors do make sense. However, when I try to choose a threshold for cutting off the similarity scores, I find that the scores lack discriminative power.

Thank you for your reply; I look forward to further findings.

former7 commented 4 years ago

Same problem here. Is there any explanation for this? Could the loss function be causing it?