THUDM / GATNE

Source code and dataset for KDD 2019 paper "Representation Learning for Attributed Multiplex Heterogeneous Network"
MIT License

When I set schema, it goes wrong #45

Open YEA02 opened 4 years ago

YEA02 commented 4 years ago

I took only 300 person-item records for the experiment and set the meta path to person-item-person; the number of walks and the walk length are the default values. This gives an error at line 56 in main.py:

    iy = vocab[y].index
    KeyError: '1006114'

I printed "all_walks" and found that the walks never reached node '1006114'. Am I doing something wrong? Looking forward to your reply.

YEA02 commented 4 years ago

This may be the cause: some nodes in G cannot be reached when walking along the specified meta path. The generate_vocab method in utils.py builds the vocab from the number of node occurrences in all_walks, so the vocab does not include all the nodes in G.
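A minimal sketch of the mismatch described above, using toy data rather than the repository's code. The names `edges`, `all_walks`, `walk_counts`, and `missing` are assumptions made for illustration; the idea is only to compare the nodes seen in the walks against the nodes that appear as edge endpoints.

```python
from collections import Counter

# Hypothetical toy data: edges of G and walks that happen to never visit "i2".
edges = [("p1", "i1"), ("p1", "i2"), ("p2", "i1")]
all_walks = [["p1", "i1", "p2", "i1", "p1"], ["p2", "i1", "p1", "i1", "p2"]]

# A vocab built only from node occurrences in the walks.
walk_counts = Counter(node for walk in all_walks for node in walk)

# Every node that appears as an edge endpoint in G.
graph_nodes = {node for edge in edges for node in edge}

# Nodes that would raise KeyError when looked up in the walk-based vocab.
missing = sorted(graph_nodes - set(walk_counts))
print(missing)
```

Running a check like this before building the neighbor lists shows which node ids will trigger the KeyError.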

wuqianliang commented 4 years ago

I ran into the same problem you found @yathe

wuqianliang commented 4 years ago

    for (x, y) in g:
        if x in vocab and y in vocab:  ## added line
            ix = vocab[x].index
            iy = vocab[y].index
            # N(i, r): neighbors of node i under edge type r
            neighbors[ix][r].append(iy)
            neighbors[iy][r].append(ix)

Adding the one line marked with ## may solve the error @yathe
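For anyone who wants to try the guard in isolation, here is a self-contained sketch under stated assumptions. `VocabEntry` is a hypothetical stand-in for the repository's vocab entries (only the `.index` attribute matters here), and `network_data`, `edge_types`, and `skipped` are illustrative names, not taken from main.py.

```python
from dataclasses import dataclass


@dataclass
class VocabEntry:
    """Hypothetical stand-in for a vocab entry with a count and an index."""
    count: int
    index: int


# "i2" never appeared in a walk, so it has no vocab entry.
vocab = {"p1": VocabEntry(2, 0), "i1": VocabEntry(2, 1), "p2": VocabEntry(1, 2)}

# Toy edges grouped by edge type.
network_data = {"buy": [("p1", "i1"), ("p2", "i1"), ("p1", "i2")]}
edge_types = list(network_data)

# neighbors[i][r] holds the vocab indices adjacent to node i under edge type r.
neighbors = [[[] for _ in edge_types] for _ in vocab]

skipped = []
for r, edge_type in enumerate(edge_types):
    for x, y in network_data[edge_type]:
        if x in vocab and y in vocab:
            ix, iy = vocab[x].index, vocab[y].index
            neighbors[ix][r].append(iy)
            neighbors[iy][r].append(ix)
        else:
            # Record the edge instead of raising KeyError.
            skipped.append((x, y))
```

Collecting the skipped edges, rather than silently dropping them, makes it easy to see how much of the graph the walks failed to cover.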

YEA02 commented 4 years ago

My modification is in 'generate_vocab': I pass one more parameter, 'all_nodes', and add the following code.

    for i in all_nodes:
        if i not in raw_vocab:
            raw_vocab[i] = 0

When generating the vocab, this assigns a count of 0 to the nodes that were not walked. I don't know whether this modification is appropriate, and the results are not very good. @wuqianliang
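A rough sketch of what such a variant could look like as a standalone function. This is not the repository's generate_vocab; the function name, the `(count, index)` tuple layout, and the tie-breaking order are assumptions made so the example is deterministic.

```python
from collections import Counter


def generate_vocab_with_all_nodes(all_walks, all_nodes):
    """Hypothetical variant: map node -> (count, index), including unwalked nodes."""
    raw_vocab = Counter(node for walk in all_walks for node in walk)
    # Give every node of G an entry, with count 0 if it was never walked.
    for node in all_nodes:
        if node not in raw_vocab:
            raw_vocab[node] = 0
    # Most frequent first; ties broken by node id to keep the order stable.
    index2word = sorted(raw_vocab, key=lambda n: (-raw_vocab[n], n))
    vocab = {node: (raw_vocab[node], idx) for idx, node in enumerate(index2word)}
    return vocab, index2word


all_walks = [["p1", "i1", "p2"], ["p2", "i1", "p1"]]
all_nodes = ["p1", "p2", "i1", "i2"]
vocab, index2word = generate_vocab_with_all_nodes(all_walks, all_nodes)
print(vocab["i2"])
```

Note that a zero-count node never shows up in a training pair generated from the walks, so it likely receives little or no training signal, which may be why the results were poor.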

wuqianliang commented 4 years ago

This part of the code is used to get the neighborhood of node i on edge type r over the walk paths, so a vocab generated from the walks alone is enough.

YEA02 commented 4 years ago

Thank you very much! I still have a question: is it impossible to get the embedding of nodes that have not been walked? @wuqianliang

wuqianliang commented 4 years ago

According to formula (6), nodes that have not been walked only have the base embedding part "bi".

YEA02 commented 4 years ago

Thank you very much for your answer! @wuqianliang

wuqianliang commented 4 years ago

If there is unobserved data in the test dataset, the code needs to be modified according to formula (13). This PyTorch code is still transductive.

YEA02 commented 4 years ago

1. I use the TensorFlow version. I have unobserved data in my test set, but I don't know how to modify the code.
2. The "final_model" only contains the nodes that have been walked. If a node has not been walked, how should its final embedding be output?

@wuqianliang

lalw commented 4 years ago

I would like to ask: how do I set the number of information aggregation layers? @wuqianliang

DrQinZL commented 2 years ago

Re "set the meta path to person-item-person": I solved this problem by setting the meta path to 'person-item-person, item-person-item'.
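One plausible reason this helps is that each schema only starts walks from nodes of its first type, so adding a second schema adds a second set of start nodes. The toy walker below illustrates that idea; it is a sketch written for this thread, not the repository's walk code, and `schema_walks`, `adj`, and `node_type` are assumed names.

```python
import random


def schema_walks(adj, node_type, schemas, num_walks, walk_length, seed=0):
    """Toy meta-path walker: for each schema, start only from nodes of its first type."""
    rng = random.Random(seed)
    walks = []
    for schema in schemas.split(","):
        types = schema.strip().split("-")
        assert types[0] == types[-1], "schema must start and end with the same type"
        period = len(types) - 1
        for _ in range(num_walks):
            for start in sorted(adj):
                if node_type[start] != types[0]:
                    continue
                walk = [start]
                while len(walk) < walk_length:
                    wanted = types[len(walk) % period]
                    candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
                    if not candidates:
                        break
                    walk.append(rng.choice(candidates))
                walks.append(walk)
    return walks


# Hypothetical graph: item "i2" is linked only to another item.
adj = {"p1": ["i1"], "i1": ["p1", "i2"], "i2": ["i1"]}
node_type = {"p1": "person", "i1": "item", "i2": "item"}

one = {n for w in schema_walks(adj, node_type, "person-item-person", 2, 5) for n in w}
both = {n for w in schema_walks(adj, node_type,
                                "person-item-person, item-person-item", 2, 5) for n in w}
print(sorted(one), sorted(both))
```

In this toy graph "i2" is never reached under the single schema, but it becomes a start node once item-person-item is added, so it ends up in the walks and therefore in a walk-based vocab.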