HKUST-KnowComp / FMG

KDD17_FMG

How to run the data #2

Closed fanyike closed 6 years ago

fanyike commented 7 years ago

Please tell me how to run the data. Should I download the data myself?

SeverusBing commented 7 years ago

I have the same question. May I also ask which file is the main part of the algorithm? Do I need to run all of the Python files? Sorry to ask such basic questions, but I have never worked with Python before...

hzhaoaf commented 7 years ago

@fanyike @SeverusBing Thanks for your interest in this work. The readme file has been updated. You can check it for how to obtain Yelp-50K and Amazon-50K.

Feel free to post here if you have any questions.

SeverusBing commented 7 years ago

Thank you so much for your reply! I just tried to open the website you mentioned in the readme file, but it seems to be unavailable...

hzhaoaf commented 7 years ago

@SeverusBing What do you mean by "unavailable"? It's a Dropbox link from which you can download the dataset. In fact, I asked several people to test it, and the link works.

SeverusBing commented 7 years ago

Sorry! Maybe there was something wrong with my internet connection yesterday. Now I have the data package, thank you!

SeverusBing commented 7 years ago

I'm sorry to disturb you again... I put the data folder in the project directory and ran the example command "python run_exp.py config/yelp-50k.yaml -reg 0.5", but I got a FileNotFoundError: No such file or directory: 'D:\FMG-master\log\fmg_yelp-50k_vary_reg_split1.log'. I checked the code of run_exp.py and found that the set_logfile function may be related to this error. It includes these statements:

    logfilename = 'log/fmg%s_%s_split%s.log' % (config['dt'], config['exp_type'], config['sn'])
    if config['exp_type'] == 'vary_mg':
        logfilename = 'log/fmg%s_%s_split%s_reg%s.log' % (config['dt'], config['exp_type'], config['sn'], config['reg'])

May I ask what this function is for? And what is 'fmg_yelp-50k_vary_reg_split1.log'? Is it a data file? Should I rename the files in the data folder to the form "fmg_yelp-50k_vary_reg_split1"? Besides, I didn't find any .log file in either the data folder or the code folder...

hzhaoaf commented 7 years ago

@SeverusBing How about creating a directory named "log" in the project directory?

SeverusBing commented 7 years ago

A "log" directory with nothing in it? OK, I will give it a try. So "fmg_yelp-50k_vary_reg_split1.log" is a result file?

hzhaoaf commented 7 years ago

@SeverusBing It's a log file that records the information the program outputs while running. You can run it and have a look :-)
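
For anyone hitting the same FileNotFoundError: the log file's parent directory has to exist before the file can be opened. Below is a minimal sketch of one way to guard against this, assuming Python 3. The function name set_logfile_safe and the logger name "fmg_example" are hypothetical and are not taken from run_exp.py.

```python
import logging
import os


def set_logfile_safe(logfilename):
    """Create the log file's parent directory if missing, then log to the file.

    Hypothetical helper for illustration; not the set_logfile from run_exp.py.
    """
    log_dir = os.path.dirname(logfilename)
    if log_dir:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger("fmg_example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(logfilename)
    logger.addHandler(handler)
    return logger, handler
```

Creating the "log" directory by hand, as suggested above, has the same effect. The sketch only removes the manual step.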

SeverusBing commented 7 years ago

I see! Thank you so much!

fanyike commented 7 years ago

@PhoenixZhao Thanks! If my ratings are continuous numbers, I mean, if the label is continuous, can I use your code?

hzhaoaf commented 7 years ago

@fanyike I think so. You can try it then.

fanyike commented 7 years ago

@PhoenixZhao Do you mean that this code can be used with continuous ratings?

hzhaoaf commented 7 years ago

@fanyike Yes, this code can run whether the ratings are discrete or continuous.

fanyike commented 7 years ago

Thanks for your reply! Excellent work.

fanyike commented 7 years ago

@PhoenixZhao Is there any introduction on how to use custom data? The Yelp data format is very complicated.

hzhaoaf commented 7 years ago

@fanyike Actually, my code can run without the Yelp format; I have already preprocessed the data for my code. You can look at the details of the data I released.

SeverusBing commented 7 years ago

Sorry to bother you again. Could you tell me the meaning of the data in the mf_features/path_count folder? I noticed that there are 11 columns in each data file, but I can't work out what these columns represent. Judging from file names such as "UNBUB_top500_item", I guess "UNBUB" is a meta-path or meta-graph with five nodes, so I'm confused about why there are 11 columns in the data file. Could you kindly give me a hint? Thank you very much!

hzhaoaf commented 7 years ago

@SeverusBing In general, these files record the latent features obtained from the corresponding meta-graphs (read the paper for the details). In our experiments, we set the rank of MF to 10, so we obtain a 10-dimensional vector for each user and item. The first column is the id of the user or item, and the remaining 10 columns are the latent features.
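
As an illustration of this layout, here is a small sketch that splits each row into an id and a 10-dimensional feature vector. It assumes whitespace-separated columns, which may not match the delimiter used in the released files. The function name load_latent_features is hypothetical.

```python
import numpy as np


def load_latent_features(lines, rank=10):
    """Parse rows of the form: <id> <f_1> ... <f_rank>.

    Assumption: columns are separated by whitespace. Adjust the split
    if the released files use another delimiter.
    """
    ids, vectors = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != rank + 1:
            raise ValueError(
                "expected %d columns, got %d" % (rank + 1, len(parts))
            )
        ids.append(parts[0])                           # first column: user or item id
        vectors.append([float(v) for v in parts[1:]])  # remaining columns: latent features
    return ids, np.array(vectors)
```

With a real file, the lines could come from an open file handle, for example `with open(path) as f: ids, feats = load_latent_features(f)`, where `path` points to one of the files in mf_features.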

SeverusBing commented 7 years ago

@PhoenixZhao Oh, I see! Then, are the matrices we get in the fm_res folder the results of the code? Would you mind telling me what the P or W in the file names means, such as "yelp-50k_split1_P" or "yelp-50k_split1_W"? Thanks!

hzhaoaf commented 7 years ago

@SeverusBing W and P represent the variables W and V in the factorization machine. I used P to denote V because of some coding problems. Sorry for the inconvenience.
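
To show how W and P are typically used together, here is a sketch of the standard second-order factorization machine prediction for a single feature vector x. This is the textbook FM formula and not necessarily the exact computation in this repository. The function name fm_predict and the optional bias w0 are assumptions for illustration.

```python
import numpy as np


def fm_predict(x, W, P, w0=0.0):
    """Standard FM score: w0 + sum_i W_i x_i + sum_{i<j} <P_i, P_j> x_i x_j.

    x: feature vector of length d
    W: first-order weights, shape (d,) or (d, 1)
    P: second-order factors (V in the FM literature), shape (d, K)
    w0: optional bias term (an assumption; defaults to 0)
    """
    x = np.asarray(x, dtype=float).ravel()
    W = np.asarray(W, dtype=float).ravel()
    P = np.asarray(P, dtype=float)

    linear = W @ x
    # Pairwise term via the usual identity:
    # 0.5 * sum_f [ (sum_i P_if x_i)^2 - sum_i P_if^2 x_i^2 ]
    xp = x @ P
    x2p2 = (x ** 2) @ (P ** 2)
    pairwise = 0.5 * np.sum(xp ** 2 - x2p2)
    return w0 + linear + pairwise
```

In this formulation W has one weight per feature, so it is stored as a single column, while P has one K-dimensional row per feature. A predicted rating for a user-item pair is then computed from that pair's feature vector x.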

SeverusBing commented 7 years ago

@PhoenixZhao Got it! Thanks soooooo much! But I wonder why there is only one column in each W matrix. I know W holds the first-order weights for the features, and I noticed that both W and P have 240 rows, so I'm confused about how to recover the rating matrix to get the recommendation results...