Closed FedericoCinus closed 5 years ago
There are now two saving methods: one pickles the whole class instance to a file, and the other writes only the necessary data (topic_distrib, graph, etc.) to a txt file.
@francescobonchi
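For readers following along, here is a minimal sketch of what the two saving paths could look like. `ToyModel`, `save_pickle`, `save_txt`, and the txt layout are all hypothetical names and choices made for illustration; they are not taken from the repository.

```python
import pickle
from pathlib import Path


class ToyModel:
    """Hypothetical stand-in for the class being saved (not the real one)."""

    def __init__(self):
        self.topic_distrib = {0: [0.7, 0.3], 1: [0.2, 0.8]}
        self.graph = [(0, 1), (1, 0)]


def save_pickle(obj, path):
    """Serialize the whole object, including every attribute, to one file."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def save_txt(model, path):
    """Write only the data needed downstream, in a human-readable layout."""
    lines = ["# topic_distrib"]
    for node, probs in sorted(model.topic_distrib.items()):
        values = " ".join(str(p) for p in probs)
        lines.append(f"{node} {values}")
    lines.append("# graph")
    for src, dst in model.graph:
        lines.append(f"{src} {dst}")
    Path(path).write_text("\n".join(lines) + "\n")
```

The pickle file can only be read back from Python with the class definition available, while the txt file stays readable by any tool, which is presumably why both options are worth keeping.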
I think the remaining points that this issue covers, and that are not covered by issue #22 and issue #19, are:
- t, i, u propagations (is this what @francescobonchi referred to?)
- Hidden_ attribute hack

From: https://docs.google.com/document/d/1gwRsRCDZaxASy6suelBOH1vv0McoRzTKmN4LPmKyEWw/edit?ts=5cf806d0
Main output files:
[out1] items_description
[out2] items_keywords (optional)
[out3] users_interests (optional)
[out4] users_influence (optional)
[out5] propagations
[out6] topic_model (i.e., LDA output) (when the user passes docs folder as input)

Format of the output files:
[out1] item_id_int [k-array of probabilities]
[out2] item_id_int (bag of words) #19
[out3] user_id_int [k-array of probabilities]
[out4] user_id_int [k-array of probabilities]
[out5] time_stamp_int item_id_int user_id_int
[out6] TBD (I would just pick the standard output format of LDA)
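To make the line formats above concrete, here is a small sketch of how one row of [out1]/[out3]/[out4] and one row of [out5] could be rendered. The function names, the space separator, the bracketed comma-separated array, and the four-decimal precision are assumptions; the spec above does not fix any of them.

```python
def format_distribution_line(entity_id, probabilities):
    """Render 'id [p1, p2, ..., pk]' for an item or user (assumed layout)."""
    probs = ", ".join(f"{p:.4f}" for p in probabilities)
    return f"{entity_id} [{probs}]"


def format_propagation_line(time_stamp, item_id, user_id):
    """Render 'time_stamp item_id user_id' as three integer columns."""
    return f"{int(time_stamp)} {int(item_id)} {int(user_id)}"


# Example rows (made-up values, for illustration only).
print(format_distribution_line(3, [0.25, 0.75]))
print(format_propagation_line(0, 3, 17))
```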
Code for output saver:
All the attributes that will become outputs are stored in dictionaries. This means the entries are not kept in any particular key order (which matters when the keys are numerical). For example, writing the users' interests to a file gives this: Users_interests_sim8.txt
If that's important, maybe a list is better than a dict then (see this comment).
I do not know if it is important. In this particular case the dict format comes from the Gensim library, from which the node2vec module takes the embeddings.
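If the order does turn out to matter, one option that keeps the dict is to sort the keys numerically at write time. The sketch below assumes the keys are integers or integer-like strings; `ordered_rows` is a hypothetical helper, not part of the codebase.

```python
def ordered_rows(users_interests):
    """Return (user_id, vector) pairs sorted by numeric user id.

    key=int handles both int keys and string keys such as "10",
    which would otherwise sort before "2" lexicographically.
    """
    return [(uid, users_interests[uid]) for uid in sorted(users_interests, key=int)]


# Made-up data with string keys, inserted in arbitrary order.
interests = {"10": [0.1, 0.9], "2": [0.6, 0.4], "1": [0.5, 0.5]}
for uid, vector in ordered_rows(interests):
    print(uid, vector)
```

Sorting only when writing leaves the in-memory structure untouched, so nothing that consumes the dict elsewhere needs to change.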
A Saver class has been introduced for saving outputs. All the non-optional outputs can now be saved in the correct format. I still have to remove the "Hidden_" keyword from the diffusion class and solve #19.
commit 7aae7fff2835628144afdc1a5d23f5c57fd2162c
All outputs have been inserted. Topic_model output is now called Topics_descript. commit bc2e102be196797dd382c9fd7794a7e01edf6f96
class TopicModel:
    def __init__(self):
        self.model_data = _ModelData()