OUTPUT FILES

FedericoCinus commented 5 years ago

Introduce JSON format (see #12).
Create a _ModelData class for saving all necessary data (e.g. topic distributions):

    class _ModelData:
        def __init__(self):
            self.topic_distrib = None

        def __getstate__(self):
            # Called by pickle: return the attributes to serialize
            return self.__dict__

        def __setstate__(self, d):
            # Called on unpickling: restore the saved attributes
            print("I'm being unpickled with these values:", d)
            self.__dict__ = d


    class TopicModel:
        def __init__(self):
            self.model_data = _ModelData()

FedericoCinus commented 5 years ago

Now there are two methods: one that saves the whole class to a pickle file, and one that saves only the necessary data (topic_distrib, graph, etc.) to a txt file.
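A minimal sketch of those two methods. The class `ToyTopicModel` and the method names `save_model`, `load_model` and `save_data` are hypothetical, not WoMG's actual API, and only `topic_distrib` (assumed here to map an integer item id to a list of k probabilities) is written to the text file.

```python
import pickle


class ToyTopicModel:
    """Hypothetical stand-in for the WoMG topic model (illustration only)."""

    def __init__(self, topic_distrib):
        # Assumed layout: item id (int) -> list of k topic probabilities
        self.topic_distrib = topic_distrib

    def save_model(self, path):
        # Method 1: serialize the whole object with pickle
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load_model(path):
        # Restore an object previously written by save_model
        with open(path, "rb") as f:
            return pickle.load(f)

    def save_data(self, path):
        # Method 2: write only the necessary data as plain text,
        # one line per item: "<item_id> <p_1> ... <p_k>"
        with open(path, "w") as f:
            for item_id in sorted(self.topic_distrib):
                probs = " ".join(str(p) for p in self.topic_distrib[item_id])
                f.write(f"{item_id} {probs}\n")
```

Pickle restores the full object but is Python-specific; the text file holds only the data and can be read by other tools.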

FedericoCinus commented 5 years ago

@francescobonchi

No dictionaries in the output files.
All the LDA topic model output should go to files (topic distributions and word distributions).

corradomonti commented 5 years ago

I think the remaining points of this issue that are not covered by issue #22 and issue #19 are:

[ ] Provide a new output format: a simple CSV or TSV file with the (t, i, u) propagations (is this what @francescobonchi referred to?)
[ ] Minimize the output files: where the user may or may not want some output, provide an option for it, defaulting to "no output"
[ ] Drop the Hidden_ attribute hack
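A minimal sketch of the first two points, assuming propagations are held as (t, i, u) integer tuples in the order time stamp, item id, user id (the order listed for [out5] later in this thread). `write_propagations` and its `path=None` default are hypothetical; passing no path writes nothing, matching the proposed "no output" default.

```python
import csv


def write_propagations(propagations, path=None):
    """Write (t, i, u) propagations as a tab-separated file.

    Returns the number of rows written. With path=None nothing is written,
    which mirrors the proposed default of "no output".
    """
    if path is None:
        return 0
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for t, i, u in propagations:
            writer.writerow([t, i, u])
            count += 1
    return count
```

Using `delimiter=","` instead would give the CSV variant.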

FedericoCinus commented 5 years ago

From: https://docs.google.com/document/d/1gwRsRCDZaxASy6suelBOH1vv0McoRzTKmN4LPmKyEWw/edit?ts=5cf806d0

Main output files:
[out1] items_description
[out2] items_keywords (optional)
[out3] users_interests (optional)
[out4] users_influence (optional)
[out5] propagations
[out6] topic_model (i.e., LDA output) (when the user passes a docs folder as input)

Format of the output files:
[out1] item_id_int [k-array of probabilities]
[out2] item_id_int (bag of words) #19
[out3] user_id_int [k-array of probabilities]
[out4] user_id_int [k-array of probabilities]
[out5] time_stamp_int item_id_int user_id_int
[out6] TBD (I would just pick the standard output format of LDA)
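A minimal sketch of reading back an [out1]/[out3]/[out4]-style file, assuming each line holds an integer id followed by k whitespace-separated probabilities with no literal brackets; the actual separator WoMG uses is not stated here. `parse_k_array_line` and `read_k_array_file` are hypothetical helper names.

```python
def parse_k_array_line(line, k):
    """Parse one "<id> <p_1> ... <p_k>" line into (id, [p_1, ..., p_k])."""
    fields = line.split()
    if len(fields) != k + 1:
        raise ValueError(
            f"expected 1 id and {k} probabilities, got {len(fields)} fields"
        )
    return int(fields[0]), [float(x) for x in fields[1:]]


def read_k_array_file(path, k):
    """Read every non-empty line of the file into a list of (id, probabilities)."""
    rows = []
    with open(path) as f:
        for line in f:
            if line.strip():
                rows.append(parse_k_array_line(line, k))
    return rows
```

Checking the field count against k catches rows where a probability is missing or the id column was dropped.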

FedericoCinus commented 5 years ago

Code for output saver:

[Screenshot 2019-06-06 at 15.46.25]

FedericoCinus commented 5 years ago

All the attributes that will be written as outputs are stored in dictionaries. This means the keys are not guaranteed to come out in any particular order (e.g., numerical order, for numerical keys). For example, if I write the users' interests to a file, I get this: Users_interests_sim8.txt

corradomonti commented 5 years ago

If that's important, maybe a list is better than a dict (see this comment).

FedericoCinus commented 5 years ago

I do not know if it is important. In this particular case the dict format comes from the Gensim library, from which the node2vec module takes the embeddings.
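Following the suggestion to use a list instead of a dict, a minimal sketch that turns an id-to-vector mapping into a list indexed by the integer id, so rows are always written in numerical order. It assumes the ids are exactly 0..n-1, possibly stored as strings as in node2vec-style embedding dictionaries; `dict_to_ordered_list` is a hypothetical name, not part of WoMG or Gensim.

```python
def dict_to_ordered_list(vectors):
    """Convert {node_id: vector} into a list where position i holds node i's vector.

    Keys may be ints or numeric strings such as "10"; converting them to int
    places "10" after "9" (a plain string sort would put it right after "1").
    """
    n = len(vectors)
    ordered = [None] * n
    for key, vec in vectors.items():
        idx = int(key)
        if not 0 <= idx < n:
            raise ValueError(f"id {idx} is outside the expected range 0..{n - 1}")
        if ordered[idx] is not None:
            raise ValueError(f"id {idx} appears more than once")
        ordered[idx] = list(vec)
    return ordered
```

Writing the file then reduces to iterating over `enumerate(ordered)`, with no dependence on dictionary key order.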

FedericoCinus commented 5 years ago

A Saver class has been introduced for saving outputs. All the non-optional outputs are now saved in the correct format. I still have to remove the "Hidden_" keyword from the diffusion class and solve #19.

commit 7aae7fff2835628144afdc1a5d23f5c57fd2162c

FedericoCinus commented 5 years ago

All outputs have been added. The topic_model output is now called Topics_descript.

commit bc2e102be196797dd382c9fd7794a7e01edf6f96

FedericoCinus / WoMG

OUTPUT FILES #20