fani-lab / SEERa

A framework to predict the future user communities in a text streaming social network based on the users’ topics of interest.

Setup and Quickstart of SEERa #45

farinamhz closed this issue 1 year ago

farinamhz commented 2 years ago

@hosseinfani @soroush-ziaeinejad I am using this issue to log my progress on the setup and quickstart of SEERa.

farinamhz commented 2 years ago

@hosseinfani @soroush-ziaeinejad

As a progress update, I have finished the setup part of SEERa.

  1. The conda environment has been activated.

  2. The additional libraries listed below have been installed successfully:

    • MAchine Learning for LanguagE Toolkit (mallet)
    • Gibbs Sampling algorithm for a Dirichlet Mixture Model (GSDMM)
    • DynamicGem
hosseinfani commented 2 years ago

@farinamhz Thanks. Are you able to run SEERa on toy.synthetic?

farinamhz commented 2 years ago

I tried to run SEERa on toy.synthetic, but it fails with the errors below. There are two problems: first, graphs.pkl has not been created at any of the levels; second, the variable len_users is not defined.

@soroush-ziaeinejad could you please take a look?


Running pipeline for lda.gensim and dynae ....

1. DAL: Temporal Document Creation from Social Posts ...
##################################################
1.1. Loading saved temporal documents from  ../output/farinam-test/lda.gensim.dynae/Documents.csv in which 
(User, Time) a document is concat of user's posts in each 1 day(s)...
(#ProcessedDocuments, #Documents, #Users, #TimeIntervals): (180,180,60,3)
Time Elapsed: 0.022129297256469727

2. TML: Topic Modeling ...
##################################################
loading LdaModel object from ../output/farinam-test/lda.gensim.dynae/tml/3Topics.model
{'transport_params': None, 'compression': 'infer_from_extension', 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'rb', 'uri': '../output/farinam-test/lda.gensim.dynae/tml/3Topics.model'}
loading expElogbeta from ../output/farinam-test/lda.gensim.dynae/tml/3Topics.model.expElogbeta.npy with mmap=None
setting ignored attribute id2word to None
setting ignored attribute dispatcher to None
setting ignored attribute state to None
loaded ../output/farinam-test/lda.gensim.dynae/tml/3Topics.model
loading LdaState object from ../output/farinam-test/lda.gensim.dynae/tml/3Topics.model.state
{'transport_params': None, 'compression': 'infer_from_extension', 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'rb', 'uri': '../output/farinam-test/lda.gensim.dynae/tml/3Topics.model.state'}
loaded ../output/farinam-test/lda.gensim.dynae/tml/3Topics.model.state
{'transport_params': None, 'compression': 'infer_from_extension', 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'rb', 'uri': '../output/farinam-test/lda.gensim.dynae/tml/3Topics.model.id2word'}
2.1. Loading saved topic model of lda.gensim from ../output/farinam-test/lda.gensim.dynae/tml/3TopicsDictionary.mm and ../output/farinam-test/lda.gensim.dynae/tml/3Topics.model ...
loading Dictionary object from ../output/farinam-test/lda.gensim.dynae/tml/3TopicsDictionary.mm
{'transport_params': None, 'compression': 'infer_from_extension', 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'rb', 'uri': '../output/farinam-test/lda.gensim.dynae/tml/3TopicsDictionary.mm'}
loaded ../output/farinam-test/lda.gensim.dynae/tml/3TopicsDictionary.mm
Time Elapsed: 0.004091501235961914

3. UML: Temporal Graph Creation ...
##################################################
3.1. Loading users' graph stream from ../output/farinam-test/lda.gensim.dynae/uml/graphs/graphs.pkl ...
3.1. Loading users' graph stream failed! Generating the graph stream ...
60 users have twitted in 2010-12-01
UserSimilarity: UsersTopicInterests.npy is saved for day:2010-12-01 with shape: (60, 3)
UserSimilarity: A graph is being created for day: 2010-12-01 with 60 users
Traceback (most recent call last):
  File "C:/Users/Farinam/PycharmProjects/SEERa/src/main.py", line 92, in main
    graphs = pd.read_pickle(path)
  File "C:\Users\Farinam\anaconda3\envs\seera\lib\site-packages\pandas\io\pickle.py", line 169, in read_pickle
    f, fh = get_handle(fp_or_buf, "rb", compression=compression, is_text=False)
  File "C:\Users\Farinam\anaconda3\envs\seera\lib\site-packages\pandas\io\common.py", line 499, in get_handle
    f = open(path_or_buf, mode)
FileNotFoundError: [Errno 2] No such file or directory: '../output/farinam-test/lda.gensim.dynae/uml/graphs/graphs.pkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Farinam/PycharmProjects/SEERa/src/main.py", line 160, in run
    main()
  File "C:/Users/Farinam/PycharmProjects/SEERa/src/main.py", line 99, in main
    just_one=Params.tml['justOne'], binary=Params.tml['binary'], threshold=Params.tml['threshold'])
  File "C:\Users\Farinam\PycharmProjects\SEERa\src\uml\UserSimilarities.py", line 46, in main
    cmn.logger.info(f'UserSimilarity: Number of users per day: {len_users}')
NameError: name 'len_users' is not defined

Running pipeline for lda.gensim and dynaernn ....

1. DAL: Temporal Document Creation from Social Posts ...
##################################################
1.1. Loading saved temporal documents from  ../output/farinam-test/lda.gensim.dynaernn/Documents.csv in which 
(User, Time) a document is concat of user's posts in each 1 day(s)...
(#ProcessedDocuments, #Documents, #Users, #TimeIntervals): (180,180,60,3)
Time Elapsed: 0.009044170379638672

2. TML: Topic Modeling ...
##################################################
loading LdaModel object from ../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model
{'transport_params': None, 'compression': 'infer_from_extension', 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'rb', 'uri': '../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model'}
loading expElogbeta from ../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model.expElogbeta.npy with mmap=None
setting ignored attribute id2word to None
setting ignored attribute dispatcher to None
setting ignored attribute state to None
loaded ../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model
loading LdaState object from ../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model.state
{'transport_params': None, 'compression': 'infer_from_extension', 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'rb', 'uri': '../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model.state'}
loaded ../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model.state
{'transport_params': None, 'compression': 'infer_from_extension', 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'rb', 'uri': '../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model.id2word'}
2.1. Loading saved topic model of lda.gensim from ../output/farinam-test/lda.gensim.dynaernn/tml/3TopicsDictionary.mm and ../output/farinam-test/lda.gensim.dynaernn/tml/3Topics.model ...
loading Dictionary object from ../output/farinam-test/lda.gensim.dynaernn/tml/3TopicsDictionary.mm
{'transport_params': None, 'compression': 'infer_from_extension', 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'rb', 'uri': '../output/farinam-test/lda.gensim.dynaernn/tml/3TopicsDictionary.mm'}
loaded ../output/farinam-test/lda.gensim.dynaernn/tml/3TopicsDictionary.mm
Time Elapsed: 0.0031180381774902344

3. UML: Temporal Graph Creation ...
##################################################
3.1. Loading users' graph stream from ../output/farinam-test/lda.gensim.dynaernn/uml/graphs/graphs.pkl ...
3.1. Loading users' graph stream failed! Generating the graph stream ...
60 users have twitted in 2010-12-01
UserSimilarity: UsersTopicInterests.npy is saved for day:2010-12-01 with shape: (60, 3)
UserSimilarity: A graph is being created for day: 2010-12-01 with 60 users
Traceback (most recent call last):
  File "C:/Users/Farinam/PycharmProjects/SEERa/src/main.py", line 92, in main
    graphs = pd.read_pickle(path)
  File "C:\Users\Farinam\anaconda3\envs\seera\lib\site-packages\pandas\io\pickle.py", line 169, in read_pickle
    f, fh = get_handle(fp_or_buf, "rb", compression=compression, is_text=False)
  File "C:\Users\Farinam\anaconda3\envs\seera\lib\site-packages\pandas\io\common.py", line 499, in get_handle
    f = open(path_or_buf, mode)
FileNotFoundError: [Errno 2] No such file or directory: '../output/farinam-test/lda.gensim.dynaernn/uml/graphs/graphs.pkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Farinam/PycharmProjects/SEERa/src/main.py", line 160, in run
    main()
  File "C:/Users/Farinam/PycharmProjects/SEERa/src/main.py", line 99, in main
    just_one=Params.tml['justOne'], binary=Params.tml['binary'], threshold=Params.tml['threshold'])
  File "C:\Users\Farinam\PycharmProjects\SEERa\src\uml\UserSimilarities.py", line 46, in main
    cmn.logger.info(f'UserSimilarity: Number of users per day: {len_users}')
NameError: name 'len_users' is not defined
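
The two tracebacks in each run are chained: the FileNotFoundError is raised first, and the NameError is raised while it is being handled. A minimal sketch of that load-or-generate pattern follows, assuming the pipeline tries to read a cached pickle and regenerates it on a miss. The names `load_or_generate` and `broken_generate` are hypothetical and are not taken from the SEERa codebase; the standard `pickle` module stands in for `pd.read_pickle`. It suggests why graphs.pkl is never written: the generation step fails before the save is reached.

```python
import os
import pickle
import tempfile


def load_or_generate(path, generate):
    """Return the cached object at `path`, or build it with `generate()` and cache it."""
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except FileNotFoundError:
        # Expected on a first run: nothing has been cached yet.
        # Any exception raised by generate() propagates from here, chained to
        # the FileNotFoundError, and the cache file is never written.
        result = generate()
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as fh:
            pickle.dump(result, fh)
        return result


def broken_generate():
    # Mimics the reported bug: a name is read before it is assigned.
    return {"users": len_users}  # noqa: F821 (intentionally undefined)


# Hypothetical cache location inside a throwaway temporary directory.
cache_path = os.path.join(tempfile.mkdtemp(), "uml", "graphs", "graphs.pkl")

try:
    load_or_generate(cache_path, broken_generate)
except NameError as err:
    failed_with = type(err).__name__
    chained_from = type(err.__context__).__name__

file_written_after_failure = os.path.exists(cache_path)

# With a working generator the file is created and reused on the next call.
first = load_or_generate(cache_path, lambda: {"2010-12-01": 60})
second = load_or_generate(cache_path, lambda: {"unused": 0})
```

Under this reading, the missing graphs.pkl is a consequence of the NameError and not a separate problem, so fixing the undefined variable should resolve both.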
hosseinfani commented 2 years ago

@farinamhz could you debug and find the problem?

farinamhz commented 2 years ago

I defined the missing variable as len_users = len(unique_users). However, the next run failed with a similar error for a variable named limit, and I am not sure what value limit should have.

NameError: name 'limit' is not defined

@soroush-ziaeinejad could you please take a look at this one?
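
For reference, the len_users workaround described above can be sketched as follows. The function `log_user_count` and its input are hypothetical stand-ins, not code from UserSimilarities.py; only the log message and the assignment `len_users = len(unique_users)` come from this thread. The intended value of `limit` is not stated anywhere here, so it is left out.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("uml")


def log_user_count(user_ids):
    # Hypothetical stand-in for the step that builds the per-day user list.
    unique_users = sorted(set(user_ids))
    # The workaround: bind len_users before the f-string reads it.
    len_users = len(unique_users)
    logger.info(f"UserSimilarity: Number of users per day: {len_users}")
    return len_users


count = log_user_count(["u1", "u2", "u1", "u3"])
```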

soroush-ziaeinejad commented 2 years ago

@farinamhz It seems you installed SEERa properly! The error you reported comes from a minor bug in the codebase. I will fix it by the end of tonight and let you know. Thanks.

soroush-ziaeinejad commented 2 years ago

@farinamhz You should be able to run the code now. Please pull and try again. Let me know if you face any issues. Thanks.

farinamhz commented 2 years ago

Thank you @soroush-ziaeinejad. I ran it and it worked. Now I have a question about the success at 100 metric. First, I looked at its result on the toy synthetic data. When I changed the related setting, which came from the user dictionary keys and was 3, to 100, the result did not change: the value of success_3 was simply copied to every later cutoff up to 100 instead of growing to 1. I copied both results below:

1-

,success_1,success_2,success_3
lda.mallet.dynaernn,0.05084745762711865,0.1016949152542373,0.15254237288135594

2-

,success_1,success_2,success_3,success_4,success_5,success_6,success_7,success_8,success_9,success_10,success_11,success_12,success_13,success_14,success_15,success_16,success_17,success_18,success_19,success_20,success_21,success_22,success_23,success_24,success_25,success_26,success_27,success_28,success_29,success_30,success_31,success_32,success_33,success_34,success_35,success_36,success_37,success_38,success_39,success_40,success_41,success_42,success_43,success_44,success_45,success_46,success_47,success_48,success_49,success_50,success_51,success_52,success_53,success_54,success_55,success_56,success_57,success_58,success_59,success_60,success_61,success_62,success_63,success_64,success_65,success_66,success_67,success_68,success_69,success_70,success_71,success_72,success_73,success_74,success_75,success_76,success_77,success_78,success_79,success_80,success_81,success_82,success_83,success_84,success_85,success_86,success_87,success_88,success_89,success_90,success_91,success_92,success_93,success_94,success_95,success_96,success_97,success_98,success_99,success_100
lda.mallet.dynaernn,0.05084745762711865,0.1016949152542373,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0.15254237288135594,0
.15254237288135594

Second, if you have time, I would be grateful if we could meet to talk about the layers and metrics, because I do not fully understand them.
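
On the first question, one possible reading is that each user is ranked against only three candidates on the toy data, in which case raising the cutoff beyond 3 cannot add new hits. The sketch below uses a generic definition of success at k (the fraction of users with at least one relevant item in their top k); it is not SEERa's evaluation code, and the users, candidates, and relevance labels are made up for illustration.

```python
def success_at_k(ranked, relevant, k):
    """Fraction of users with at least one relevant item among their top-k ranked items."""
    hits = sum(
        1
        for user, items in ranked.items()
        if relevant.get(user, set()) & set(items[:k])
    )
    return hits / len(ranked)


# Hypothetical toy data: every user has exactly three ranked candidates.
ranked = {
    "u1": ["c2", "c1", "c3"],
    "u2": ["c1", "c3", "c2"],
    "u3": ["c3", "c2", "c1"],
    "u4": ["c1", "c2", "c3"],
}
# u3's relevant item is not among the candidates, so u3 can never be a hit.
relevant = {"u1": {"c2"}, "u2": {"c3"}, "u3": {"c9"}, "u4": {"c3"}}

cutoffs = [1, 2, 3, 10, 100]
scores = [success_at_k(ranked, relevant, k) for k in cutoffs]
print(dict(zip(cutoffs, scores)))
```

In this sketch the score stops changing once k reaches the length of the ranked lists, and it only reaches 1 if every user's relevant item is among the candidates. Whether that is what happens in SEERa would need to be checked against its evaluation code.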

soroush-ziaeinejad commented 2 years ago

Hi @farinamhz ,

Thanks for the report. I think we can talk about the first issue in our meeting as well. Let me know your availability and we can schedule it, maybe tomorrow or Monday in the lab if that works for you.

farinamhz commented 2 years ago

Sorry for the late reply, @soroush-ziaeinejad. I will be in the lab on Monday from noon onwards.