fani-lab / SEERa

A framework to predict the future user communities in a text streaming social network based on the users’ topics of interest.

Clean code #14

Closed: soroush-ziaeinejad closed this 2 years ago

soroush-ziaeinejad commented 2 years ago

- Added CSV data.
- Code structure now makes more sense.
- Modules are plug-and-play.
- All adjustable parameters are now in params.py, grouped by their related layer.
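As an illustration of grouping parameters by layer, here is a minimal sketch of what such a params.py layout could look like. Only the layer names uml and gel appear in this thread. Every key, value, and the run_gel helper are hypothetical and are not SEERa's actual configuration.

```python
# params.py (hypothetical layout): one dict per pipeline layer.
general = {"seed": 0}                        # settings shared by all layers (assumed)
uml = {"user_similarity_threshold": 0.2}     # user modeling layer (assumed key)
gel = {"embedding_dim": 32}                  # graph embedding layer (assumed key)


# Hypothetical layer entry point: it receives only its own section, so the
# layer can be swapped out without touching the other layers' settings.
def run_gel(cfg):
    return f"embedding_dim={cfg['embedding_dim']}"


print(run_gel(gel))  # embedding_dim=32
```

Passing each layer only its own dict keeps the modules decoupled, which is one way to get the plug-and-play behaviour described above.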

hosseinfani commented 2 years ago

@soroush-ziaeinejad While I was refactoring the code, I noticed the following issues/concerns. You may have fixed some of them already. Please have a look and let me know what you think.

DataPreperation.py, line 42 (#Hossein): what if no concept is identified for a post? => filter out empty strings.
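A minimal sketch of the suggested filter, assuming posts are held as (post_id, text) pairs after concept extraction. That shape and the drop_empty_posts name are hypothetical stand-ins for whatever DataPreperation.py actually produces.

```python
def drop_empty_posts(posts):
    """Keep only posts whose text is non-empty after cleaning.

    `posts` is assumed to be a list of (post_id, text) pairs; a post with no
    identified concept ends up as "", whitespace, or None and is dropped.
    """
    return [(pid, text) for pid, text in posts if text and text.strip()]


posts = [(1, "obama election"), (2, ""), (3, "   "), (4, "climate")]
print(drop_empty_posts(posts))  # [(1, 'obama election'), (4, 'climate')]
```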

Check the other configs: with tagme or time-only there is no userid => the pipeline breaks at uml, gel, ...; the same happens whenever there is no user or no time.
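One way to surface this early is a fail-fast check that runs before any layer starts. A minimal sketch, assuming each config can report its column names. The REQUIRED_COLUMNS mapping and the check_config helper are hypothetical and not part of SEERa.

```python
# Hypothetical mapping from each layer to the columns it cannot run without.
REQUIRED_COLUMNS = {
    "uml": {"userid", "time"},
    "gel": {"userid", "time"},
}


def check_config(columns, layers):
    """Raise before the pipeline starts if a layer's required columns are missing."""
    missing = {}
    for layer in layers:
        lacking = REQUIRED_COLUMNS.get(layer, set()) - set(columns)
        if lacking:
            missing[layer] = sorted(lacking)
    if missing:
        raise ValueError(f"config cannot run these layers: {missing}")


check_config(["userid", "time", "text"], ["uml", "gel"])  # passes silently
try:
    check_config(["time", "text"], ["uml"])  # e.g. a time-only config
except ValueError as e:
    print(e)  # config cannot run these layers: {'uml': ['userid']}
```

Failing at startup with the list of missing columns is usually easier to debug than a crash deep inside uml or gel.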

usersimilarity.py:
- line 35 => is it inference or lookup?
- line 35 => what if we have the document = post config?
- line 39 => dense zero matrix!
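To avoid allocating a dense n x n zero matrix, one option is to store only the nonzero user-user similarities. A minimal sketch in plain Python, assuming each user's topic distribution is a dense list of equal length. Cosine similarity, sparse_user_similarity, and the threshold parameter are illustrative choices, not what usersimilarity.py does.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sparse_user_similarity(user_topics, threshold=0.0):
    """Return {(i, j): sim} for pairs i < j with sim > threshold.

    Only similar pairs are stored, so memory grows with the number of
    edges rather than with n * n.
    """
    users = sorted(user_topics)
    sims = {}
    for a in range(len(users)):
        for b in range(a + 1, len(users)):
            s = cosine(user_topics[users[a]], user_topics[users[b]])
            if s > threshold:
                sims[(users[a], users[b])] = s
    return sims


topics = {"u1": [1, 0, 0], "u2": [1, 0, 0], "u3": [0, 1, 0]}
print(sparse_user_similarity(topics))  # {('u1', 'u2'): 1.0}
```

This still compares every pair (O(n²) time) but keeps memory proportional to the number of similar pairs. If downstream code needs matrix operations, a scipy.sparse matrix is a common alternative to a dict.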

soroush-ziaeinejad commented 2 years ago

@hosseinfani Thanks for your comments.

1- We do not add a post with an empty string after preprocessing, filtering, and removing stopwords.
2- The tagme, time-only, and user-only configs have already been fixed and tested.
3- Due to the massive changes in usersimilarity.py, I cannot tell which lines you mean. We can discuss it in person.

Thanks.