SMUG is a profiler that groups social media users using big data. SMUG provides extra insight into tweets by running various kinds of analytics on them. For example, it can estimate the probability that a given tweet is about someone being sick. This is accomplished by analysing the words in the tweet using deep learning.
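As an illustration of the idea only — the real analysis uses a deep learning model, and the keyword list and scoring below are invented for this sketch — a naive probability estimate might look like:

```python
# Toy illustration only: SMUG's real analysis uses a deep learning model.
# The keyword list and scaling factor here are invented for this sketch.
SICKNESS_WORDS = {"sick", "ill", "flu", "fever", "cough", "headache"}

def sickness_probability(tweet: str) -> float:
    """Crude estimate: fraction of words hinting at sickness, capped at 1.0."""
    words = tweet.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in SICKNESS_WORDS)
    return min(1.0, hits / len(words) * 5)  # arbitrary scaling for the toy

print(sickness_probability("I feel so sick, this flu is awful"))
```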
In order to use the project the following dependencies have to be satisfied:

```shell
pip install -r requirements.txt
python -m textblob.download_corpora
```

The second command enables the NLP functionality.
These dependencies include some third-party licenses. This project uses docker-compose, so to run the project you need to issue:

```shell
docker-compose up
```

This will pull all the required images and run all the containers.
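For orientation, a docker-compose setup for a project like this typically declares a message broker and the worker services. The service names and images below are illustrative placeholders only, not the project's actual configuration — consult the repository's own `docker-compose.yml`:

```yaml
# Illustrative placeholders only -- see the project's own docker-compose.yml.
version: "3"
services:
  broker:
    image: rabbitmq:3-management   # assumed broker; the real stack may differ
    ports:
      - "5672:5672"
  worker:
    build: .
    env_file: .env
    depends_on:
      - broker
```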
There are several things needed to run the project correctly:

- A `.env` file. See `resources/.env.example` for the needed parameters.
- Optional: a `word2vec` model for use with WordVectoring analysis. `utils/word_vectoring_model_generator.py` can generate this for you; place a Wikipedia dump in the `resources` folder and alter the script to use the correct name.

While developing you can opt to run the Python files locally on your system by running the individual Python files.
A `run.py` script is also available which allows you to start multiple workers at once. `run.py` starts all the workers necessary for basic processing.
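The idea behind such a launcher can be sketched as follows. The worker function and its "analysis" are stand-ins invented for this sketch; the real `run.py` starts the project's actual worker modules:

```python
import queue
import threading

# Invented stand-in for the project's real worker modules.
def analysis_worker(jobs: "queue.Queue", results: list) -> None:
    while True:
        tweet = jobs.get()
        if tweet is None:              # sentinel: shut this worker down
            break
        results.append(tweet.upper())  # placeholder "analysis"

def start_workers(n: int, jobs: "queue.Queue", results: list) -> list:
    """Start n workers at once, mirroring what run.py does for the project."""
    threads = [threading.Thread(target=analysis_worker, args=(jobs, results))
               for _ in range(n)]
    for t in threads:
        t.start()
    return threads

jobs: "queue.Queue" = queue.Queue()
results: list = []
threads = start_workers(3, jobs, results)
for tweet in ["feeling sick", "great day"]:
    jobs.put(tweet)
for _ in threads:
    jobs.put(None)                     # one sentinel per worker
for t in threads:
    t.join()
print(sorted(results))
```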
Now all you need is data. This can be imported by running `importers/coosto_importer.py`, which will prompt you for a CSV file and put its contents into SMUG. The CSV file should be located somewhere in the `resources` folder.
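The mechanics of such an import can be sketched as below. The column names and the sample data are assumptions for illustration, since the actual Coosto export format is not described here:

```python
import csv
import io

# Hypothetical Coosto-style export; the real column names may differ.
SAMPLE = """author,message
alice,I feel sick today
bob,Lovely weather
"""

def import_rows(csv_text: str) -> list:
    """Parse a CSV export into dicts, ready to be handed on for processing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

rows = import_rows(SAMPLE)
print(rows[0]["message"])
```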
In order to ensure the Docker environment contains the correct settings and queues, it is important that the `initializer.py` file is run every time the Docker environment is restarted. After running the initializer the order of execution is not important. When using `run.py` this is done automatically for you.
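The initializer's role can be illustrated with a minimal sketch. The queue names and the dict standing in for the message broker are invented here; the real initializer talks to the project's actual broker:

```python
# Minimal sketch of an idempotent initializer. The queue names and the
# dict standing in for the message broker are invented for illustration.
REQUIRED_QUEUES = ["tweets_in", "analysis_results"]

def initialize(broker: dict) -> dict:
    """Declare every required queue, skipping ones that already exist,
    so the initializer is safe to run after every restart."""
    for name in REQUIRED_QUEUES:
        broker.setdefault(name, [])
    return broker

broker: dict = {}
initialize(broker)   # first run after a restart
initialize(broker)   # running it again is harmless
print(sorted(broker))
```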