dsc-umass / social-insights

A search engine to query social media insights with political theme
GNU General Public License v3.0

Database for Project #43

Open abhinavtripathy opened 4 years ago

abhinavtripathy commented 4 years ago

We need a database solution for the search engine, something that is scalable and fast. Some notes

abhinavtripathy commented 4 years ago

@kevinmsmith131 There are many tasks for you to get started with.

kevinmsmith131 commented 4 years ago

Requirements Specification:

There will be a Tweets database that stores a tweet, the tweet id, the location the tweet was made from, the user that posted the tweet, the user's gender, the user's age, and the user's ethnicity. This database will be a SQL relational database where each row is a tweet with all the information for that tweet in the same row.
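A minimal sketch of the one-row-per-tweet layout described above, using Python's built-in sqlite3 module as a stand-in for whichever SQL engine the project ends up choosing. The column names and types, and the helper names create_tweets_db and insert_tweet, are assumptions for illustration, not a settled schema.

```python
import sqlite3

# Assumed column names and types; the real schema may differ.
TWEETS_SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id       INTEGER PRIMARY KEY,
    tweet          TEXT NOT NULL,
    tweet_location TEXT,
    username       TEXT NOT NULL,
    user_gender    TEXT,
    user_age       INTEGER,
    user_ethnicity TEXT
)
"""


def create_tweets_db(path=":memory:"):
    """Open a database and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(TWEETS_SCHEMA)
    return conn


def insert_tweet(conn, row):
    """Insert one tweet; `row` is a dict keyed by column name."""
    conn.execute(
        "INSERT INTO tweets VALUES (:tweet_id, :tweet, :tweet_location, "
        ":username, :user_gender, :user_age, :user_ethnicity)",
        row,
    )
    conn.commit()
```

Keeping every attribute in the same row means a single lookup by tweet_id returns everything known about that tweet, at the cost of repeating user details across tweets by the same user.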

The database that stores search queries will be a NoSQL Graph Database. A graph database will allow us to connect queries to each other if they share a common theme, and if the number of connections is high for a query, then it will be labeled as a trending query.
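The trending idea above can be sketched in plain Python as an undirected graph held in memory. This is only an illustration of the logic, not a graph database: the class name QueryGraph and the cutoff TRENDING_THRESHOLD are hypothetical, and how two queries are judged to "share a common theme" is left to the caller.

```python
from collections import defaultdict

# Hypothetical cutoff: a query with at least this many connections is trending.
TRENDING_THRESHOLD = 3


class QueryGraph:
    """Undirected graph whose nodes are search queries."""

    def __init__(self):
        # Maps each query to the set of queries it is connected to.
        self.edges = defaultdict(set)

    def connect(self, a, b):
        """Link two queries that share a common theme."""
        if a != b:
            self.edges[a].add(b)
            self.edges[b].add(a)

    def connection_count(self, query):
        """Number of distinct queries linked to `query`."""
        return len(self.edges.get(query, ()))

    def is_trending(self, query):
        """A query is trending once its connection count reaches the cutoff."""
        return self.connection_count(query) >= TRENDING_THRESHOLD
```

For example, after connecting "election" to three other queries, is_trending("election") returns True while each of the three neighbours, with one connection apiece, is not trending.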

The database that stores analytics will be a SQL relational database in which each row is a search query, and that row holds all the analytics collected for that query. The analytics to be stored are whether the query is trending, whether the query returned any results, whether the results that the query returned are useful (which would require asking the user for feedback), and whether the information found comes from a credible source (which may also require user feedback).
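One way the per-query analytics row could look, again sketched with Python's sqlite3 module. The column names follow the data set listed later in this thread; the types, the use of NULL for feedback that has not been given yet, and the helper names log_query and record_feedback are assumptions.

```python
import sqlite3

# Flags are stored as 0/1 integers; feedback columns stay NULL until answered.
ANALYTICS_SCHEMA = """
CREATE TABLE IF NOT EXISTS analytics (
    query_id             INTEGER PRIMARY KEY,
    search_query         TEXT NOT NULL,
    trending_status      INTEGER NOT NULL DEFAULT 0,
    returned_results     INTEGER NOT NULL DEFAULT 0,
    useful_results       INTEGER,
    credible_information INTEGER
)
"""


def log_query(conn, query_id, search_query, returned_results):
    """Create the analytics row when a query is first run."""
    conn.execute(
        "INSERT INTO analytics (query_id, search_query, returned_results) "
        "VALUES (?, ?, ?)",
        (query_id, search_query, int(returned_results)),
    )
    conn.commit()


def record_feedback(conn, query_id, useful, credible):
    """Fill in the two columns that depend on user feedback."""
    conn.execute(
        "UPDATE analytics SET useful_results = ?, credible_information = ? "
        "WHERE query_id = ?",
        (int(useful), int(credible), query_id),
    )
    conn.commit()
```

Leaving the feedback columns nullable keeps "the user said no" distinct from "the user was never asked".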

The caching database will use Redis with the cache-aside caching strategy. Redis is an in-memory key-value store rather than a relational database, so frequent tweets, frequent search queries, and frequent analytics will be stored as keyed entries.
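With cache-aside, the application checks the cache first, falls back to the main database on a miss, and then writes the result into the cache for next time. A minimal sketch of that flow follows; a plain dict stands in for Redis so the example runs without a server, and the class name CacheAside and the load_from_db callback are hypothetical.

```python
class CacheAside:
    """Cache-aside reads: try the cache, fall back to the database on a miss."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # dict-like stand-in for Redis
        self.load_from_db = load_from_db  # callable: key -> value or None

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                  # cache hit
        value = self.load_from_db(key)    # cache miss: read the source of truth
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value

    def invalidate(self, key):
        """Drop a cached entry, e.g. after the underlying row changes."""
        self.cache.pop(key, None)
```

In a real deployment the dict would be replaced by a Redis client, and an expiry time on each entry would keep stale data from living in the cache indefinitely.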

kevinmsmith131 commented 4 years ago

Data Set:

TWEETS (tweet_id, tweet, tweet time, tweet date, tweet_location, username, user_gender, user_age, user_ethnicity)
SEARCH_QUERIES (query_id, search_query, query time, query date)
ANALYTICS (query_id, search_query, query time, query date, trending_status, returned_results, useful_results, credible_information)
CACHE (frequent_tweet, frequent_query, frequent_analytic)

kevinmsmith131 commented 4 years ago

Conceptual Data Model: [image: Screenshot from 2020-07-23 16-33-08]

AdiNar1106 commented 4 years ago

Great job getting the database architecture together. It looks great and pretty robust! Can you resend that Drive link? It doesn't seem to be working.

kevinmsmith131 commented 4 years ago

Yeah, the link wasn't working for me either, so I ended up just screenshotting it and pasting it here.

kevinmsmith131 commented 4 years ago

Logical Data Model: [image: Social Insights Logical Data Model]

abhinavtripathy commented 4 years ago

Amazing work @kevinmsmith131

kevinmsmith131 commented 4 years ago

Thanks @abhinavtripathy!

abhinavtripathy commented 4 years ago

@kevinmsmith131 could you go ahead and create a branch on the repo and start putting all this code into files? That way we can do code reviews more easily.