HydPy / meetup-manager

Django app for automating Meetup Organising taks
MIT License
7 stars 4 forks source link

Database model #4

Closed shubham1172 closed 4 years ago

shubham1172 commented 4 years ago

Please go to the branch and see the markdown file - model.md. It shows how we can model our events and social media posts.

ananyo2012 commented 4 years ago

@shubham1172 Thanks for coming up with the proposal and apologies for the delayed reply. The database diagram looks nice and simple. The modified_on field was something that was missing in the last database diagram.

One point to note regarding the platform_metadata table is that there can be multiple posts for an event in each platform so the name of the table should actually be post_metadata . Also we need to differentiate between a post and a comment, so we need to introduce a type field. An example would be listing a meetup event post (which can be updated), so that if there is one post and one comment for the meetup event we can clearly differentiate what type is it.

On thinking a bit more, we really don't need the platform_metadata table if we are using the json data approach since when we will display it in the template we can write the deserializer in such a way that it shows a field only if that key exists. We can just move the platform_id field to the posts table.

shubham1172 commented 4 years ago

There can be multiple posts for each platform but the structure for each platform remains the same. platform_metadata describes how the data for each platform is structured. post_metadata would hint that the metadata is about each post, which is not really what we are doing. I guess the markdown isn't very clear in itself.

Each entry in platform_metadata describes how a post for each platform is structured. Based on this, the values are serialized/deserialized and stored in the posts table. Each post belongs to a platform (many-to-one) and is indicated by the FK platform_id.

To further differentiate a type, we know that all the platforms don't support comments. So we can have parent pointers in the supported types (say Meetup, Facebook, etc.) to show that they're comments. The parent pointer would be null for posts.

We can also have the fields created_by and modified_by in our tables for auditing reasons.

A metadata table is needed to define a structure to our data. Let's say a case, we want to create a new tweet. How do we know what fields are we expecting? How do we sanitize if a user gives an extra key in the JSON or misses a required key? Let's not rely on the user to provide sanitized data all the time, instead let's create a concrete structure for each data type, which is platform dependent.

ananyo2012 commented 4 years ago

@shubham1172 Thanks for the explanation, that makes it clear now. So the platform_metadata table acts like the validator for data stored in each post of a particular platform. It would be good to document this in the proposal.

Yes we should add created_by and modified_by in posts and event table.

For the parent child relationship between post and comment do you propose a new table for comments ?

shubham1172 commented 4 years ago

I think we can do with a single table, since rest of the content type will be almost the same. The parent_id will differentiate between a comment and a post.

ananyo2012 commented 4 years ago

I think we can do with a single table, since rest of the content type will be almost the same. The parent_id will differentiate between a comment and a post.

Sounds good. Can you update the database diagrams with the created_by, modified_by and parent_id fields ?

ananyo2012 commented 4 years ago

@shubham1172 Shouldn't parent_id be in posts table rather than in data ? I think it will be easier to filter posts directly from database that way instead of iterating over all posts.

Also please update the Entity Relationship diagram image with the updates.

shubham1172 commented 4 years ago

Since parent_id a platform-dependent attribute, it should ideally be a part of platform-data. For filtering, it'll be definitely efficient if the parent_id is in the posts table, but then we'd have to maintain NULL values even for the platforms which do not support it. What say?

I know it'll be a little costly to query platform data, but Postgres supports native querying/filtering of JSON data. You can see the documentation here https://www.postgresql.org/docs/9.5/datatype-json.html

EDIT: One more advantage of having parent_id in the post table is that it'll support a FK. It's not possible in the jsonb type AFAIK.

I'll update the Entity-Relationship diagram once we figure this out.

ananyo2012 commented 4 years ago

This sounds like an experiment we need to do. For querying fields in the app we will be using the Django ORM so it has to support multiple DBs i.e. sqlite for dev and Postgres for prod. I read the Django documentation and found it has support for Postgres JSONfield. For Sqlite we may have to install django-mysql module which supports JSONField for mysql database.

ananyo2012 commented 4 years ago

@shubham1172 Let's go with the present idea you proposed. Go ahead and Update the database image.

ananyo2012 commented 4 years ago

Nice this looks good. This is a well designed doc @shubham1172 . I have added reviewers to have another pair of eyes have a look.

inovizz commented 4 years ago

Thanks @shubham1172 :)