edgi-govdata-archiving / web-monitoring

Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner")
Creative Commons Attribution Share Alike 4.0 International

Designing Tentative Database Schema #21

Closed (ChaiBapchya closed 7 years ago)

ChaiBapchya commented 7 years ago

Having gone through the Architecture, I realized that "Database Schema are unknown" is a glaring hole that needs to be sorted out. I would like to work on creating the schema.

Decisions to be taken:

- What type of model to choose, e.g. an entity-relationship model
- What kind of DB to use: relational, NoSQL, or Hadoop / Spark (big data)

Basic template (that quickly comes to mind):

- Page name
- Website (to which the page belongs)
- Page id (unique identifier)
- Previous state
- Current state
- List of previous states
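
To make the template above concrete, here is a minimal sketch in Python. It is only an illustration of the listed fields, not the project's actual schema: the class name `Page`, the `record` helper, and the choice to store states as strings are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import uuid


@dataclass
class Page:
    """One monitored page; fields mirror the template above (hypothetical names)."""
    name: str
    website: str  # site to which the page belongs
    # Unique identifier; a random UUID hex string is an assumption of this sketch.
    page_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Latest captured state, e.g. page content or a hash of it (assumed to be a string).
    current_state: Optional[str] = None
    # Earlier states, oldest first.
    previous_states: List[str] = field(default_factory=list)

    @property
    def previous_state(self) -> Optional[str]:
        """Most recent state before the current one, if any."""
        return self.previous_states[-1] if self.previous_states else None

    def record(self, new_state: str) -> None:
        """Store a newly captured state, moving the current one into the history."""
        if self.current_state is not None:
            self.previous_states.append(self.current_state)
        self.current_state = new_state
```

In this sketch "previous state" is derived from the history list rather than stored separately, so the two cannot drift apart. A relational design would more likely put states in their own table keyed by page id.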

dcwalk commented 7 years ago

@ChaiBapchya thanks for adding your thoughts! We do have a schema; unfortunately, it isn't super well documented yet.

We have a Pull Request open from the arch-overview branch where the schema is documented: https://github.com/edgi-govdata-archiving/web-monitoring/blob/arch-overview/README.md#schema

I'm going to work on getting this merged in ASAP so we can address your questions!

ChaiBapchya commented 7 years ago

Alright sure. Thanks

dcwalk commented 7 years ago

The documentation is now live! I'm going to close this issue as we have our preliminary schema. @ChaiBapchya, we could work on ideas you have based on the schema in the #gsoc chat; I'm available there throughout the weekend (though with a time difference 🕥 :) )

ChaiBapchya commented 7 years ago

Alright...thanks