Closed ChaiBapchya closed 7 years ago
@ChaiBapchya thanks for adding your thoughts! We do have a schema; unfortunately, it isn't well documented yet.
We have a pull request open from the arch-overview branch, where the schema is documented:
https://github.com/edgi-govdata-archiving/web-monitoring/blob/arch-overview/README.md#schema
I'm going to work on getting this merged in as soon as possible so we can address your questions!
Alright sure. Thanks
The documentation is now live! I'm going to close this issue, as we have our preliminary schema. @ChaiBapchya, we could work on ideas you have based on the schema in the #gsoc chat. I'm available there throughout the weekend (though with a time difference 🕥 :) )
Alright...thanks
Having gone through the Architecture, I realized that "Database Schema are unknown" is a glaring hole that needs to be sorted out. I would like to work on creating the schema.

Decisions to be taken:
- What type of model to choose, e.g. an entity-relationship model
- What kind of database to use: relational, NoSQL, or Hadoop / Spark (big data)
Basic template (that quickly comes to mind):
- Page name
- Website (to which the page belongs)
- Page id (unique identifier)
- Previous state
- Current state
- List of previous states
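To make the template above a bit more concrete, here is a minimal sketch of it as a Python data class. This is only an illustration of the proposed fields, not the project's actual schema (which is documented in the linked README). The class name `Page`, the field names, the `record_state` helper, and the use of plain strings for states are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical record mirroring the basic template fields."""
    page_id: str                      # unique identifier
    name: str                         # page name
    website: str                      # website to which the page belongs
    current_state: Optional[str] = None
    previous_state: Optional[str] = None
    # All states older than the current one, oldest first.
    state_history: List[str] = field(default_factory=list)

    def record_state(self, new_state: str) -> None:
        """Make new_state current and push the old current state into history."""
        if self.current_state is not None:
            self.state_history.append(self.current_state)
        self.previous_state = self.current_state
        self.current_state = new_state


# Example usage with made-up values.
page = Page(page_id="p1", name="Home", website="example.gov")
page.record_state("v1")
page.record_state("v2")
page.record_state("v3")
```

Note that "previous state" is always the last entry in the list of previous states, so in a relational design it could be derived from the history rather than stored separately.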