MetricsGrimoire / MailingListStats

Mailing List Stats is a command line based tool used to analyze mboxes
http://metricsgrimoire.github.com/MailingListStats/
GNU General Public License v2.0
38 stars 25 forks source link

Replace mailing_list_url with an integer for database relations #48

Open andygrunwald opened 10 years ago

andygrunwald commented 10 years ago

Hey,

the database scheme can be (easily) optimised. The field mailing_lists.mailing_list_url is a VARCHAR(255) and is used many times as a unique identifier for other tables (e.g. compressed_files, messages, messages_people, etc.). In every relation table the complete string is inserted. This is not good for two reasons: a) Storage: We need a lot more storage to store a (long) string such mailing_list_url b) Joins: Joins are not so performant as integer

How to improve this? We add a new field called mailing_list_id into table mailing_lists and define this as primary key. In every relation table we only insert the integer mailing_list_id. With this we can avoid the two disadvantages above.

What do you think about this suggestion?

gpoo commented 10 years ago

Have you run some profiling to check how different is the performance? If so, may you share the results?