globaleaks / whistleblowing-software

GlobaLeaks is free, open-source whistleblowing software enabling anyone to easily set up and maintain a secure reporting platform.
https://www.globaleaks.org
Other
1.23k stars 269 forks source link

Some suggestions on database schema #2617

Open saiprashanth173 opened 5 years ago

saiprashanth173 commented 5 years ago

Hi Team,

We are database researchers at the Georgia Institute of Technology. We are developing a tool which automatically detects and provides fixes for issues present in the application database schema, which affect the performance, maintainability, and extensibility of the application. In short, the tool helps the developers to come up with good database schemas for the application.

When we ran the tool through GlobaLeaks, it flagged a few issues. We wanted to validate our results with you and know if they would actually have a positive impact on the application.

One of the issues detected was missing foreign key relationships in some of the tables, you can find the details in the table below.

Model Field Rel Table Column
Comment author_id User id
Context additional_questionnaire_id Questionnaire id
IdentityAccessRequest reply_user_id User id
InternalTip context_id Context id
InternalTip additiona_questionnaire_id Questionnaire id

Some advantages of creating a foreign key relationship:

  1. Referential Integrity - By creating a foreign key the DBMS keeps track of the inserts, updates, and deletes on both the parent and child table, therefore, making sure that the data in the database is always consistent and free from dangling rows.
  2. It enables us to utilize features provided by DBMS like cascade deletes. This reduces the effort of writing custom scripts for the sanity check.
  3. Helps developers to determine how the database is designed and get a better understanding of relationships.
  4. Improved Performance - foreign key constraints are used as hits by the database query optimizers, therefore helps in improving the efficiency of JOIN queries

Although it is believed that foreign key constraints have an impact on the insert performance, from our experiments we realize that this is pretty insignificant opposed to the advantages it offers.

We are happy to share other issues detected by the tool and suggested improvements if you are interested. Your input will be very valuable to us to improve the tool we are developing. Looking forward to hearing from you.

Thanks for your time.

evilaliv3 commented 5 years ago

Thank you for reporting this @saiprashanth173

Actually we make use of foreign keys on almost every table except in some places like the one spot by your software. The reason we are currently not having foreign keys on the attributes you are rporting is that some of those tables require to persist even if the referenced object is deleted. The database will be soon changed splitting the referenced tables in a persistent table (including eventually only the key) and child table includiing only the data to be deleted. Please advice if you have other recomendations!

We would like to inform anyhow that the data reported by your software is correct and valuable.

thank you!

Giovanni

arpitnarechania commented 5 years ago

Hi @evilaliv3 , Sorry for the late response. I am a fellow researcher from the same group as @saiprashanth173 . Thank you for taking time and responding back. Your input has been really helpful.

Another issue we detected is use of CHECK IN constraint while creating the table. We found it in the below tables.

Model Column/s
Field instance
FieldAttr type
User Role, State, Recipient
Message Type

We believe a better design is to create an additional table and creating a foreign key relationship. For instance, in the case of “User” table, “role” column, it’s advisable to create a new table, “Role” which has all the roles and have a foreign key in the “User” table.

Some advantages of having a foreign key relation in place of CHECK constraint check.

  1. Improved performance in case of adding or updating values in the constraint. When we tested the performance difference in the “User” table of GlobaLeaks, adding a new role to the CHECK constraint . took 3x more times than compared to having a Role table with foreign key, when tested on 100,000 rows. Whereas updating the name of the role took 1000x time more time.
  2. Better storage utilization. Using integer foreign key utilizes significantly lesser storage than compared to VARCHAR, therefore leading to efficient data storage.

One downside of having a foreign key is the overhead of performing a Join during retrieval. However, this overhead is pretty insignificant as the tables storing these values are small in size and can easily fit in the memory and the modern DBMS can optimize the join operations.

Please let us know your views on this as well.