BlaiseItUp / kahuna

0 stars 0 forks source link

Data Modeling 1 #4

Open BlaiseItUp opened 3 years ago

BlaiseItUp commented 3 years ago

Data Modeling

Cheatsheet:

First Normal Form (1NF)

A table is supposed to be in first normal form if:

Table 1 - Shirt Info (Primary Key: Shirt_ID)
Shirt_ID Design Size
100 Mickey Mouse XXL,XL,M
101 Flowers M
102 Good Vibes M,S
103 Triangles XL

First Normal Form Tables

Table 1a - Design Info (Primary Key: Shirt_ID)
Shirt_ID Design
100 Mickey Mouse
101 Flowers
102 Good Vibes
103 Triangles
Table 1b - Size Info (Primary Key: Shirt_ID, Size)
Shirt_ID Size
100 XXL
100 XL
100 M
101 M
102 M
102 S
103 XL

Second Normal Form (2NF)

A table is supposed to be in second normal form if:

Table 2 - Customer Info (Primary Key: Cust_ID, Store_ID)
Cust_ID Store_ID Customer_Name Store_Name
568 74896 Liam JPM_1
574 74896 Charles JPM_1
568 25614 Liam JPM_2
235 25614 Cyrus JPM_2

Second Normal Form Tables

Table 2a - Customer Data (Primary Key: Cust_ID)
Cust_ID Customer_Name
568 Liam
574 Charles
145 Marcus
235 Cyrus
Table 2b - Store Data (Primary Key: Store_ID)
Store_ID Store_Name
74896 JPM_1
25614 JPM_2

Third Normal Form (3NF)

A table is supposed to be in third normal form if:

Table 3 - Movie Info (Primary Key: MID)
MID GENRE_ID GENRE Store_Name
1 G1 Comedy 4.2
2 G2 Romance 4.5
3 G3 Action 2.1
4 G4 Horror 3.5

Second Normal Form Tables

Table 3a - Movie Rating (Primary Key: MID)
MID RATING GENRE_ID
1 4.2 G1
2 4.5 G2
3 2.1 G3
4 3.5 G4
Table 3b - Genre Info (Primary Key: Store_ID)
GENRE_ID GENRE
G1 Comedy
G2 Romance
G3 Action
G4 Horror

Exercise

Duplication of data within a relational database should raise all sorts of red flags unless it was done purposefully and logically. For example, if the database for an application that sells merchandise saves the retail price in multiple tables, it is extremely easy for them to get out of synch when one location changes, but the others remain unaltered. Not only does this introduce instability, it wastes storage space as well. Additionally, the application code will become less maintainable as it must now know about multiple places to pull price information. Storing data in multiple places also increases the chance that we will miss something and not apply proper protection. This situation can be resolved by using normalization, a technique employed to reduce redundancy and inconsistency. When we organize data in this way, we follow certain rules, with each rule referred to as a normal form. There are three primary rules, or normalization rules that we can follow. If we follow the first rule, the database is said to be in the first normal form, abbreviated as 1NF. If both the first and second rules are followed simultaneously, it is said to be in the second form of normalization, or 2NF, 3NF follows the same pattern. There are 4NF and 5NF,

Table 1 - CustomersAndSalesRepsAndSales
CustomerId FirstName LastName SalesRepId Product1 Product2
1 Grant Hereford 105 Widget_003 Widget_129
2 Sal Miningent 119 Widget_129 Widget_48

Here, you can see that each record in the CustomersAndSalesRepsAndSales table contains a single unique customer, with each product purchased represented by a single column. As a customer purchases a new product, we must put the product ID into a new column. Since we only have two product columns, this means that a customer will be allowed to purchase at most two products before we have to modify the customer table. Obviously, this is not very scalable, and we will be missing tons of opportunities to sell more of our products to repeat customers. The rule for First Normal Form, or 1NF, states that there are no repeatable fields or groups of fields within a table and is sometimes called the ‘No Repeating Groups’ rule. To have a 1NF database in this example, we will need to create a new record for each purchase, instead of having to create a new column for each purchase. New columns require the table structure to be modified, which is a HUGE deal, while adding new rows is just part of everyday business and easy.

Second Normal Form, or 2NF, mandates that repeated data must be removed. In this case, we need to address the fact that the name of a customer appears in multiple rows. If we need to change the customer’s name, we would need to locate all rows in this table and update them. This is unnecessarily complex, so we need to implement the 2NF rule called ‘Eliminate Redundant Data’. The way to do this is to create a new table representing the relationship between the duplicated data and whatever is unique about each row in the original table. In our case, we need to extract all information that is the same for all purchases made by the same customer. This means the customer name and sales rep ID is moved to a new table. We are assuming that each customer has only one sales rep.

Third normal form, or 3NF, requires that any non-key value in a table that is not dependent on the primary key should be moved to its own table. In our example, the CustomersAndSalesReps table lists the sales rep ID for each customer, but the value in this column will be duplicated if a sales rep can handle more than one customer. This is true in our pretend company, so the ‘Eliminate Non-Key-Dependent Duplicate Data’ rule, or 3NF, requires us to create yet another new table representing sales reps.

ssavely commented 3 years ago

My 2NF and 3NF led me to have a TCustID table which honeslty didn't seem very effective because it only contained custID, which was then used as a foreign key in the other tables. I feel like I should've keep CustID, FName, and LName in the same table DataModeling HW.xlsx