Claire3404 / school-database

Database for a primary school
0 stars 0 forks source link

Query / Data Modelling Advice #2

Open cdrowley opened 1 year ago

cdrowley commented 1 year ago

Hello, I'll go through one query at a time in this thread, perhaps asynchronously.

Feel free to shoot back with any questions.

cdrowley commented 1 year ago

Query 1)

I think one issue here may be the attempt to answer these two questions with only one query?

-- Which employees earn more than the average salary and why? 
-- Are all employees on the same band earning the same salary? Why?
 SELECT 
  e.id,
  e.first_name,
  e.last_name,
  e.start_date,
  s.salary,
  j.title,
  su.subject_name
FROM employee AS e

/*  If banding is defined by job title, are these two joins needed? */
- LEFT JOIN subject_lead AS sl
-  ON sl.employee_id = e.id
- LEFT JOIN subject AS su
-  ON su.id = sl.subject_id

/* 
There are duplicates in the salary table (on employee_id), is this expected?

To check if there are duplicates:
    SELECT COUNT(*) AS row_cnt, COUNT(distinct employee_id) uniq_emp_id_cnt FROM salary;

To find duplicates:
    WITH dupes AS (
        SELECT employee_id, COUNT(*) AS cnt
        FROM salary GROUP BY 1 HAVING COUNT(*) > 1
    )
    SELECT * FROM salary
    INNER JOIN dupes USING (employee_id) -- this join is acting as a filter (could also use WHERE IN / WHERE EXISTS instead)
    ORDER BY employee_id DESC;

I'd assume these counts should match unless this is actually more of a `salary_history` table? In which case, look into slowly changing dimensions (SCD).
*/
- LEFT JOIN salary AS s
+ LEFT JOIN (SELECT employee_id, MAX(salary) FROM salary GROUP BY 1) AS s
  ON s.employee_id = e.id
INNER JOIN jobtype as j
  ON j.id = e.jobtype_id
WHERE j.title LIKE "%teacher%"

/* We are grouping here to remove duplicates (but not to calculate any aggregated columns, this suggests you have a grain issue / exploding join that should be addressed instead. */
GROUP BY 5, 1, 2, 3, 4, 6, 7

/* This should be in the WHERE clause instead. */
HAVING s.salary > (SELECT AVG(salary) FROM salary)
ORDER BY salary DESC
;
cdrowley commented 1 year ago

Query 2)

CREATE VIEW vw_pupil_absence
AS
SELECT
  p.first_name AS Pupil_Name,
  p.last_name AS Pupil_Surname,
  c.class_name AS class,
  cr.first_name AS Carer_Name,
  cr.last_name AS Carer_Surname,
  cr.relationship_to_pupil,
  cr.mobile_telephone_number AS mobile,
  cr.work_telephone_number AS work,
  r.register_date
FROM pupil AS p
-  LEFT JOIN class AS c
+ INNER JOIN class AS c -- 1 pupil can only be in 1 class (according to the ERD)
    ON c.id = p.class_id
  INNER JOIN carer AS cr
    ON cr.id = p.id
+ In production we'd likely want to model the attendance & register tables differently (as a full history table or a daily_attendance aka only looking at today) -- are both needed?
  INNER JOIN attendance AS a
    ON a.pupil_id = p.id
  LEFT JOIN register AS r
    ON r.attendance_id = a.id
WHERE a.acode_id = "N"
-  AND r.register_date = "2022-01-02"
+ AND r.register_date = CURRENT_DATE() -- for production use, for testing "2022-01-02" is fine
- GROUP BY 3, 1, 2, 4, 5, 6, 7, 8, 9; -- we shouldn't need to group here?
cdrowley commented 1 year ago

Query 3)

This could be a good candidate for a function or parameterised query.

So we keep the full view or use the query in a function and then inject the WHERE clause values via a friendly user interface that the nurse could interact with.

CREATE VIEW vw_sickness_injury
AS
SELECT 
  p.first_name AS Pupil_Name,
  p.last_name AS Pupil_Surname,
  c.class_name AS class,
  cr.first_name AS Carer_Name,
  cr.last_name AS Carer_Surname,
  cr.relationship_to_pupil,
  cr.mobile_telephone_number AS mobile,
  cr.work_telephone_number AS work,
  ct.consent_type,
  co.consent_given AS consent,
  mc.condition_name AS medical,
  me.medication_name
FROM pupil AS p
  LEFT JOIN class AS c
    ON c.id = p.class_id
  LEFT JOIN carer AS cr
    ON cr.id = p.id
  LEFT JOIN consent AS co
    ON co.pupil_id = p.id
  LEFT JOIN consent_type AS ct
    ON ct.id = co.consent_type_id
  LEFT JOIN medical AS m
    ON m.id = p.id
  LEFT JOIN medical_condition AS mc
    ON mc.id = m.medical_condition_id
  LEFT JOIN medication AS me
    ON me.id = m.medication_id
WHERE 
+ these would become variables in a mySQL function (or injected as parameterised variables via another programming language into the SQL string).
- p.first_name = 'Jessie'
- AND p.last_name = 'Jackson'
- AND ct.consent_type = 'calpol';
cdrowley commented 1 year ago

Query 4)

This is stylistic and using variables is good practice but nesting current_date and having a one-liner feels like a good fit here.

Also changing the input variable name to better understand what kind of input would be expected.

DELIMITER //

- CREATE FUNCTION no_of_years(date1 DATE)
+ CREATE FUNCTION no_of_years(start_date DATE) 
RETURNS INT
DETERMINISTIC
BEGIN
- DECLARE date2 DATE;
- SELECT CURRENT_DATE()INTO date2;
- RETURN YEAR(date2)-YEAR(date1);
+ RETURN YEAR(CURRENT_DATE()) - YEAR(start_date);
END //

DELIMITER ;
Claire3404 commented 1 year ago

****Hi Clive, Thanks a lot for this feedback, it's been really useful. I have a few question for the first query...

Query 1)

/* There are duplicates in the salary table (on employee_id), is this expected?

Within a primary school one person can have a number of job roles e.g. someone can be a class teacher and they can also lead a subject e.g. maths for which they'd earn an extra wage. Another person could just be a class teacher and another could be a teacher and lead multiple subjects for which they'd earn varying different amounts as extra salary depending on the subject. Which subject a teacher leads can also change on a regular basis and with this their salary would change. I guess I added their 'extra salary' for leading a subject in order to show justification as to why one person may be earning more than another. Should I have deigned the table within the database in a better way to prevent these duplicates or, looking back, there probably isn't a need to document the salary for the subject lead role as a separate salary?

I want to continue to improve my data analysis skills and am also interested in data engineering. What would you advise is the next best step to do this? I'm possibly looking for a career change, what would be the next best course to try in your opinion?

Thank you for all your help!