Up to now, we've been working with a single table, but entity data in the real world is often broken down into pieces and stored across multiple orthogonal tables using a process known as normalization.
In order to answer questions about an entity that has data spanning multiple tables in a normalized database, we need to learn how to write a query that can combine all that data and pull out exactly the information we need.
前面我们都是在一个表格里操作,但是真实世界的数据往往乱七八糟地组合在多个相关的表格中。这时候我们要查询就必须想办法把需要的信息从不同表格中的数据提取出来并组合到一起。
Multi-table queries with JOINs
Tables that share information about a single entity need to have a primary key that identifies that entity uniquely across the database. One common primary key type is an auto-incrementing integer (because they are space efficient), but it can also be a string, hashed value, so long as it is unique.
Using the JOIN clause in a query, we can combine row data across two separate tables using this unique key. The first of the joins that we will introduce is the INNER JOIN.
不同表格含有关于同一观测对象的信息需要通过唯一的主键相关联。最常见的主键类型就是递增的整数,这个做法省空间。但是主键也有可能是字符串,哈希值,只要是个唯一性的东西就行。
JOIN 从句可以通过唯一的主键把不同的表格整合到一起。我们首先要学习的是 INNER JOIN。语法:
SELECT column, another_table_column, …
FROM mytable
INNER JOIN another_table
ON mytable.id = another_table.id
WHERE condition(s)
ORDER BY column, … ASC/DESC
LIMIT num_limit OFFSET num_offset;
The INNER JOIN is a process that matches rows from the first table and the second table which have the same key (as defined by the ON constraint) to create a result row with the combined columns from both tables. After the tables are joined, the other clauses we learned previously are then applied.
INNER JOIN 通过在 一张表格和 ON 定义的第二张表格之间匹配相同的键值来合并两个表格得到结果。我们之前学习的那些从句都是在表格合并之后执行的。
You might see queries where the INNER JOIN is written simply as a JOIN. These two are equivalent, but we will continue to refer to these joins as inner-joins because they make the query easier to read once you start using other types of joins, which will be introduced in the following lesson.
INNER JOIN 可以简写成 JOIN,但是为了代码的可读性,大家还是该怎样怎样吧,多打一个单词而已。
We've added a new table to the Pixar database so that you can try practicing some joins. The BoxOffice table stores information about the ratings and sales of each particular Pixar movie, and the Movie_id column in that table corresponds with the Id column in the Movies table 1-to-1. Try and solve the tasks below using the INNER JOIN introduced above.
这次练习题有两张表。BoxOffice 存储每部电影的评分和票房情况,通过 Movie_id 和另一张表格 Movies 里的 Id 一一对应。两张表格大概长这样:
练习题:
Find the domestic and international sales for each movie
找出每部电影的国内外票房情况,
SELECT Title, Domestic_sales, International_sales FROM movies m INNER JOIN Boxoffice b
ON m.Id=b.Movie_id;
偷懒用了缩写 : )
Show the sales numbers for each movie that did better internationally rather than domestically
国际票房好过国内的,就是加一个限制条件,WHERE 一下
SELECT Title, Domestic_sales, International_sales FROM movies m INNER JOIN Boxoffice b
ON m.Id=b.Movie_id
WHERE b.International_sales > b.Domestic_sales;
List all the movies by their ratings in descending order
所有电影的评分降序排列
SELECT Title, Rating FROM movies m INNER JOIN Boxoffice b
ON m.Id=b.Movie_id
ORDER BY Rating DESC;
INNER JOIN 还算简单的吧,下一个
SQL Lesson 7: OUTER JOINs
Depending on how you want to analyze the data, the INNER JOIN we used last lesson might not be sufficient because the resulting table only contains data that belongs in both of the tables.
根据查询任务不同,我们会经常发现 INNER JOIN 不够用的情况,因为它只能取两个表之间共有的行。
If the two tables have asymmetric data, which can easily happen when data is entered in different stages, then we would have to use a LEFT JOIN, RIGHT JOIN or FULL JOIN instead to ensure that the data you need is not left out of the results.
在现实实际乱七八糟的数据,各表格之间数据往往都是不对称的,这时候就得用上 LEFT JOIN、RIGHT JOIN、FULL JOIN 这些了。语法和前面的很类似:
SELECT column, another_column, …
FROM mytable
INNER/LEFT/RIGHT/FULL JOIN another_table
ON mytable.id = another_table.matching_id
WHERE condition(s)
ORDER BY column, … ASC/DESC
LIMIT num_limit OFFSET num_offset;
Like the INNER JOIN these three new joins have to specify which column to join the data on.
When joining table A to table B, a LEFT JOIN simply includes rows from A regardless of whether a matching row is found in B. The RIGHT JOIN is the same, but reversed, keeping rows in B regardless of whether a match is found in A. Finally, a FULL JOIN simply means that rows from both tables are kept, regardless of whether a matching row exists in the other table.
和 INNER JOIN 一样,这些不同的 JOIN 都需要指定我们通过哪一列来组合不同的表格数据。
举个例子,我们想把表格 A、B 组合起来, LEFT JOIN 结果会包括所有 A 里有的行,不论 B 里是否存在。RIGHT JOIN 类似,是反过来的,保留 B 里所有的行,不管 A 里有没有。FULL JOIN 就是包括两个表里所有的行,不管每一行是否存在匹配行。
When using any of these new joins, you will likely have to write additional logic to deal with NULLs in the result and constraints (more on this in the next lesson).
用到这些 JOIN 的时候,我们很有可能需要增加逻辑判断来处理结果中的 NULLs 值或者限制条件。下一节课会讲到。
You might see queries written these joins written as LEFT OUTER JOIN, RIGHT OUTER JOIN, or FULL OUTER JOIN, but the OUTER keyword is really kept for SQL-92 compatibility and these queries are simply equivalent to LEFT JOIN, RIGHT JOIN, and FULL JOIN respectively.
LEFT OUTER JOIN、RIGHT OUTER JOIN、FULL OUTER JOIN 这些 JOIN 里面的 OUTER 都是为了保留兼容性的。
In this exercise, you are going to be working with a new table which stores fictional data about Employees in the film studio and their assigned office Buildings. Some of the buildings are new, so they don't have any employees in them yet, but we need to find some information about them regardless.
Since our browser SQL database is somewhat limited, only the LEFT JOIN is supported in the exercise below.
这次的练习题是两张新的表格 Employees 和 Buildings:
Find the list of all buildings that have employees
找到所有有雇员入住的楼栋,出现在 Employees 这个表格的楼栋那就肯定是入住的,所以 LEFT JOIN 前面是 Employees 就好了。然后用 DISTINCT 去重(想了半天,差点把这个忘了)。
SELECT DISTINCT Building_name FROM Employees e LEFT JOIN Buildings b ON
e.Building=b.Building_name;
Find the list of all buildings and their capacity
列出所有的楼栋及其容量,这不就是 Buildings 这个表格么?认真的么?
SELECT * FROM Buildings; 哈士奇狗头??
List all buildings and the distinct employee roles in each building (including empty buildings)
列出所有楼栋里唯一的雇员身份信息,空的楼栋也要,那肯定 LEFT JOIN 前面是 Buildings 就好了。
SELECT DISTINCT Building_name, Role FROM Buildings b LEFT JOIN Employees e ON b.Building_name=e.Building;
SQL Lesson 8: A short note on NULLs
NULL 值简单介绍
As promised in the last lesson, we are going to quickly talk about NULL values in an SQL database. It's always good to reduce the possibility of NULL values in databases because they require special attention when constructing queries, constraints (certain functions behave differently with null values) and when processing the results.
最好的做法当然是在查询和处理结果时避免数据中出现 NULL,因为它们往往需要我们特别注意。
An alternative to NULL values in your database is to have data-type appropriate default values, like 0 for numerical data, empty strings for text data, etc. But if your database needs to store incomplete data, then NULL values can be appropriate if the default values will skew later analysis (for example, when taking averages of numerical data).
对于需要出现 NULL 的地方,一种办法是使用合适类型的默认值,比如数据就用 0,文本就用空字符串等。但是如果本身就是要存储非完整的数据的话,有时候默认值会干扰后续分析而 NULL 反而更合适(比如数据取均值)。(这一段不是很理解.....)
Sometimes, it's also not possible to avoid NULL values, as we saw in the last lesson when outer-joining two tables with asymmetric data. In these cases, you can test a column for NULL values in a WHERE clause by using either the IS NULL or IS NOT NULL constraint.
有时候,NULL 是无法避免的,比如我们上节课看到的把两个不对称的数据表外连接起来的时候。这时候,对于某一列数据中的 NULL 值我们可以用 WHERE 从句配合 IS NULL 或 IS NOT NULL 来做判断。语法:
SELECT column, another_column, …
FROM mytable
WHERE column IS/IS NOT NULL
AND/OR another_condition
AND/OR …;
This exercise will be a sort of review of the last few lessons. We're using the same Employees and Buildings table from the last lesson, but we've hired a few more people, who haven't yet been assigned a building.
这次的练习题是对前面的课程的复习。用到的数据还是 Employees 和 Buildings 这两张表:
但是可以看到招聘的雇员多了一些,而且一些人还没有分配到入住楼栋。
练习题:
Find the name and role of all employees who have not been assigned to a building
找出没有分配到住所的员工及其岗位信息(要分房了??),这个不是就用 Employees 一张表找到没有住房信息的人就行了么: SELECT Role, Name, Building FROM Employees WHERE Building IS NULL;
Find the names of the buildings that hold no employees
找到没有雇员入住的楼栋,这就新房啊,这个问题我想了一会儿,最后发现其实 LEFT JOIN 然后找没雇员的楼栋就行:
SELECT Building_name, Name FROM Buildings b LEFT JOIN Employees e ON
b.Building_name = e.Building
WHERE Name IS NULL;
昨天看了大火的《工作细胞》,挺有趣的。血小板太可爱了!
然后睡觉前看了一下 Todo list:
任重而道远啊,后面的课程还多着呢。继续继续——
SQL Lesson 6: Multi-table queries with JOINs
第六课,JOINs 多表格查询
Up to now, we've been working with a single table, but entity data in the real world is often broken down into pieces and stored across multiple orthogonal tables using a process known as normalization.
In order to answer questions about an entity that has data spanning multiple tables in a normalized database, we need to learn how to write a query that can combine all that data and pull out exactly the information we need. 前面我们都是在一个表格里操作,但是真实世界的数据往往乱七八糟地组合在多个相关的表格中。这时候我们要查询就必须想办法把需要的信息从不同表格中的数据提取出来并组合到一起。
Multi-table queries with JOINs
Tables that share information about a single entity need to have a primary key that identifies that entity uniquely across the database. One common primary key type is an auto-incrementing integer (because they are space efficient), but it can also be a string, hashed value, so long as it is unique.
Using the JOIN clause in a query, we can combine row data across two separate tables using this unique key. The first of the joins that we will introduce is the INNER JOIN. 不同表格含有关于同一观测对象的信息需要通过唯一的主键相关联。最常见的主键类型就是递增的整数,这个做法省空间。但是主键也有可能是字符串,哈希值,只要是个唯一性的东西就行。 JOIN 从句可以通过唯一的主键把不同的表格整合到一起。我们首先要学习的是 INNER JOIN。语法:
The INNER JOIN is a process that matches rows from the first table and the second table which have the same key (as defined by the ON constraint) to create a result row with the combined columns from both tables. After the tables are joined, the other clauses we learned previously are then applied. INNER JOIN 通过在 一张表格和 ON 定义的第二张表格之间匹配相同的键值来合并两个表格得到结果。我们之前学习的那些从句都是在表格合并之后执行的。
You might see queries where the INNER JOIN is written simply as a JOIN. These two are equivalent, but we will continue to refer to these joins as inner-joins because they make the query easier to read once you start using other types of joins, which will be introduced in the following lesson. INNER JOIN 可以简写成 JOIN,但是为了代码的可读性,大家还是该怎样怎样吧,多打一个单词而已。
We've added a new table to the Pixar database so that you can try practicing some joins. The BoxOffice table stores information about the ratings and sales of each particular Pixar movie, and the Movie_id column in that table corresponds with the Id column in the Movies table 1-to-1. Try and solve the tasks below using the INNER JOIN introduced above. 这次练习题有两张表。BoxOffice 存储每部电影的评分和票房情况,通过 Movie_id 和另一张表格 Movies 里的 Id 一一对应。两张表格大概长这样:
练习题:
Find the domestic and international sales for each movie 找出每部电影的国内外票房情况,
偷懒用了缩写 : )
Show the sales numbers for each movie that did better internationally rather than domestically 国际票房好过国内的,就是加一个限制条件,WHERE 一下
List all the movies by their ratings in descending order 所有电影的评分降序排列
INNER JOIN 还算简单的吧,下一个
SQL Lesson 7: OUTER JOINs
Depending on how you want to analyze the data, the INNER JOIN we used last lesson might not be sufficient because the resulting table only contains data that belongs in both of the tables. 根据查询任务不同,我们会经常发现 INNER JOIN 不够用的情况,因为它只能取两个表之间共有的行。
If the two tables have asymmetric data, which can easily happen when data is entered in different stages, then we would have to use a LEFT JOIN, RIGHT JOIN or FULL JOIN instead to ensure that the data you need is not left out of the results. 在现实实际乱七八糟的数据,各表格之间数据往往都是不对称的,这时候就得用上 LEFT JOIN、RIGHT JOIN、FULL JOIN 这些了。语法和前面的很类似:
Like the INNER JOIN these three new joins have to specify which column to join the data on. When joining table A to table B, a LEFT JOIN simply includes rows from A regardless of whether a matching row is found in B. The RIGHT JOIN is the same, but reversed, keeping rows in B regardless of whether a match is found in A. Finally, a FULL JOIN simply means that rows from both tables are kept, regardless of whether a matching row exists in the other table. 和 INNER JOIN 一样,这些不同的 JOIN 都需要指定我们通过哪一列来组合不同的表格数据。 举个例子,我们想把表格 A、B 组合起来, LEFT JOIN 结果会包括所有 A 里有的行,不论 B 里是否存在。RIGHT JOIN 类似,是反过来的,保留 B 里所有的行,不管 A 里有没有。FULL JOIN 就是包括两个表里所有的行,不管每一行是否存在匹配行。
When using any of these new joins, you will likely have to write additional logic to deal with NULLs in the result and constraints (more on this in the next lesson). 用到这些 JOIN 的时候,我们很有可能需要增加逻辑判断来处理结果中的 NULLs 值或者限制条件。下一节课会讲到。
You might see queries written these joins written as LEFT OUTER JOIN, RIGHT OUTER JOIN, or FULL OUTER JOIN, but the OUTER keyword is really kept for SQL-92 compatibility and these queries are simply equivalent to LEFT JOIN, RIGHT JOIN, and FULL JOIN respectively. LEFT OUTER JOIN、RIGHT OUTER JOIN、FULL OUTER JOIN 这些 JOIN 里面的 OUTER 都是为了保留兼容性的。
In this exercise, you are going to be working with a new table which stores fictional data about Employees in the film studio and their assigned office Buildings. Some of the buildings are new, so they don't have any employees in them yet, but we need to find some information about them regardless. Since our browser SQL database is somewhat limited, only the LEFT JOIN is supported in the exercise below. 这次的练习题是两张新的表格 Employees 和 Buildings:
前者存储一个电影工作室的雇员信息,后者是雇员的工作地点信息。有的地点是新建的因此没有雇员入住。练习题只支持LEFT JOIN 。
练习题:
Find the list of all buildings that have employees 找到所有有雇员入住的楼栋,出现在 Employees 这个表格的楼栋那就肯定是入住的,所以 LEFT JOIN 前面是 Employees 就好了。然后用 DISTINCT 去重(想了半天,差点把这个忘了)。
Find the list of all buildings and their capacity 列出所有的楼栋及其容量,这不就是 Buildings 这个表格么?认真的么?
SELECT * FROM Buildings;
哈士奇狗头??List all buildings and the distinct employee roles in each building (including empty buildings) 列出所有楼栋里唯一的雇员身份信息,空的楼栋也要,那肯定 LEFT JOIN 前面是 Buildings 就好了。
SQL Lesson 8: A short note on NULLs
NULL 值简单介绍
As promised in the last lesson, we are going to quickly talk about NULL values in an SQL database. It's always good to reduce the possibility of NULL values in databases because they require special attention when constructing queries, constraints (certain functions behave differently with null values) and when processing the results. 最好的做法当然是在查询和处理结果时避免数据中出现 NULL,因为它们往往需要我们特别注意。
An alternative to NULL values in your database is to have data-type appropriate default values, like 0 for numerical data, empty strings for text data, etc. But if your database needs to store incomplete data, then NULL values can be appropriate if the default values will skew later analysis (for example, when taking averages of numerical data). 对于需要出现 NULL 的地方,一种办法是使用合适类型的默认值,比如数据就用 0,文本就用空字符串等。但是如果本身就是要存储非完整的数据的话,有时候默认值会干扰后续分析而 NULL 反而更合适(比如数据取均值)。(这一段不是很理解.....)
Sometimes, it's also not possible to avoid NULL values, as we saw in the last lesson when outer-joining two tables with asymmetric data. In these cases, you can test a column for NULL values in a WHERE clause by using either the IS NULL or IS NOT NULL constraint. 有时候,NULL 是无法避免的,比如我们上节课看到的把两个不对称的数据表外连接起来的时候。这时候,对于某一列数据中的 NULL 值我们可以用 WHERE 从句配合 IS NULL 或 IS NOT NULL 来做判断。语法:
This exercise will be a sort of review of the last few lessons. We're using the same Employees and Buildings table from the last lesson, but we've hired a few more people, who haven't yet been assigned a building. 这次的练习题是对前面的课程的复习。用到的数据还是 Employees 和 Buildings 这两张表:
但是可以看到招聘的雇员多了一些,而且一些人还没有分配到入住楼栋。
练习题:
找出没有分配到住所的员工及其岗位信息(要分房了??),这个不是就用 Employees 一张表找到没有住房信息的人就行了么:
SELECT Role, Name, Building FROM Employees WHERE Building IS NULL;
发现第 8 课开始是新的内容,那到这里第二篇笔记结束吧。