[db] basic data struct - Githubissues

Alice52 commented 3 years ago

type

int type: unsigned

type	Byte	value	ZEROFILL default	comment 、
BIT	1/8	0-1	1	0-1 bin-data
tinyint	1	-128- 127	4	TINYINT(1) == Boolean
SMALLINT	2	-2^16- 2^16-1	6
MEDIUMINT	3	-2^24- 2^24-1	9
INT	4	-2^32- 2^32-1	1
BIGINT	8	-2^128- 2^128-1	20
DATETIME	8	max 是 9999		时间日期
TIMESTAMP	4	max 是 2038		时间戳
DATE	3
TIME	3
YEAR	1
decimal(z, x)	--			z 表示整数位, x 表示小数位

t1 tinyint(1) unsigned zerofill NOT NULL DEFAULT '0': 0 到 255

ZEROFILL: 显示宽度, 不够时前面补 0

char & varchar & nvarchar: 查询时会忽略最后的空格

char(n): 效率
- char,最多 255 个字符, 与编码无关: n 表示字符数
- 不区分大小写, 如果字符集是 **-bin 则区分大小写
- 查询时会忽略后面的空格, 存储时会存储前面空格, 后面不满的用空格字符补足
[n]varchar(n): 空间
- n 的含义: 最大长度为 n 个字符, 最大是 65535
- 最大长度 = 字段长度 + [长度记录: (1 或 2) B] + [null 标志位：1B]
- 汉字: latin1(2 字节 2 字符) || gbk(2 字节 1 字符) || utf8(3 字节 1 字符) || utf8mb4(4 字节 1 字符): 汉字占字符数, 前一个是一个字符占存储空间字节数
- 查询时会忽略后面的空格, 存储时会存所有的空格
- varchar 需要额外 1 或 2 个字节记录字符串长度，更新时也容易产生碎片
varchar & ~~nvarchar~~

Warning varchar 与 nvarchar 的区别只暗示不同的字符集 & nvarchar=varchar+utf8 能不能存储特殊字符在乎字段的字符集 mysql 底层只有 varchar, 真实存储的就是 unicode 编码
- 表级别指定默认字符集为 utf8mb4 且字段使用默认字符集时
  1. varchar 会使用 utf8mb4 作为字符集
  2. nvarchar 会使用 utf8 作为字符集: 原因是 nvarchar/nchar/national 的默认实现是 utf8(utf8mb3)
  3. 所有涉及 nvarchar 的操作, 在不指定字符集的情况下都会默认使用 utf8: alter 等修改操作
- mysql 中的 nvarchar/nchar 只是为了暗示 indicate 字段字符集, 并不是底层存储的存储的区别
- 创建不了 nvarchar + nvarchar=varchar+utf8
- 转换表的字符编码: ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
- 区别: mysql 中不是
  1. varchar 是存储非 unicode 编码, 默认使用表的字符集
  2. nvarchar 是存储 Unicode 编码, 可以使用数据库指定的字符集,
  3. 但是现在操作系统或者平台都是使用 unicode, 所以使用 nvarchar 可以避免每次存入读取数据库时的转换操作
- ~~varchar 不能存储 Unicode 字符集数据(使用的是 8-bit 的 codepage), 节约空间~~
选择: char vs varchar
- 如果字符串列最大长度比平均长度大很多, 或者列的更新很少, 选择 varchar 较合适{不要使用nvarchar}
- 如果要存很短的字符串, 或者字符串值长度都相同, 比如 MD5 值, 或者列数据经常变更, 选择使用 char 类型
json
- 字段可以说是 json, 但是默认无法创建索引
- 但是可以使用 generated column 字段存储 json 中的某个值: 可以在此基础上创建索引

date-time

datetime & date & timestamp
- 函数: 都可以使用 DEFAULT CURRENT_TIMESTAMP | ON UPDATE CURRENT_TIMESTAMP
- 时区问题:
  1. datetime 没有时区概念: 保存的是当前会话的设置的时区的时间[如果换时区则显示就不对了]
  2. timestamp 时区相关: 会把客户端的时间转换为 utc 进行存储, 查询时将其转换为本地时间;
- 本质:
  1. datetime: 没有时区 + 字符串 +存入后无论哪个时区查询都得到相同的结果
  2. timestamp: 有时区 + 时间戳 + 存入的是 utc 时间戳(查询后会转换到查询的时区时间)
- 时间范围:
  1. date: 只表示日期, 999
  2. timestamp: 最大值是 2038 年{可以考虑使用时间戳(4字节)-unix_timestamp}, mysql 可以使用8位存储同意解决这个问题
  3. datetime: 9999
- 空间
  1. datetime: 8 字节
  2. timestamp: 4 字节
- 时间精度: 默认是 s, 最大可以毫秒上 6 位
结论: 优先使用 Timestamp
mysql and jdbc: serverTimezone 超重要
- serverTimezone这个要显示设置为数据库所在时区: 默认可能得到 cst 混乱时区
- 使用mysql的timestamp类型时, 一定要保证jdbc url中的serverTimezone与数据库中的时区配置是一致的
- 处理过程
  1. mysql驱动在创建链接时, 会使用 serverTimezone 设置链接的时区: 没有设置时会获取mysql 默认的时区
  2. 调用jdbc的setTimestamp()方法时, 会将应用的时间转换为serverTimezone 时区的字符串(传输)
  3. 执行sql 时, 发送转换后的字符串
  4. mysql server 得到字符串会认为是自己时区的字符串, 之后在转换为 timestamp 进行存储
  5. 查询过程是反过来的: 逻辑不变
best practice
- 尽量使用时间戳: db-timestamp + code-Date/LocalDateTime
- 如果要是有时间字符串, 一定要使用带时区的时间串, 如ISO8601(2020-02-23T08:00:00+08:00)
- 代码数据定义要与数据库定义一致
- jdbc的serverTimezone参数要配置正确:
  - 不配置时会默认使用数据库时区: 此时一定要将数据库时区指定为清晰的时区(如：+08:00), 切勿使用CST
- 如果数据库时区修改后
  - jdbc的serverTimezone也要跟着修改, 并重启应用
  - 就算没有配置serverTimezone, 也需要重启: 因为mysql驱动初始化连接时, 会将当前数据库时区缓存到一个java变量中, 不重启应用它不会变
int 的不好点
- 无法使用 CURRENT_TIMESTAMP 函数
- Int 的时间戳将时区问题抛给了应用端: 不友好, 自己转换时区问题, 比较容易忘记
- Int 识别起来有难度, 需要转换才能看懂
- Int 4字节只能标识到2038年, 可以考虑使用 long 替代
  apply
IPv4 地址使用 UNSIGNED INT
- 节约空间
- 可范围查找
- 不便于阅读: inet_aton/inet_ntoa
ipv6 使用 varbinary
- inet6_aton/inet6_ntoa

Alice52 commented 2 years ago

ascii unicode & utf-8

ascii 是小表, 1字节, 只包含数字英文符号等很少的字符(127+xx=256)个
unicode 是统一所有语言的一套编码
utf-8 是基于unicode编码的一种节约字节的编码

utf-8

英文字母占 1 字节
英文符号占 1 字节
数字及符号占 1 字节
西文á占 2 字节
普通汉字占 3 字节
中文符号占 3 字节
emoji 占 4 字节

Alice52 commented 1 year ago

pgsql data types

Name	Aliases	Description
bit [ (n) ]		fixed-length bit string
boolean	bool	logical Boolean (true/false)
bytea		binary data (“byte array”)
smallint	int2	signed two-byte integer
integer	int, int4	signed four-byte integer
bigint	int8	signed eight-byte integer
real	float4	single precision floating-point number (4 bytes)
double precision	float8	double precision floating-point number (8 bytes)
numeric [ (p, s) ]	decimal[(p, s)]	exact numeric of selectable precision
character [ (n) ]	char [ (n) ]	fixed-length character string
character varying [ (n) ]	varchar [ (n) ]	variable-length character string
timestamp [ (p) ] [ without time zone ]		date and time (no time zone)
date		calendar date (year, month, day)
json		textual JSON data
jsonb		binary JSON data, decomposed
serial	serial4	autoincrementing four-byte integer
uuid		universally unique identifier
cidr		IPv4 or IPv6 network address
---------------------	-------------	---------------------
bigserial	serial8	autoincrementing eight-byte integer
bit varying [ (n) ]	varbit [ (n) ]	variable-length bit string
box		rectangular box on a plane
circle		circle on a plane
inet		IPv4 or IPv6 host address
interval [ fields ] [ (p) ]		time span
line		infinite line on a plane
lseg		line segment on a plane
macaddr		MAC (Media Access Control) address
macaddr8		MAC (Media Access Control) address (EUI-64 format)
money		currency amount
path		geometric path on a plane
pg_lsn		PostgreSQL Log Sequence Number
pg_snapshot		user-level transaction ID snapshot
point		geometric point on a plane
polygon		closed geometric path on a plane
smallserial	serial2	autoincrementing two-byte integer
text		variable-length character string
time [ (p) ] [ without time zone ]		time of day (no time zone)
time [ (p) ] with time zone	timetz	time of day, including time zone
timestamp [ (p) ] with time zone	timestamptz	date and time, including time zone
tsquery		text search query
tsvector		text search document
txid_snapshot		user-level transaction ID snapshot (deprecated; see pg_snapshot)
xml		XML data

Alice52 commented 1 year ago

pgsql

timestamp

-- 将中国时区的时间 '2022-02-12 04:16:58' 转换为时间戳
select EXTRACT(epoch FROM CAST('2022-02-12 04:16:58' AS TIMESTAMP)  at time zone 'Asia/Shanghai');

Alice52 / database

[db] basic data struct #46

type

char & varchar & nvarchar: 查询时会忽略最后的空格

date-time

apply

ascii unicode & utf-8

utf-8

pgsql data types

pgsql