ddcw / ibd2sql

Parse MySQL .ibd files to SQL, for learning or data recovery
GNU General Public License v3.0
184 stars · 54 forks

Add filtering by where condition; support writing SQL/CSV output files with automatic sharding for very large files; show processing progress #40

Open yeshl opened 3 weeks ago

yeshl commented 3 weeks ago

I modified the code. Parsing speed is about 10k rows/s, which feels a bit slow.

1. Added writing output to an SQL or CSV file: --format=sql|csv
2. Added simple filtering on a specified field, e.g. where user_id in (20,21,22): --where='3,20,21,22'
3. Output files are sharded by row count: after writing 10 million rows a new file is created automatically, suffixed with the page id, so parsing can be parallelized via --page-start: --separate=10000000
4. Standard output shows the file size, page count, rows read so far, rows written so far, current page id, and total page count.
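The --where='3,20,21,22' filter described in point 2 can be sketched as follows. This is a hypothetical illustration, not the patch itself: I am assuming the first number in the spec is the field index and the remaining numbers are the allowed values.

```python
def parse_where(spec):
    """Split a spec like '3,20,21,22' into (field_index, allowed_values).

    Assumption: the first element is the 0-based field index, the rest
    are the values to keep (compared as strings for simplicity).
    """
    parts = spec.split(",")
    return int(parts[0]), set(parts[1:])


def row_matches(row, field_index, allowed):
    """Keep the row only when the chosen field's value is in the allowed set."""
    return str(row[field_index]) in allowed


# Example: keep rows whose 4th field (index 3) is 20, 21, or 22.
idx, allowed = parse_where("3,20,21,22")
rows = [
    (1, "a", "x", "20"),
    (2, "b", "y", "99"),
]
kept = [r for r in rows if row_matches(r, idx, allowed)]
```

Matching on string values keeps the sketch independent of column types; a real implementation would compare against the decoded field value.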

IBD FILE:d:\tmp\pm_1.ibd
  bytes:147456
  pages:9
IBD FILE:d:\tmp\pm_1#p#p20231101.ibd
  bytes:11299454976
  pages:689664
create file pm_deliver#p#p20231101_8.csv, page-start:8
[2024-09-26 11:18:22] writed/readed: 10000/10000,page/total: 227/689664
[2024-09-26 11:18:23] writed/readed: 20000/20000,page/total: 432/689664
[2024-09-26 11:18:24] writed/readed: 30000/30000,page/total: 573/689664

IBD FILE:d:\tmp\pm_1.ibd
  bytes:147456
  pages:9
IBD FILE:d:\tmp\pm_1#p#p20231101.ibd
  bytes:11299454976
  pages:689664
create file pm_deliver#p#p20231101_368533.sql, page-start:368533
[2024-09-26 11:20:21] writed/readed: 10000/10000,page/total: 368734/689664
[2024-09-26 11:20:22] writed/readed: 20000/20000,page/total: 368939/689664
[2024-09-26 11:20:23] writed/readed: 30000/30000,page/total: 369079/689664
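The sharding behavior visible in the logs above (a new output file per --separate rows, with the starting page id as a filename suffix such as pm_deliver#p#p20231101_368533.sql) could be sketched like this. The class name and layout are my own illustration, not the code from the patch:

```python
class ShardedWriter:
    """Rotate output files every `separate` rows.

    Each new shard is named <base>_<page_id>.<ext>, where page_id is the
    page the shard starts at, so a later run can resume or parallelize
    from that page (mirroring names like pm_deliver#p#p20231101_8.csv).
    """

    def __init__(self, base, ext, separate):
        self.base = base
        self.ext = ext
        self.separate = separate  # rows per shard, e.g. 10_000_000
        self.rows_in_shard = 0
        self.f = None

    def write(self, page_id, line):
        # Open a new shard on the first write, or when the current one is full.
        if self.f is None or self.rows_in_shard >= self.separate:
            if self.f:
                self.f.close()
            self.f = open(f"{self.base}_{page_id}.{self.ext}", "w")
            self.rows_in_shard = 0
        self.f.write(line + "\n")
        self.rows_in_shard += 1

    def close(self):
        if self.f:
            self.f.close()
```

Rotating on a row count (rather than byte size) keeps shard boundaries aligned with whole rows, which is what makes the --page-start resume point well defined.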
ddcw commented 3 weeks ago
  1. Parsing is slow mainly because of the CPU (I'll consider adding concurrency later).
  2. The output and filtering are very well done (thumbs up), especially the progress display; otherwise you can only wait blindly while parsing (though you can also check the process's rchar to estimate progress).
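The rchar trick mentioned above refers to the per-process I/O counters Linux exposes in /proc/<pid>/io: rchar is the total bytes the process has read, so comparing it with the .ibd file size gives a rough progress estimate. A minimal sketch of parsing that file's key: value format (the parser is mine, not part of ibd2sql):

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


# On Linux you would read the real file, e.g.:
#   with open(f"/proc/{pid}/io") as f:
#       rchar = parse_proc_io(f.read())["rchar"]
# and estimate progress as rchar / ibd_file_size.
sample = "rchar: 323934931\nwchar: 323929600\nread_bytes: 0\n"
parse_proc_io(sample)["rchar"]  # 323934931
```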