Some small bugs - Githubissues

xiaohuazi123 commented 1 year ago

Bundle the ibd2sdi tool with the program.

For tablespace import/export, add lock-table and unlock statements plus a mysqlcheck check.

import json
import os
import platform
import subprocess
from typing import Dict, List

# `logger` and `ibd2sdi_path` are module-level names defined elsewhere in the project.


def ibd2sql(args: dict):
    global ibd2sdi_path
    if not ibd2sdi_path or not os.path.isfile(ibd2sdi_path):
        # Check whether we are running on Windows
        is_windows = platform.system() == 'Windows'
        where_cmd = 'where' if is_windows else 'whereis'
        # On Windows use `where ibd2sdi`, otherwise `whereis ibd2sdi`.
        # Reporter's note: on Windows, `where ibd2sdi` runs into problems.
        ibd2sdi_info = subprocess.run([where_cmd, 'ibd2sdi'], stdout=subprocess.PIPE)
        if ibd2sdi_info.returncode != 0:
            raise FileNotFoundError('`ibd2sdi` path is invalid')
        ibd2sdi_path = ibd2sdi_info.stdout.decode('utf-8').strip()
        if not is_windows:
            res_arr = ibd2sdi_path.split(' ')
            if len(res_arr) <= 1:
                raise ValueError('no installed `ibd2sdi` found on system!')  # ibd2sdi is not installed
            ibd2sdi_path = res_arr[1]
            if not os.path.isfile(ibd2sdi_path):  # the path is not a file, so ibd2sdi does not exist
                raise FileNotFoundError(f'ibd2sdi({ibd2sdi_path}) not exist, output: {" ".join(res_arr)} ')
    db_name = os.path.basename(args['input_ibds'])
    # Collect every file with the .ibd suffix
    ibd_names = [name for name in os.listdir(args['input_ibds']) if name.endswith('.ibd')]
    if not os.path.isdir(args['output']):
        os.mkdir(args['output'])
    sdi_out = os.path.join(args['output'], db_name + '_sdi')  # output directory for sdi files
    if not os.path.isdir(sdi_out):
        os.mkdir(sdi_out)
    skip_tbls: list = args.get('skip_tbls') or []  # guard against a missing argument
    sql_path = os.path.join(args['output'], db_name + '.sql')  # output path for the sql file
    builder = open(sql_path, 'w', encoding='utf-8')
    only_tbls: list = args.get('only_tbls') or []
    # Write each table's definition to the sql file
    for ibd_name in ibd_names:
        # rstrip('.ibd') would strip any trailing '.', 'i', 'b' or 'd' characters, so use splitext
        tbl_name = os.path.splitext(ibd_name)[0]
        if tbl_name in skip_tbls:
            continue
        # Reporter's note: "only_tbls not in only_tbls" is always false, this is a bug
        if only_tbls and tbl_name not in only_tbls:
            continue
        logger.info(f'handle table: {ibd_name}')
        sdi_path = os.path.join(sdi_out, tbl_name + '.sdi')
        if not os.path.isfile(sdi_path):
            ibd_path = os.path.join(args['input_ibds'], ibd_name)
            # Dump the SDI of the ibd file at ibd_path into an sdi file
            sdi_result = subprocess.run([ibd2sdi_path, '--dump-file=' + sdi_path, ibd_path], stdout=subprocess.PIPE)
            if sdi_result.returncode != 0:
                raise ValueError(f'[{ibd_name}] ibd2sdi error:{sdi_result}')
        # Read the sdi file
        with open(sdi_path, 'r', encoding='utf-8') as sdi_file:
            sdi_data: dict = json.load(sdi_file)[1]['object']
        if sdi_data.get('dd_object_type') != 'Table':
            logger.info(f'unsupported sdi type: {sdi_data.get("dd_object_type")} for {ibd_name}')
            continue
        # Read all table structure information from the sdi file
        dd_obj: dict = sdi_data.get('dd_object')
        columns: List[Dict] = dd_obj.get('columns')
        indexes: List[Dict] = dd_obj.get('indexes')
        out_tbl_name: str = dd_obj.get('name')
        out_tbl_engine: str = dd_obj.get('engine')
        builder.write(f'DROP TABLE IF EXISTS `{out_tbl_name}`;\n')
        builder.write(f'CREATE TABLE `{out_tbl_name}` (\n')
        is_first = True
        for col in columns:
            if col.get('hidden') == 2:
                continue
            if is_first:
                is_first = False
            else:
                builder.write(',\n')
            col_name = col.get('name')
            col_type = col.get('column_type_utf8')
            builder.write(f'\t`{col_name}` {col_type}')
            if not col.get('is_nullable'):
                builder.write(' NOT NULL')
            if col.get('is_auto_increment'):
                builder.write(' AUTO_INCREMENT')
            if not col.get('has_no_default'):
                if col.get('default_value_null'):
                    builder.write(' DEFAULT NULL')
                else:
                    def_val = col.get('default_value_utf8')
                    if def_val:
                        builder.write(f" DEFAULT '{def_val}'")
            comment = col.get('comment')
            if comment:
                # COMMENT takes a quoted string literal; escape embedded single quotes
                escaped = comment.replace("'", "''")
                builder.write(f" COMMENT '{escaped}'")
        for idx in indexes:
            if idx.get('hidden'):
                continue
            elts = idx.get('elements')
            if not elts:
                continue
            idx_name = idx.get('name')
            use_elts = [elt for elt in elts if elt['length'] < 4294967295]
            if not use_elts:
                logger.warning(f'invalid index found, no columns linked: {ibd_name}/{idx_name}')
                continue
            builder.write(',\n\t')
            col_names = [columns[elt.get('column_opx')].get('name') for elt in use_elts]
            show_cols = ', '.join(f'`{name}`' for name in col_names)
            idx_type = idx.get('type')
            if idx_type == 1:
                builder.write(f'PRIMARY KEY ({show_cols})')
            elif idx_type in {2, 3}:
                builder.write(f'INDEX `{idx_name}` ({show_cols})')
            elif idx_type == 4:
                builder.write(f'FULLTEXT `{idx_name}` ({show_cols})')
            else:
                raise ValueError(f'unsupported index type: {idx_type} for {ibd_name}/{idx_name}')
        builder.write(f'\n) ENGINE={out_tbl_engine};\n\n')
        builder.flush()  # flush this table's definition to the sql file
    builder.close()
    logger.info(f'sql generated at: {sql_path}')

        ibd_path = os.path.join(ibd_dir, ibd_name)
        ibd_size = os.path.getsize(ibd_path)
        logger.info(f'importing table: {tbl_name}, size: {ibd_size}')
        # Add lock-table and unlock statements here, plus a mysqlcheck check
        if not tbl_unlinked:
            cursor.execute(f'ALTER TABLE `{tbl_name}` DISCARD TABLESPACE;')
        shutil.copy(ibd_path, os.path.join(mysql_out, ibd_name))
        try:
            cursor.execute(f'ALTER TABLE `{tbl_name}` IMPORT TABLESPACE;')
        except pymysql.err.InternalError as e:
            if e.args[0] == 1808:
                # Error 1808 is a schema mismatch; collect it and keep going
                mismatch_tbls.add(tbl_name)
                mismatch_err = e
            else:
                raise
    if mismatch_err:
        raise ValueError(f'schema mismatch tbls: {",".join(mismatch_tbls)}', mismatch_err)
    print('import complete!')

anyongjin commented 1 year ago

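Bundle the ibd2sdi tool with the program.

Do you mean adding the ibd2sdi program to this project? ibd2sdi is a tool that ships with MySQL 8. I'm not sure whether it depends on other DLLs or similar, or whether copying just ibd2sdi.exe into the project would work. If someone is willing to test and confirm that a lone ibd2sdi runs fine, it can be added to the project.

On Windows, `where ibd2sdi` runs into problems.

In my testing it does print the path to ibd2sdi. If you get an error on your side, please paste the error message so I can take a look.

"only_tbls not in only_tbls" is always false, this is a bug.

only_tbls is a command-line argument that restricts SQL generation to the listed tables. If it isn't passed, only_tbls is empty, the condition short-circuits to false, and every table is processed. The `continue` only fires when a list was passed and the current table is not in it.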
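To make the filter's behavior concrete, here is a minimal sketch of the same two checks pulled out into a helper. The function name `should_skip` is hypothetical and not part of the project; it only mirrors the `skip_tbls` and `only_tbls` conditions in the loop.

```python
from typing import List, Optional


def should_skip(tbl_name: str,
                skip_tbls: Optional[List[str]] = None,
                only_tbls: Optional[List[str]] = None) -> bool:
    """Return True when the table should be skipped (hypothetical helper)."""
    # Explicitly excluded tables are always skipped.
    if skip_tbls and tbl_name in skip_tbls:
        return True
    # An empty or missing only_tbls makes this condition False,
    # so no table is filtered out by it.
    if only_tbls and tbl_name not in only_tbls:
        return True
    return False


print(should_skip('users'))                        # False
print(should_skip('users', only_tbls=['orders']))  # True
print(should_skip('users', only_tbls=['users']))   # False
```

The condition compares `tbl_name` against the list, not the list against itself, so it is not constant.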

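For tablespace import/export, add lock-table and unlock statements plus a mysqlcheck check.

Nice idea. I've been busy with other projects lately; if anyone is willing to improve this part, pull requests are welcome~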
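One possible shape for that change, as a sketch rather than a tested patch: the helpers below only build the statements, so they can be reviewed without a database. All function names here (`quote_ident`, `export_statements`, `import_statements`, `mysqlcheck_command`) are hypothetical. The export step assumes the source server is still running; for .ibd files recovered from a server that no longer starts, only the import and check steps apply.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return '`' + name.replace('`', '``') + '`'


def export_statements(tbl_name: str) -> List[str]:
    """Statements to run on a live source server, around the file copy."""
    tbl = quote_ident(tbl_name)
    return [
        # Flush the table to disk and lock it; the session must stay open
        # while the .ibd (and .cfg) files are copied.
        f'FLUSH TABLES {tbl} FOR EXPORT;',
        # Release the lock once the copy has finished.
        'UNLOCK TABLES;',
    ]


def import_statements(tbl_name: str) -> List[str]:
    """Statements to run on the destination server."""
    tbl = quote_ident(tbl_name)
    return [
        f'ALTER TABLE {tbl} DISCARD TABLESPACE;',
        # The .ibd file is copied into the data directory between these two.
        f'ALTER TABLE {tbl} IMPORT TABLESPACE;',
        # Verify the imported table from SQL.
        f'CHECK TABLE {tbl};',
    ]


def mysqlcheck_command(db_name: str, tbl_name: str) -> List[str]:
    """Command-line check of one table; connection options are omitted here."""
    return ['mysqlcheck', '--check', db_name, tbl_name]


for stmt in export_statements('users') + import_statements('users'):
    print(stmt)
```

Returning plain lists keeps the ordering explicit and lets the caller decide where the file copy happens between the statements.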

Finally, thanks for the feedback. For specific bugs, it's best to include the error message and the steps to reproduce.

xiaohuazi123 commented 1 year ago

Error message:

C:\Users\Administrator>where ibd2sdi

INFO: Could not find files for the given pattern(s).

Screenshot of the error:

QQ截图20230524121503

anyongjin commented 1 year ago

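You need to install MySQL 8 and add its bin directory to the system Path environment variable before `where` can find it. I just removed the environment variable and was able to reproduce the error. I'll update the documentation to explain this.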
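As an alternative to shelling out to `where` or `whereis`, the standard library's `shutil.which` performs the same PATH lookup on every platform. The sketch below is not the project's code; `find_tool` and its `explicit_path` parameter are hypothetical names used for illustration.

```python
import os
import shutil
from typing import Optional


def find_tool(name: str = 'ibd2sdi', explicit_path: Optional[str] = None) -> str:
    """Locate an executable, preferring an explicitly configured path."""
    if explicit_path:
        if os.path.isfile(explicit_path):
            return explicit_path
        raise FileNotFoundError(f'`{name}` not found at {explicit_path}')
    # shutil.which searches the directories listed in PATH and returns
    # None when nothing matches.
    found = shutil.which(name)
    if found is None:
        raise FileNotFoundError(
            f'`{name}` not found on PATH; install MySQL 8 and add its bin directory to PATH'
        )
    return found


try:
    print(find_tool())
except FileNotFoundError as err:
    print(err)
```

This still depends on the MySQL bin directory being on PATH, but it reports the missing setup with one clear message on both Windows and Linux.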

xiaohuazi123 commented 1 year ago

Also, the argparse code isn't written very well.

parser = argparse.ArgumentParser(description='This is descript')
# Positional arguments are required by default; passing required=True to them raises TypeError
parser.add_argument(dest='tosql', help='generate sql from ibd files')
parser.add_argument(dest='load_data', help='load data from ibd for tables')
args = parser.parse_args()

if args.tosql == 'tosql':
    ibd2sql(config)
elif args.load_data == 'load_data':
    link_tables_ibd(config)
else:
    raise ValueError(f'unsupported sub command: {args.tosql} {args.load_data}')

add_subparsers is for subcommands, but you don't use any subcommand-specific features here, and you also added dest='cmd', which is a bit baffling.

anyongjin commented 1 year ago

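The reason I use argparse subcommands here is that the script covers two features: SQL generation (tosql) and data import (load_data). The two are independent and never run together. I want a single word, as in python main.py tosql or python main.py load_data, to select which feature runs, and that syntax is exactly what subcommands provide. Without subcommands you could use python main.py --tosql and python main.py --load_data, but that needs an internal mutual-exclusion check, which is slightly more complex, and it suggests the two flags could be combined, which is less clear. So I think subcommands are the better way to separate the entry points: the chosen subcommand is stored in cmd, and the code checks which one it is and calls the matching function.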
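A minimal sketch of that subcommand layout, assuming only the two command names from the thread. `build_parser` and `dispatch` are illustrative names, and the handlers return the name of the function that would be called instead of calling the project's real `ibd2sql` and `link_tables_ibd`.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='ibd helper (illustrative)')
    # dest='cmd' stores the chosen subcommand name in args.cmd;
    # required=True makes argparse reject a call with no subcommand.
    sub = parser.add_subparsers(dest='cmd', required=True)
    sub.add_parser('tosql', help='generate sql from ibd files')
    sub.add_parser('load_data', help='load data from ibd for tables')
    return parser


def dispatch(argv: List[str]) -> str:
    args = build_parser().parse_args(argv)
    # Stand-ins for the real entry points.
    handlers = {
        'tosql': lambda: 'ibd2sql',
        'load_data': lambda: 'link_tables_ibd',
    }
    return handlers[args.cmd]()


print(dispatch(['tosql']))      # ibd2sql
print(dispatch(['load_data']))  # link_tables_ibd
```

Because exactly one subcommand can be given, the two entry points are mutually exclusive by construction, with no extra validation code.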

xiaohuazi123 commented 1 year ago

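Got it. I went through the code again today and you're right. With mutually exclusive flags you would check them and exit with an error if both are set; many operating system commands work this way and state in their help text which option conflicts with which. Your approach works too, though.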
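For comparison, the flag-based variant does not need a hand-written check: argparse can enforce the exclusion itself through `add_mutually_exclusive_group`. This is a sketch of that alternative, not the project's code; `build_flag_parser` is a hypothetical name.

```python
import argparse


def build_flag_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='ibd helper (illustrative)')
    # required=True demands exactly one of the flags in the group;
    # passing both makes argparse print an error and exit.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--tosql', action='store_true',
                       help='generate sql from ibd files')
    group.add_argument('--load_data', action='store_true',
                       help='load data from ibd for tables')
    return parser


args = build_flag_parser().parse_args(['--tosql'])
print(args.tosql, args.load_data)  # True False
```

The generated usage line shows the two flags as alternatives, which covers the point about documenting the conflict in the help text.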


anyongjin / mysql_ibd

Some small bugs #9