ddcw / ibd2sql

parse mysql ibd file to sql for learn or recovery data
GNU General Public License v3.0

Text Fields in Output SQL from ibd2sql Show as Hexadecimal (e.g., 0x53205265736f6c76696e67204572726f728180) #42

Open mahditalebi opened 6 days ago

mahditalebi commented 6 days ago

Hi everyone,

I am currently using the ibd2sql tool to recover data from MySQL InnoDB .ibd files. While most of the data is extracted correctly, I've noticed that certain fields, especially text fields (e.g. TEXT, LONGTEXT), are output as hexadecimal values, like:

0x53205265736f6c76696e67204572726f728180

It seems like the tool is converting these fields into hexadecimal format, but these are supposed to be human-readable text fields in my original database. I’m unsure if this is related to the encoding used in my database, or if it’s something specific to ibd2sql.

Questions:

UPDATE: I ran with -D for debug output and got the following error during conversion:

[2024-10-20 12:45:49] [DEBUG] BLOB ERROR 'utf-8' codec can't decode byte 0x80 in position 15: invalid start byte

Any help or advice would be greatly appreciated!

ddcw commented 5 days ago

It's not an encoding issue (unless you are on Windows). There are two situations in which hexadecimal output is produced:

  1. vector fields
  2. when a varchar/lob decode fails (--debug confirms this if you see 'BLOB ERROR')

From the debug information, it appears to be the second case. I tried decoding the hexadecimal value (python3: bytes.fromhex(hex_str)) and I suspect it is a bug. Can you provide the relevant DDL so I can try to reproduce it?
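For anyone who wants to repeat that check, here is a minimal sketch of decoding a hex literal from the output SQL and locating the first byte that is not valid UTF-8. The helper name `inspect_hex` is made up for illustration and is not part of ibd2sql.

```python
def inspect_hex(hex_literal, encoding="utf-8"):
    """Decode a 0x... literal.

    Returns (text, None) when the bytes decode cleanly, otherwise
    (readable_prefix, offset_of_first_bad_byte).
    """
    # Drop the leading "0x" that the SQL output uses, if present.
    if hex_literal.lower().startswith("0x"):
        hex_literal = hex_literal[2:]
    raw = bytes.fromhex(hex_literal)
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return raw[:exc.start].decode(encoding), exc.start


text, bad_offset = inspect_hex("0x53205265736f6c76696e67204572726f728180")
print(repr(text), bad_offset)  # 'S Resolving Error' 17
```

For the value in the report, the readable text is followed by two bytes (0x81 0x80) that are not valid UTF-8, which is why the decode fails and the field is written as hex.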

ddcw commented 5 days ago

You can download the latest version and try again. I just fixed a bug: for a varchar with a max size of 255, only one byte is used for the length.
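To illustrate the kind of rule involved, here is a rough sketch of how a variable-length column's length is encoded in InnoDB's COMPACT/DYNAMIC record header, as I understand it from the InnoDB source. It is not taken from ibd2sql's code: the function name `read_var_len` and its parameters are hypothetical, and I am assuming the 255 limit is measured in bytes and that the length bytes are passed in the order they are read.

```python
def read_var_len(length_bytes, max_byte_len):
    """Return (data_length, is_external, header_bytes_used).

    length_bytes: the length byte(s) for one column, in reading order.
    max_byte_len: assumed maximum size of the column in bytes.
    """
    first = length_bytes[0]
    # Columns that can never exceed 255 bytes always use a single length byte,
    # so a value such as 0x90 is a plain length of 144 here.
    if max_byte_len <= 255:
        return first, False, 1
    # Larger columns use one byte for lengths 0..127 ...
    if not (first & 0x80):
        return first, False, 1
    # ... and two bytes otherwise: flag bit 0x8000, external bit 0x4000,
    # and the length in the low 14 bits.
    value = (first << 8) | length_bytes[1]
    return value & 0x3FFF, bool(value & 0x4000), 2


print(read_var_len(bytes([0x90]), 255))         # (144, False, 1)
print(read_var_len(bytes([0x80, 0x90]), 1024))  # (144, False, 2)
```

If a parser picks the wrong branch for a column, it consumes one length byte too many or too few, and every value after that point is read from the wrong offset. That would be consistent with text that ends in stray non-UTF-8 bytes.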