Text Fields in Output SQL from ibd2sql Show as Hexadecimal (e.g., 0x53205265736f6c76696e67204572726f728180)

mahditalebi commented 6 days ago

Hi everyone,

I am currently using the ibd2sql tool to recover data from MySQL InnoDB .ibd files. While most of the data is extracted correctly, I've noticed that certain fields, especially text fields (e.g. TEXT, LONGTEXT), are outputted as hexadecimal values, like:

0x53205265736f6c76696e67204572726f728180

It seems like the tool is converting these fields into hexadecimal format, but these are supposed to be human-readable text fields in my original database. I’m unsure if this is related to the encoding used in my database, or if it’s something specific to ibd2sql.

Questions:

Is there a way to ensure that text fields are properly converted back into readable text during the extraction process?
Could this be caused by an encoding issue (e.g., UTF-8 vs. Latin1)? If so, how can I specify the correct encoding when using ibd2sql?
Are there any tools or scripts that can help decode these hexadecimal fields back to text, or is there a way to handle this more effectively within ibd2sql?

UPDATE: I used -D for debug and get following error in conversion:

[2024-10-20 12:45:49] [DEBUG] BLOB ERROR 'utf-8' codec can't decode byte 0x80 in position 15: invalid start byte

Any help or advice would be greatly appreciated!

ddcw commented 5 days ago

it's not a coding issue (if it weren't for Windows) there are two situations where hexadecimal format will be output,

vector fields
When varchar/lob decode fails (--debug can confirm if you see 'Blob ERROR')

from the debug information, it appears to be the second type. I tried decoding the hexadecimal value (python3: bytes.fromhex(hex_str)) and guessed it might be a bug Can you provide me with the relevant DDL to simulate and see if it can be reproduced?

ddcw commented 5 days ago

You can download the latest version and try it again. Just fixed bug: varchar maxsize 255, only one byte for length

ddcw / ibd2sql

Text Fields in Output SQL from ibd2sql Show as Hexadecimal (e.g., 0x53205265736f6c76696e67204572726f728180) #42