euzun / jpeg-carver-csharp

A C# toolbox to recover data from orphaned JPEG fragments (fragments without any header information).

I need some help #1

Closed HanalogInstruments closed 2 years ago

HanalogInstruments commented 2 years ago

Hi,

  1. How can I use the GUI of JpegRecovery.exe?
  2. Some MCU blocks in my recovered JPEG are black. How can I remove them?
euzun commented 2 years ago

Hi,

  1. Unfortunately, to run JpegRecovery.exe you need a Windows OS and a dependency library (please check the Readme).
  2. Black MCU blocks at the beginning and end of the recovered image are normal, since we are recovering an orphaned file. The orphaned file may not contain the pixels at the beginning of the first image row and the end of the last image row. Our tool performs an error-tolerant recovery and tries to place each pixel at its correct position, so it has to fill the non-existent positions with zero pixels. Since we aim to recover every valid pixel, we don't remove partial rows. You can remove them manually with an image editor.
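If you would rather script the cropping than use an image editor, the idea can be sketched as follows. This is only an illustration, not part of JpegRecovery: `trim_partial_bands` is a hypothetical helper, the image is assumed to be a grayscale grid (list of rows) whose height is a multiple of the MCU size, and a block counts as "missing" when all of its pixels are zero. A genuinely black block in the first or last band would also trigger the crop, so check the result by eye.

```python
def block_is_black(pixels, top, left, size):
    """True if every pixel of the size x size block at (top, left) is zero."""
    return all(pixels[y][x] == 0
               for y in range(top, top + size)
               for x in range(left, left + size))


def trim_partial_bands(pixels, mcu=8):
    """Drop the first and/or last MCU band if it contains a zero-filled block.

    Only the outermost bands are inspected, because that is where the
    zero-filled padding of a recovered fragment is expected to sit.
    """
    height, width = len(pixels), len(pixels[0])

    def band_is_partial(top):
        return any(block_is_black(pixels, top, left, mcu)
                   for left in range(0, width - mcu + 1, mcu))

    first, last = 0, height
    if height >= mcu and band_is_partial(first):
        first += mcu
    if last - first >= mcu and band_is_partial(last - mcu):
        last -= mcu
    return pixels[first:last]
```

The function returns a new, shorter list of rows and leaves the input untouched.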
HanalogInstruments commented 2 years ago

Hi,

  1. Could you explain how this decoding method works?

  2. I see that some JPEGs come out with the wrong color sampling, because the JPEG is encrypted in the SOF -> SOI region and only a few bytes are encrypted.

  3. I would really like to use this method to build a tool that repairs JPEGs when the EXIF data is missing and only the raw data is left.

Best regards, Hanalog Inc.

HanalogInstruments commented 2 years ago

I have some JPEGs in which the first 153605 bytes, counted from the header, are encrypted:

https://drive.google.com/file/d/1hW6DfS95D_y4KMlu-zfpRTAX2xW6WWeW/view

I use the CLI version built from your source code and run it from Python:

https://github.com/euzun/jpeg-carver-csharp

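```python
import os

# Drop the first 153605 (encrypted) bytes and keep the remaining raw data.
with open('_NHL0590.jpg', 'rb') as in_file:
    with open('_NHL0590', 'wb') as out_file:
        out_file.write(in_file.read()[153605:])

# Run the recovery CLI on the headerless fragment (Windows only).
os.system('cmd /k "JpegRecovery.exe _NHL0590"')
```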

image image image

HanalogInstruments commented 2 years ago

The repaired picture uses high DQT tables. How can I reduce the pixel and color values of the JPEG?

I am also wondering how to get 4:2:2 chroma subsampling in the JPEG exported by JpegRecovery.exe _NHL0590.

euzun commented 2 years ago

Hi,

Before describing the pipeline, note that it works with a couple of standard Huffman coding tables. Please check our papers for the details (and cite them if you publish your work).

Here is a very abstract overview of the decoder.

  1. It decodes the given encoded file until it reaches a synchronization point. There are two rules for finding a sync point: 1) reaching an invalid Huffman code or an invalid marker, and 2) overflowing an 8x8 block with zero pixels during zigzag decoding. You can find a detailed explanation in the paper.
  2. After decoding the 8x8 blocks, the decoder creates a pseudo header by analyzing these blocks. It first tries to find the chroma subsampling, and then the image width by checking the pixel correlations across the block borders. It then finds the start of the image so that the first validly recovered MCU block is positioned correctly in the first row. Finally, it tries to normalize the colors of the image. This last step may not work properly, since JPEGs are encoded differentially with reference to the first pixel. Hence, when the first pixel is lost, the colors of the following pixels depend on the first partially recovered one. You can try to fix this manually.
  3. I may have forgotten some of the steps; please check the papers.
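The color caveat in step 2 can be illustrated with a small sketch of differential DC coding. This is a toy model, not the carver's code: `encode_dc` and `decode_dc` are hypothetical names, and a single component with no restart markers is assumed. In a JPEG scan, each block's DC coefficient is stored as the difference from the previous block's DC of the same component, with the predictor starting at 0.

```python
from itertools import accumulate


def encode_dc(dc_values):
    """Turn absolute DC values into differences (predictor starts at 0)."""
    prev = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


def decode_dc(diffs):
    """Rebuild DC values by summing differences, again starting from 0."""
    return list(accumulate(diffs))


original = [120, 118, 125, 130, 128]
diffs = encode_dc(original)

# A fragment that starts at the third block only sees the later differences,
# so every decoded value is off by the last DC value that was lost (118).
fragment = decode_dc(diffs[2:])
offsets = [true - got for true, got in zip(original[2:], fragment)]
```

In this toy case the error is a constant offset per component, which is why a manual brightness or color shift can often bring the recovered image close to the original.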

Note that the chroma subsampling of your image seems correct. To select the DQT tables, I analyzed around 1 million JPEG images and used the most common DQT table for each chroma scheme. So it may not give the best result for your image, but our number-one goal was to turn raw, headerless encoded JPEG data into viewable content. Fine-tuning and enhancement can be applied after these pixels have been recovered.
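The table selection described above amounts to a frequency count per chroma scheme. A minimal sketch, assuming the survey data is available as `(chroma_scheme, dqt_table)` pairs; the function name and the tiny two-entry tables used when trying it out are made up for illustration and are not the tables shipped with the tool:

```python
from collections import Counter, defaultdict


def most_common_tables(samples):
    """Map each chroma scheme to the quantization table seen most often.

    `samples` is an iterable of (scheme, table) pairs; tables are converted
    to tuples so that they can be counted.
    """
    counts = defaultdict(Counter)
    for scheme, table in samples:
        counts[scheme][tuple(table)] += 1
    return {scheme: counter.most_common(1)[0][0]
            for scheme, counter in counts.items()}
```

A table chosen this way is only the most likely one across the survey, so an individual image encoded with a different quality setting will still be dequantized with the wrong values.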

Please let me know if you have further questions.

Thanks, Erkam

YT-Advanced commented 2 years ago

But what about supporting 4:4:4 chroma subsampling for every picture that can support it?

HanalogInstruments commented 2 years ago

Do you want to work on this? I would like to work with you to improve @euzun's project.

euzun commented 2 years ago

> But what about supporting 4:4:4 chroma subsampling for every picture that can support it?

I could not understand the question. The current codebase supports all 4 chroma subsamplings and detects the correct one with over 95% accuracy (please check the papers for more details).

HanalogInstruments commented 2 years ago

I solved this issue and will write a paper about the color fix. Thanks, @euzun. I wrote a Python script that autocrops the image and fixes the white balance.

Before:

image

After:

image
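The script itself is not shown in this thread. Purely as an illustration of what a white-balance step can look like, here is a sketch of the common gray-world heuristic, which scales each channel so that the channel means become equal. It is an assumption that something like this would suit recovered fragments, and `gray_world_balance` is a hypothetical name. Pixels are plain `(r, g, b)` tuples in the range 0 to 255.

```python
def gray_world_balance(pixels):
    """Scale R, G and B so that their means match the overall mean."""
    if not pixels:
        return []
    count = len(pixels)
    means = [sum(p[c] for p in pixels) / count for c in range(3)]
    gray = sum(means) / 3
    # A channel that is zero everywhere cannot be rescaled; leave it alone.
    gains = [gray / m if m else 1.0 for m in means]
    return [
        tuple(min(255, max(0, round(p[c] * gains[c]))) for c in range(3))
        for p in pixels
    ]
```

Gray-world assumes the scene averages out to neutral gray, so it can make images with a dominant color look worse.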

euzun commented 2 years ago

Cool, glad to hear that. Good luck with your paper, and please don't forget to cite both of our papers :) One note about color fixing: make sure you try your script on different samples. It may work properly on one image while making another one worse. I'd try to measure the signal-to-noise ratio after applying the color-fixing code on maybe hundreds of images.
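One way to set up such a measurement is sketched below. It uses PSNR as the signal-to-noise measure and assumes that a reference image is available for every test case, for example by truncating intact JPEGs on purpose. The function names are hypothetical, and images are flat sequences of sample values of equal length.

```python
import math


def psnr(reference, candidate, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, candidate)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def count_improvements(cases):
    """Count the cases where the color fix raised the PSNR.

    `cases` is an iterable of (reference, before_fix, after_fix) triples.
    Returns (number improved, total number of cases).
    """
    cases = list(cases)
    improved = sum(
        1 for ref, before, after in cases if psnr(ref, after) > psnr(ref, before)
    )
    return improved, len(cases)
```

Reporting both the count and the total makes it visible when the fix helps on most images but hurts on some.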

HanalogInstruments commented 2 years ago

How can I load standard tables from a different camera by default, and get the JPEG dimensions, to make this work?

I would also like to know how to get the dimensions from the raw data and fix the information in the SOF.

Sample: https://drive.google.com/drive/folders/1Ylc61SRLVCjZ2nSla7e-8xmfg95EyMuu

I use IMG_1333_q100_s2x1.jpg as the sample:

  1. I overwrote the dimensions (W: 5664, H: 4400) with a hex editor.

  2. I cut the bytes from offset 153605 to EOF out of 20210526_190313.jpg.wiot and merged them after FFDA + 12 bytes of the file from step 1. The JPEG opens, but its alignment is still wrong. How can I automate the alignment the way JpegRecovery does?

  3. How can I scan the MCU layout (SOF) from the raw data of a corrupted JPEG in order to patch the sample header?

  4. When I use this method on another corrupted JPEG, I get an issue like this:

image -> I used the wrong 2x2 sampling in the SOF
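The splice in step 2 can be scripted instead of done by hand. The sketch below is only an illustration with hypothetical function names. It walks the donor file's marker segments up to the first SOS (FFDA) and reads the SOS length field rather than hard-coding 12; for a three-component scan that field is 12, which matches the "FFDA + 12 bytes" above. Walking the segments, instead of searching for the bytes FFDA, avoids stopping inside an embedded EXIF thumbnail. Fill bytes and markers without a length field before the SOS are not handled.

```python
import struct


def scan_data_offset(jpeg):
    """Return the offset of the first byte after the first SOS header."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == 0xDA:  # SOS: entropy-coded data follows its header
            return pos + 2 + length
        pos += 2 + length  # skip the marker and its payload
    raise ValueError("no SOS marker found")


def splice(donor, fragment):
    """Keep the donor's header up to and including SOS, then append the fragment."""
    return donor[:scan_data_offset(donor)] + fragment
```

This only automates the byte surgery. The result can still be misaligned if the fragment does not start on an MCU boundary or if the donor's sampling factors and dimensions do not match the fragment.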

An update: using IMG_1333_q100_s2x1.jpg for NLH0591.JPG.nppp works, but IMG_1333_q100_s2x2.jpg (wiot folder) does not. The block issue also gives a better result:

The left image is from the carver method and the right one is from the file-header patch method.

https://drive.google.com/file/d/1u2h200hcY5KCZJAcxZng6yklhlDrlVL1/view?usp=drivesdk

image

I would like to know whether the problem is in the SOF or in the DHT. Also, how can we get the JPEG dimensions from the raw data? I hope you can help me.

I also found that digital cameras use 4:2:2 chroma subsampling and mobile phones use 4:2:0. How can we automatically detect which of these a raw fragment matches?

After I patch the s2x1 file, the dimensions come out wrong, like this: image
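For checking what a donor header actually declares, the SOF segment can be read and patched with a few lines of Python instead of a hex editor. This sketch covers the header side only. It does not recover the dimensions from a headerless fragment, which is what the carver's analysis does. The function names are hypothetical, and the scheme name is derived from the first component's sampling factors under the assumption that the chroma components use 1x1.

```python
import struct

SOF_MARKERS = {0xC0, 0xC1, 0xC2}  # baseline, extended sequential, progressive
SCHEMES = {(1, 1): "4:4:4", (2, 1): "4:2:2", (1, 2): "4:4:0", (2, 2): "4:2:0"}


def find_sof(jpeg):
    """Walk the marker segments after SOI and return the offset of the SOF."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker in SOF_MARKERS:
            return pos
        if marker == 0xDA:
            break  # reached the scan without seeing a frame header
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no SOF marker found")


def read_sof(jpeg):
    """Return width, height, per-component sampling factors and scheme name."""
    pos = find_sof(jpeg)
    _precision, height, width, ncomp = struct.unpack(">BHHB", jpeg[pos + 4:pos + 10])
    factors = []
    for i in range(ncomp):
        _cid, hv, _tq = jpeg[pos + 10 + 3 * i:pos + 13 + 3 * i]
        factors.append((hv >> 4, hv & 0x0F))  # (horizontal, vertical)
    scheme = SCHEMES.get(factors[0], "unknown") if factors else "unknown"
    return {"width": width, "height": height, "factors": factors, "scheme": scheme}


def patch_dimensions(jpeg, width, height):
    """Return a copy of the file with new height and width in the SOF."""
    pos = find_sof(jpeg)
    return jpeg[:pos + 5] + struct.pack(">HH", height, width) + jpeg[pos + 9:]
```

Reading the header back after patching is a quick way to confirm that the 2x1 or 2x2 factors and the dimensions are the ones you intended before blaming the DHT.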

HanalogInstruments commented 2 years ago

How can I use your class to read only the MCU dimensions and keep the original raw data without normalizing it? The color blocks look corrupted.