WeTransfer / format_parser

file metadata parsing, done cheap
https://rubygems.org/gems/format_parser

TypeError in JpegParser when calling EXIFR::TIFF #157

Closed. fabioperrella closed this issue 4 years ago.

fabioperrella commented 4 years ago

I found some files which raise an error like the following:

$ exe/format_parser_inspect file.jpg
Traceback (most recent call last):
    30: from exe/format_parser_inspect:22:in `<main>'
    29: from exe/format_parser_inspect:22:in `map'
    28: from exe/format_parser_inspect:24:in `block in <main>'
    27: from exe/format_parser_inspect:24:in `public_send'
    26: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:104:in `parse_file_at'
    25: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:104:in `open'
    24: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:105:in `block in parse_file_at'
    23: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:173:in `parse'
    22: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:173:in `to_a'
    21: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:173:in `each'
    20: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:173:in `each'
    19: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:173:in `each'
    18: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:173:in `each'
    17: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:169:in `block in parse'
    16: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:201:in `execute_parser_and_capture_expected_exceptions'
    15: from /Users/fabioperrella/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/measurometer-1.1.1/lib/measurometer.rb:48:in `instrument'
    14: from /Users/fabioperrella/projects/format_parser/lib/format_parser.rb:202:in `block in execute_parser_and_capture_expected_exceptions'
    13: from /Users/fabioperrella/projects/format_parser/lib/parsers/jpeg_parser.rb:21:in `call'
    12: from /Users/fabioperrella/projects/format_parser/lib/parsers/jpeg_parser.rb:29:in `call'
    11: from /Users/fabioperrella/projects/format_parser/lib/parsers/jpeg_parser.rb:65:in `scan'
    10: from /Users/fabioperrella/projects/format_parser/lib/parsers/jpeg_parser.rb:159:in `scan_app1_frame'
     9: from /Users/fabioperrella/projects/format_parser/lib/parsers/exif_parser.rb:171:in `exif_from_tiff_io'
     8: from /Users/fabioperrella/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/measurometer-1.1.1/lib/measurometer.rb:48:in `instrument'
     7: from /Users/fabioperrella/projects/format_parser/lib/parsers/exif_parser.rb:172:in `block in exif_from_tiff_io'
     6: from /Users/fabioperrella/projects/format_parser/lib/parsers/exif_parser.rb:172:in `new'
     5: from /Users/fabioperrella/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/exifr-1.3.6/lib/exifr/tiff.rb:377:in `initialize'
     4: from /Users/fabioperrella/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/exifr-1.3.6/lib/exifr/tiff.rb:647:in `open'
     3: from /Users/fabioperrella/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/exifr-1.3.6/lib/exifr/tiff.rb:384:in `block in initialize'
     2: from /Users/fabioperrella/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/exifr-1.3.6/lib/exifr/tiff.rb:384:in `map'
     1: from /Users/fabioperrella/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/exifr-1.3.6/lib/exifr/tiff.rb:387:in `block (2 levels) in initialize'
/Users/fabioperrella/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/exifr-1.3.6/lib/exifr/tiff.rb:387:in `+': no implicit conversion of Integer into String (TypeError)
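The last frame is an ordinary Ruby TypeError: String#+ only accepts a String (or an object that responds to to_str), and here it received an Integer. The following is a minimal reproduction outside exifr. It assumes that tiff.rb:387 concatenates a String with a tag value that turned out to be an Integer for this file; the actual operands on that line have not been checked. The method name append_value is hypothetical.

```ruby
# Hypothetical stand-in for the concatenation assumed at exifr's tiff.rb:387.
# String#+ raises TypeError when the right-hand side is not String-like.
def append_value(label, value)
  label + value
end

begin
  append_value("tag value: ", 16)
rescue TypeError => e
  puts e.class   # prints "TypeError"
  puts e.message # prints "no implicit conversion of Integer into String"
end
```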

The same file is considered valid by the Linux file command:

$ file file.jpg
file.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=9, manufacturer=Olympus, model=DP21, orientation=upper-left, xresolution=8, yresolution=16, resolutionunit=2, datetime=2020:03:11 15:11:47], baseline, precision 8, 1600x1200, components 3

If I try to parse it directly with the exifr gem, it also raises an error:

pry> EXIFR::TIFF.new('file.jpg')
EXIFR::MalformedTIFF: no byte order information found

So I think there is a problem inside the exifr gem when it parses this file.

The file I tested is not mine, and I need permission to use it as a fixture. I'm waiting for that permission, because I haven't been able to create a similar file that reproduces the error.
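For context on the EXIFR::TIFF.new('file.jpg') error above: a JPEG file starts with the bytes FF D8, and its EXIF data is a complete TIFF structure stored inside an APP1 segment, right after a 6-byte "Exif\0\0" identifier. That TIFF structure begins with a byte-order mark, "II" (little-endian) or "MM" (big-endian), so passing the whole .jpg to EXIFR::TIFF.new may fail with "no byte order information found" for that reason alone. The sketch below uses only the Ruby standard library; the method name exif_byte_order is hypothetical, it is not how exifr or format_parser locate the EXIF block, and it skips details such as fill bytes between markers.

```ruby
require "stringio"

# Sketch using only the standard library (not exifr or format_parser code).
# Walks the JPEG marker segments until it finds an APP1 segment that starts
# with "Exif\0\0", then reads the byte-order mark of the TIFF block inside it.
# Simplified: ignores fill bytes and standalone markers between segments.
def exif_byte_order(io)
  return nil unless io.read(2) == "\xFF\xD8".b # SOI: every JPEG starts here

  loop do
    marker = io.read(2)
    return nil if marker.nil? || marker.bytesize < 2 || marker.getbyte(0) != 0xFF

    type = marker.getbyte(1)
    return nil if type == 0xDA || type == 0xD9 # reached SOS/EOI without EXIF

    length = io.read(2)&.unpack1("n") # big-endian, counts its own 2 bytes
    return nil if length.nil? || length < 2

    payload = io.read(length - 2)
    return nil if payload.nil? || payload.bytesize < length - 2
    next unless type == 0xE1 && payload.start_with?("Exif\0\0".b)

    # The TIFF block begins right after the 6-byte "Exif\0\0" identifier.
    case payload.byteslice(6, 2)
    when "II" then return :little_endian # Intel byte order
    when "MM" then return :big_endian    # Motorola byte order
    else return nil
    end
  end
end
```

Called as File.open("file.jpg", "rb") { |f| exif_byte_order(f) }, it would be expected to return :little_endian for the file above, since the file command reports little-endian TIFF data. Called on a bare TIFF stream, it returns nil, because there is no SOI marker at offset 0.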

fabioperrella commented 4 years ago

I used the same approach as in https://github.com/WeTransfer/format_parser/issues/155#issuecomment-675677078 and was able to reproduce the same error as with the original file!

The converted file is now only 3.4 KB and can't be previewed, because I removed its content!

Even so, the file command still returns all the metadata:

$ file file.jpg
file.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=9, manufacturer=Olympus, model=DP21, orientation=upper-left, xresolution=8, yresolution=16, resolutionunit=2, datetime=2020:03:11 15:11:47]

And format_parser still raises the same error!
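One way to produce a stripped file like this is to keep every JPEG marker segment up to the first SOS (start of scan) marker and then end the stream with EOI, which drops the compressed image data but leaves the metadata segments untouched. This is a hypothetical sketch under that assumption, not necessarily the approach used in #155; the method name strip_scan_data is made up for illustration.

```ruby
require "stringio"

# Hypothetical helper (not necessarily the approach used in #155): copy every
# marker segment that precedes the first SOS (start of scan), then close the
# stream with EOI. The entropy-coded image data after SOS is dropped, while
# metadata segments such as APP0/JFIF and APP1/EXIF are copied unchanged.
# Simplified: assumes no fill bytes or standalone markers before SOS.
def strip_scan_data(input)
  soi = "\xFF\xD8".b
  raise ArgumentError, "not a JPEG (missing SOI marker)" unless input.read(2) == soi

  output = soi.dup
  loop do
    marker = input.read(2)
    break if marker.nil? || marker.bytesize < 2 || marker.getbyte(0) != 0xFF

    type = marker.getbyte(1)
    break if type == 0xDA || type == 0xD9 # SOS or EOI: no more header segments

    length_bytes = input.read(2)
    break if length_bytes.nil? || length_bytes.bytesize < 2

    length = length_bytes.unpack1("n") # big-endian, counts its own 2 bytes
    break if length < 2

    payload = input.read(length - 2) || "".b
    output << marker << length_bytes << payload
  end
  output << "\xFF\xD9".b # EOI
end
```

For example, File.binwrite("stripped.jpg", File.open("file.jpg", "rb") { |f| strip_scan_data(f) }) would write the result to a new file (both file names are placeholders). Because the EXIF block, including any embedded thumbnail, sits in APP1 before the scan data, it survives this stripping.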

I attached the file so you can see:

file

@julik @martijnvermaat what do you think about using this file as a fixture?

fabioperrella commented 4 years ago

@linkyndy now that you are back, could you give your feedback on this, please? ☝️

julik commented 4 years ago

If the file no longer has image data it should be OK, but at 3.4 KB it might still include a thumbnail in the EXIF tags. It is also possible that this thumbnail is why parsing fails; I noticed this happening with a JPEG from an electron microscope, for example.
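To check whether a file still carries an EXIF thumbnail, one could look for a second IFD. The Exif specification describes the thumbnail in IFD1, the IFD chained after IFD0, using the JPEGInterchangeFormat (0x0201, offset) and JPEGInterchangeFormatLength (0x0202, length) tags. The sketch below uses only the standard library. It assumes tiff already holds the TIFF block from the APP1 segment, meaning the bytes that follow "Exif\0\0". The method name exif_thumbnail_location is hypothetical, the code is not exifr's or format_parser's, and it does only minimal bounds checking.

```ruby
# Sketch under assumptions: `tiff` is the TIFF block from the APP1 segment
# (the bytes after "Exif\0\0"). Offsets inside it are relative to its start.
# Returns { offset:, length: } for the IFD1 thumbnail, or nil if none is found.
def exif_thumbnail_location(tiff)
  # Choose unpack directives for 16- and 32-bit integers from the byte order.
  short, long =
    case tiff.byteslice(0, 2)
    when "II" then %w[v V] # little-endian
    when "MM" then %w[n N] # big-endian
    end
  return nil if short.nil?

  read16 = ->(pos) { tiff.byteslice(pos, 2)&.unpack1(short) }
  read32 = ->(pos) { tiff.byteslice(pos, 4)&.unpack1(long) }
  return nil unless read16.(2) == 42 # TIFF magic number

  ifd0 = read32.(4)
  count0 = ifd0 && read16.(ifd0)
  return nil if count0.nil?

  # Each IFD entry is 12 bytes; the next-IFD pointer follows the entries.
  ifd1 = read32.(ifd0 + 2 + count0 * 12)
  return nil if ifd1.nil? || ifd1.zero? # no IFD1, so no thumbnail

  count1 = read16.(ifd1)
  return nil if count1.nil?

  found = {}
  count1.times do |i|
    entry = ifd1 + 2 + i * 12
    tag = read16.(entry)
    value = read32.(entry + 8) # a single LONG value is stored inline
    found[:offset] = value if tag == 0x0201 # JPEGInterchangeFormat
    found[:length] = value if tag == 0x0202 # JPEGInterchangeFormatLength
  end
  found.size == 2 ? found : nil
end
```

A non-nil result on the stripped fixture would mean it still embeds a thumbnail. That could account for part of its 3.4 KB, and per the comment above, it could also be what the parser trips over.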

fabioperrella commented 4 years ago

OK, I will try to remove these tags!

Actually, I'm almost sure this is a picture from a microscope.

fabioperrella commented 4 years ago

In the end, I think something will need to be fixed in the exifr gem, so I opened an issue there: https://github.com/remvee/exifr/issues/65

I still haven't managed to create a fake file, or to strip out all the content, to use as a fixture. I'm still trying.

I also asked the support team to try to get permission from the file's owner to use it as a fixture.