Closed LeoHsiao1 closed 4 years ago
A lot of the code uses uint64_t for offsets into files. So, in principle, Exiv2 can handle HUGE files. However, I don't believe it has been tested with HUGE files.
I'm currently writing a book Image Metadata and Exiv2 Architecture which discusses a program tvisitor.cpp which is a miniature version of Exiv2 and implements BigTiff (which isn't in Exiv2 yet). I'm confident it will handle HUGE files although I don't have any test files. https://clanmills.com/exiv2/book
If you can make your files available, I will use them to test the tvisitor.cpp code.
The project to properly test Exiv2 with HUGE files has been on the TODO list for several years and we have never had the resources to undertake the work involved. It's highly likely that you are pushing the boundaries.
Of course, I am always looking for engineers to help with Exiv2. If you'd like to get involved, I'll be happy to mentor and support you. However, I'm focused on the book at present as I want to complete that before my 70th birthday in January. I plan to give a talk at LGM in Rennes in May 2021, run an afternoon work-shop based on the book and then finally retire.
Thank you for your guidance. Such images take up a lot of disk space, so instead of saving them, I used Python to copy the content of a small image many times to generate a large image. Although I don't know the encoding format of JPEG images, I could try writing a script that generates images of any size.
I just looked at the Exiv2 RoadMap. I am familiar with Python and DevOps. Maybe I can help with testing, CI and ops. But I live in the UTC+8:00 time zone. My next reply may be in 20 hours.
The format of JPEG (and many others) is documented in the book. Generating a file of arbitrary length is a good idea. If you've solved that, let's add something to the Exiv2 test suite, which is written in Python3.
UTC+8? Mongolia, Russia, WA?
Shenzhen, China
If either of you need sample files, feel free to reach out. I prompted the question of the python version of Exiv2 failing for very large JPEG files, generated by PIL.
My programming knowledge is fairly low, but I have a vested interest in seeing it work for large files. I'm hoping we also see Exiv2 working with JPEGs that come close to the limit (55k x 55k pixels or such). Compatibility with BigTIFF is an ambitious aspect, and kudos to you for taking that on.
Thanks @kolt54321 That's very helpful.
BigTiff isn't difficult. It's very similar to Tiff and I've wanted to add it for years. However, the Tiff parser in Exiv2 (src/tiffvisitor.cpp) isn't easy to understand and I didn't know how to modify it for BigTiff.
My book is built around a simplified version of Exiv2 in one file. tvisitor.cpp has 2000 lines of code which I hope will be easier to understand than the 100,000 lines in Exiv2. At the moment it handles Tiff, BigTiff, JPEG, PNG, CRW, JP2 and ICC. Exiv2 has about 20 image handlers and I intend to document them all in the book. I hope to finish the book before the end of 2020. Draft: https://clanmills.com/exiv2/book
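To illustrate how close the two formats are, here is a minimal Python sketch that tells classic TIFF from BigTIFF by the file header alone. It is not code from Exiv2 or tvisitor.cpp, and the function name `read_tiff_header` is hypothetical. It relies only on the published header layouts: classic TIFF has a 2-byte byte-order mark, the magic number 42 and a 4-byte offset to the first IFD, while BigTIFF uses magic number 43, then the offset size (8), a reserved zero and an 8-byte offset.

```python
import io
import struct


def read_tiff_header(f):
    """Return (kind, endian, first_ifd_offset) for a TIFF or BigTIFF stream."""
    head = f.read(4)
    if len(head) < 4 or head[:2] not in (b"II", b"MM"):
        raise ValueError("not a TIFF file")
    endian = "<" if head[:2] == b"II" else ">"  # II = little-endian, MM = big-endian
    (magic,) = struct.unpack(endian + "H", head[2:4])
    if magic == 42:  # classic TIFF: 32-bit offset to the first IFD
        (offset,) = struct.unpack(endian + "I", f.read(4))
        return "TIFF", endian, offset
    if magic == 43:  # BigTIFF: offset size, reserved zero, 64-bit offset
        offset_size, reserved = struct.unpack(endian + "HH", f.read(4))
        if offset_size != 8 or reserved != 0:
            raise ValueError("unexpected BigTIFF header")
        (offset,) = struct.unpack(endian + "Q", f.read(8))
        return "BigTIFF", endian, offset
    raise ValueError("unknown TIFF magic number")


# Two hand-built headers, just enough bytes to exercise both branches.
classic = io.BytesIO(b"II" + struct.pack("<HI", 42, 8))
big = io.BytesIO(b"MM" + struct.pack(">HHHQ", 43, 8, 0, 16))
print(read_tiff_header(classic))
print(read_tiff_header(big))
```

Beyond the header, the main difference is that BigTIFF widens the IFD entry counts and offsets to 64 bits, which is why a parser that already uses 64-bit file offsets needs few changes.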
I'd like to retire. I'm really exhausted by the project and the rudeness and criticism and verbal abuse of many users (not you or @LeoHsiao1). I hope by writing down everything I know about metadata to inspire somebody to maintain Exiv2, or to develop a new library. Whether your HUGE files are used by Exiv2, tvisitor.cpp or a future development, your contribution is of great help. Thanks very much.
@clanmills There are certainly plenty of bad apples on the internet, for sure. One might even say 99% of internet users don't even come close to innovating as you have, so you have a lot to be proud of. Most are takers, not givers. Don't let them get to you!
Thanks @kolt54321 (Good handle, I like it). Yes, any fool can criticise!
Let's see what tomorrow brings for HUGE files. I'll sleep on this and tomorrow, you or @LeoHsiao1 or Lizzie (the cat) will say "it's obvious - just ....".
I wrote to Joris (who maintains the BigTiff documentation and web-site) and he said "Contact GeoTiff". Joris was very helpful and pointed out a couple of minor bugs in my BigTiff implementation.
So, let's be positive. Life is good! Sun's shining. The garden's good. I don't know anybody who's sick - and there are great folks such as you and Joris on the internet.
Beautiful garden and shed! Yes, let's see what tomorrow brings.
You have a nice yard. I'm starting to read your book now. There are a lot of details in the book, I will learn slowly and enjoy the process.
By the way, aren't you going to use CMS to render your ebooks? I recommend Docsify and Vuepress, both of which support markdown format, displaying catalogs and full-text search.
@LeoHsiao1 Thanks.
I've never heard of Docsify or Vuepress. I put information into the book (about page 8) about how the book is being created. I'm totally focused on the content at the moment - especially the tvisitor.cpp code.
I looked at PIL (Pillow) and libtiff this morning. Yesterday I looked at FreeImage. I haven't yet persuaded libtiff to create a BigTiff file, so I haven't broken the 4Gb barrier yet. However, a little more digging about and I'll be creating 10GByte BigTiff files which tvisitor.cpp will parse in less than 1 micro-second.
I have succeeded in building libtiff 4.1 with BigTiff support. I've written a little program to generate BigTiff images. The biggest image I can generate is 51000x21000 at 32 bits/pixel. After that, I get a mysterious message from libtiff: "Integer overflow in TIFFVStripSize". The generated file is 4.0G. tvisitor reads it effortlessly in 0.003 seconds - which is what he takes to parse almost every file. tvisitor uses FILE* to perform the IO with lots of calls to fseek() and ftell().
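As a rough check on those numbers, the Python sketch below computes the uncompressed pixel data size for 51000x21000 at 32 bits/pixel and compares it with 2^32. The helper name `raw_image_bytes` is hypothetical, and it assumes tightly packed rows with no compression. Whether crossing 2^32 is what actually triggers the libtiff message is an assumption; the arithmetic only shows that this image sits just below that boundary.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel data size in bytes, assuming tightly packed rows."""
    return width * height * bits_per_pixel // 8


LIMIT_32 = 2**32  # first value that no longer fits in an unsigned 32-bit integer

size = raw_image_bytes(51000, 21000, 32)
print(size, size < LIMIT_32)                          # 4284000000 True
print(raw_image_bytes(51000, 22000, 32) < LIMIT_32)   # False
```

The first size is roughly 3.99 GiB, which is consistent with the 4.0G file reported above.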
I'm going to add something to the book under "10 Test Suite and Build" to discuss using libraries to generate test images. I'll document how I got libtiff 4.1 to build and I'll include my little BigTiff image generator. If you know how to get PIL or FreeImage to generate HUGE files, please let me know.
@LeoHsiao1 Did you go to school/college in the USA? You speak English with an American accent! The book is very much "work in progress". I'm working on it full time at the moment. It will be finished by the end of 2020.
I have tried using PIL and Photoshop, neither of which can save more than 2G images, so I decided to generate my own. I am not good at English, so I am talking to you through translation software. LOL
@clanmills I think I can create large PNG files (10gb?) for testing if you guys need it. BigTIFF is tricky since, last time I tried, it needed to hold the image in memory in order to create it, and my RAM isn't large enough.
I guess the translation software has come a long way since first being created. I saw an AI translation website which was really cool too - trying to simulate local dialects and all.
The translation AI is an American - almost certainly in Silicon Valley!
BigTiff is done. Solved. I'm adding information to the book right now about building libtiff-4 which supports bigtiff. It generates the file on disk. It doesn't hold it in memory.
@kolt54321 10gb PNG? That's a surprise because I thought it was limited to 32 bit. You're right. PNG can have 2^31 chunks of length up to 2^31 bytes. How did you do that?
@clanmills By accident, actually. The file was too large for JPEG (limited to 65k x 65k pixels; notably, JPEG2000 doesn't seem to have this limitation) and the code defaulted to an uncompressed, unoptimized PNG at 10gb.
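For readers following along, the chunk structure can be sketched in a few lines of Python. A PNG file is an 8-byte signature followed by chunks, each made of a 4-byte big-endian length, a 4-byte type, the data and a 4-byte CRC, so the file as a whole can exceed 4GB even though each length field is 32 bits. The names `iter_png_chunks` and `make_chunk` are hypothetical, and this is not how Exiv2 or tvisitor.cpp is written; it only shows that a parser can seek past chunk data rather than read it.

```python
import io
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(f):
    """Yield (chunk_type, data_length, data_offset) without reading chunk data."""
    if f.read(8) != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    while True:
        header = f.read(8)
        if len(header) < 8:
            return  # truncated file or end of data
        length, ctype = struct.unpack(">I4s", header)
        yield ctype.decode("ascii"), length, f.tell()
        f.seek(length + 4, io.SEEK_CUR)  # skip the data and the 4-byte CRC
        if ctype == b"IEND":
            return


def make_chunk(ctype, data):
    """Build one chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


# A 1x1 8-bit greyscale image, built by hand for the demonstration.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # filter byte 0 followed by one black pixel
tiny = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

for name, length, offset in iter_png_chunks(io.BytesIO(tiny)):
    print(name, length, offset)
```

Because the loop seeks over the data, the work grows with the number of chunks rather than the number of bytes, which fits the very short parse times reported in this thread.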
Good stuff. With PIL?
We want the file to be HUGE. We don't care about the metadata in the file as we're really only testing that I/O operates OK on HUGE files. We already have thousands of images in the test suite which test Exiv2/IPTC/XMP/ICC and many formats, including fuzzed files (deliberately corrupted malicious files).
Yep! With a normal PIL save. I know that exif support for PNG has only been incorporated recently; would that hamper your testing on writes?
If not, I'm happy to provide a test file for you guys. Just need to DL/UL it
What's DL/UL? Dreadful loser and out of luck?
It's 19:25 in England. It's been a long and successful day getting libtiff-4 to build and generate HUGE BigTiff files. I didn't need to touch tvisitor.cpp - it parses the file effortlessly (.002 seconds to read the structure of a 6GB BigTiff file).
I've added a section to the book and acknowledged your help and Joris Van Damme's input: http://clanmills.com/exiv2/book/#10-4
Tomorrow, I'll look at saving HUGE PNGs with PIL.
I hope to look more at FreeImage. It's really impressive and beautifully documented.
I've been working on Exiv2 for 12 years. I didn't write Exiv2 - a smart Swiss guy wrote it. I've been supporting users and keeping the product alive for years. It's interesting to work on the book because I've got my head out of the code and thinking about how things could be done differently. One possible consequence of the book is to write a brand new and simpler library. Could I write a PIL/Metadata module? Yes, I believe that's possible and might even be quite easy! Will I do it? No. Hell No. I want to retire.
@kolt54321 Where are you located?
@LeoHsiao1 If you'd like a challenge, there are 18 bash scripts in our test suite which I would like rewritten in Python. There's a proposal (and prototype) for this. #1215
Download and upload! I should have spelled it out. Let me get you a large PNG for you to work with. I'm in NY, USA currently.
@kolt54321 Don't bother at the moment. I'll look at PIL/PNG tomorrow. By the time you get to work on Friday, I'll probably have dealt with this. This morning I looked at PIL/JPEG and he refused to generate a HUGE file.
NY? Alison and I are US Citizens. We lived in California/Silicon Valley for 15 years and went "home" when we retired in 2014. We've been several times to NY State and NYC. Most recently in 2017 to see FDR's Library and spend a couple of days with an old buddy in Jersey City. We love our "other" country and have been in all 50 states, most of the National Parks and almost all the Presidential Libraries. We're happy with our decision to retire near the family in England. Life is good.
Great! Silicon Valley is expensive, and I agree (not that it matters) with going back to family. Houses are fairly expensive across both the west coast and east, more so than anytime in US history. Lots of good stuff to do here, but cost of living is fairly high. At the end of the day doing what matters most to you can't be the wrong choice.
OK, Gentlemen. I'm going to close this. I generated a 7Gbyte PNG yesterday, and both Exiv2 and tvisitor.cpp read it effortlessly. It only took microseconds to say "no metadata here". As I proceed to finish the book, I expect I'll look at HUGE files in other formats (JPEG and JP2000). Several years ago I had correspondence with people at Imperial College in London concerning 100GByte BigTiff files from a medical application. I believe that medical scanners generate multi-page BigTiff files.
I tried the pypng module, which easily generates a PNG image with the specified content. Generating JPEG images requires handling the DCT transform and Huffman coding, which is more troublesome. I now find that the actual images I encounter can be much more complex than the manually generated images.
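A generator along these lines can also be sketched with the Python standard library alone. The sketch below is not pypng and not the script used in this thread; `write_png_stream` and its parameters are hypothetical. It writes an 8-bit greyscale PNG one batch of rows at a time, so memory use stays small however large `width` and `height` are, and it uses zlib level 0 so the file size tracks the pixel count.

```python
import io
import struct
import zlib


def write_png_stream(f, width, height, rows_per_chunk=64):
    """Write an 8-bit greyscale PNG in batches of rows, without holding the image in memory."""

    def chunk(ctype, data):
        # Chunk layout: length, type, data, CRC over type and data.
        f.write(struct.pack(">I", len(data)) + ctype + data)
        f.write(struct.pack(">I", zlib.crc32(ctype + data)))

    f.write(b"\x89PNG\r\n\x1a\n")
    chunk(b"IHDR", struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0))
    comp = zlib.compressobj(level=0)  # stored, not compressed
    for first in range(0, height, rows_per_chunk):
        rows = min(rows_per_chunk, height - first)
        # Each scanline is a filter byte (0 = none) followed by `width` grey values.
        raw = b"".join(b"\x00" + bytes([(first + r) % 256]) * width
                       for r in range(rows))
        data = comp.compress(raw)
        if data:
            chunk(b"IDAT", data)
    chunk(b"IDAT", comp.flush())
    chunk(b"IEND", b"")


# Small in-memory demonstration; pass an open binary file for a real test image.
buf = io.BytesIO()
write_png_stream(buf, 16, 8)
print(len(buf.getvalue()), "bytes written")
```

For a HUGE file, open the target with `open(path, "wb")` and choose dimensions whose product is the size wanted; each IDAT chunk stays far below the 32-bit chunk length limit because it holds only `rows_per_chunk` rows.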
Hello! I did not find the image size limit for Exiv2 in the documentation. According to the definition of class Exiv2::ImageFactory, Exiv2 uses a long variable to store the size of the image. So it can only open images smaller than 2G? I tested it with the following images:
Test reading images:
Test modifying images:
It looks like Exiv2 can only read images smaller than 2G and modify images smaller than 1G. Is that right?
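The 2G figure matches the range of a signed 32-bit integer, which the Python sketch below makes concrete. It assumes `long` is 32 bits wide, as it is on Windows and on 32-bit Linux; on 64-bit Linux and macOS `long` is 64 bits, so the limit would not apply there. The helper `as_int32` is hypothetical and only mimics what happens when a file size is stored in such a variable. It does not explain the lower 1G limit observed when modifying images.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(n):
    """Value of n after being stored in a signed 32-bit integer (wraps on overflow)."""
    return ctypes.c_int32(n).value


# One byte below 2 GiB, exactly 2 GiB, and 3 GiB.
for size in (INT32_MAX, 2**31, 3 * 2**30):
    print(size, "->", as_int32(size))
```

Any size of 2^31 bytes or more comes back negative, which is the kind of value that typically makes a size check or an allocation fail.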