Processing of large tiff files

cantaloupe-project / cantaloupe

High-performance dynamic image server in Java

https://cantaloupe-project.github.io/


Processing of large tiff files #93

Closed FPaquette closed 7 years ago

FPaquette commented 7 years ago

Hi,

This is a follow-up to issue #58 regarding the conversion of large TIFF images to JPEG.

I have figured out what was causing the problem. When the TIFF uses too much memory (the file you tested is 680 MB but needs more than 3 GB of memory when open), it looks like the memory of the application server is not being used (I’m assuming because of the amount of memory needed) and the browser is taking over. From my tests, Chrome can manage up to approximately 1.3 GB and Firefox up to 3.3 GB before closing the connection. If I manually set Chrome to start with 5 GB of cache, it works.
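As a rough illustration of why a 680 MB file can need several gigabytes once opened, the decoded size depends on the pixel dimensions rather than on the compressed size on disk. The sketch below assumes whole-byte samples and uses hypothetical dimensions; the actual dimensions of the image discussed here are not stated in this thread.

```python
def decoded_size_bytes(width, height, samples_per_pixel=3, bits_per_sample=8):
    """Bytes needed to hold the fully decoded raster in memory.

    Assumes each sample occupies a whole number of bytes (8 or 16 bits).
    """
    bytes_per_sample = (bits_per_sample + 7) // 8
    return width * height * samples_per_pixel * bytes_per_sample


# Hypothetical dimensions, chosen only to show the order of magnitude.
width, height = 36_000, 30_000
size = decoded_size_bytes(width, height)
print(f"{width} x {height} RGB decodes to about {size / 1024**3:.2f} GiB")
```

Any decoder that has to materialize the whole raster pays this cost regardless of how well the file compresses.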

But when you tested it, it worked correctly for you without any problem, so I was wondering how your application server is set up. Did you add something for the processing of large files?

It is working correctly when I ask for 20% of the original size but it’s taking about 55 seconds, which is a long time.

Thanks, Francois

adolski commented 7 years ago

It's good news (to me :)) that the broken pipe errors were caused by the browser.

I think the main problem with this image is that it's not encoded in a way that enables it to be delivered efficiently. There is some discussion of this here: https://medusa-project.github.io/cantaloupe/manual/3.2/images.html#TIFF

Because it's neither pyramidal nor tiled, every request against it (no matter the scale or region) will cause it to be read fully and decompressed into memory, which, as you are noticing, is very slow and inefficient.
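One way to check how a given file is laid out is to walk its image file directories (IFDs) and look for the TileWidth tag (322 in TIFF 6.0). The following is a minimal sketch, assuming a classic TIFF rather than BigTIFF; the function name describe_tiff is made up for this example and is not part of Cantaloupe.

```python
import struct

TILE_WIDTH_TAG = 322  # TIFF 6.0 TileWidth; only tiled images carry it


def describe_tiff(path):
    """Return one boolean per IFD: True if that image is tiled.

    Handles classic TIFF only (magic number 42), not BigTIFF (43).
    """
    with open(path, "rb") as f:
        header = f.read(8)
        order = {b"II": "<", b"MM": ">"}.get(header[:2])
        if order is None or len(header) < 8:
            raise ValueError("not a TIFF file")
        magic, offset = struct.unpack(order + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF")

        tiled = []
        seen = set()  # guards against a corrupt, cyclic IFD chain
        while offset and offset not in seen:
            seen.add(offset)
            f.seek(offset)
            (count,) = struct.unpack(order + "H", f.read(2))
            entries = f.read(12 * count)  # each IFD entry is 12 bytes
            tags = {
                struct.unpack_from(order + "H", entries, 12 * i)[0]
                for i in range(count)
            }
            tiled.append(TILE_WIDTH_TAG in tags)
            (offset,) = struct.unpack(order + "I", f.read(4))
        return tiled
```

A result such as [False] indicates a single strip-based image. Several IFDs may indicate a pyramid, but they can also be unrelated pages, so the count alone is only a hint.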

I don't know what specific constraints are built into the various web browsers, but those constraints are a major reason for using an image server in conjunction with a zooming viewer and an efficient source format (like pyramidal tiled TIFF or JPEG2000).

For testing, I used curl -O "url" to stream the download to disk and eliminate the browser as a variable.
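The same test can be scripted. The sketch below uses only the Python standard library to copy the response to disk in chunks, so the full image is never held in memory by the client. The function name download_to_disk and the URL in the usage comment are placeholders, not real endpoints.

```python
import shutil
import urllib.request


def download_to_disk(url, dest_path, chunk_size=1024 * 1024):
    """Stream the response body to dest_path in chunk_size pieces."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path


# Usage (placeholder URL; substitute the image request being tested):
# download_to_disk("http://localhost:8182/your-image-request", "out.jpg")
```

Timing this call gives a measurement of server response time that does not depend on any browser limit.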

FPaquette commented 7 years ago

Thanks for your input. We will definitely have to convert those strip-based TIFFs into another format.

For your information, we have been using Cantaloupe here at Bibliotheque et Archives nationales du Quebec since mid-December (numerique.banq.qc.ca). We started with a small number of images, about 50,000, and we have more than 1.5 million overall. Also, we are not yet creating the thumbnails with Cantaloupe, but it is our goal to do so in the near future. We are starting with small collections of images to get a better idea of the capacity of Cantaloupe. That said, I'll be happy to share any information about our experience and findings with you; just let me know.

adolski commented 7 years ago

Excellent! I'm so happy that Cantaloupe is working out for you. 😄 Any obstacles? Missing features? Things you wish were different?

P.S. If I were to add a "users" page to the website, would you be comfortable with having your institution listed?

FPaquette commented 7 years ago

I think Cantaloupe is pretty straightforward. There is nothing I can think of right now, but I'll keep in mind to pay more attention to things that could be improved.

I do have an idea about a feature. Do you want to talk about it in this thread, or would you prefer by email?

About the users page, sure, you can list us there! This would probably be a good opportunity to share information with other institutions using Cantaloupe.

adolski commented 7 years ago

Great, I will add that users page in the next couple months (probably when 3.3 is released).

For the new feature, sure, I would be interested to hear it. If you don't want to post it on GitHub, you can email me at alexd@illinois.edu.

FPaquette commented 7 years ago

I just sent you an email, so I'll close this thread.