EOL / deprecated_eol_php_code

Encyclopedia of Life
http://eol.org/
Other
5 stars 7 forks source link

Store crop location in DB and use hard links if possible (2nd attempt) #113

Closed hyanwong closed 9 years ago

hyanwong commented 9 years ago

As discussed with Jeremy Rice, this code implements several improvements to ContentManager.php, in particular in the grab_file() function:

1) When grab_file is called with a local file rather than a url, the code now tries to create a hard link to that file, instead of downloading a new copy (if it can't it falls back to downloading as before). This means when we do a custom crop of an image, we can keep the original download (which may be in a png, tiff, etc form, and so potentially a large file) without increasing disk usage. Previously, cropping an image resulted in losing the original downloaded file and retaining only an EOL-created jpg version.

2) When grab_file() is called on an image with a data_object_id given in the $options parameter, we store the height and width of the image in the image_sizes table in the EOL database (done by functions called within create_local_files_and_thumbnails()). I have checked that all calls to grab_file(xxx, "image") in the EOL codebase have been changed to pass in the data_object_id, apart from cases where we are grabbing thumbnails, or where we are testing the server (e.g. in test_create_load.php). Previously we did not store this image data.

3) When a custom crop is called (via crop_image()), the values are converted to percentages, then passed in to grab_file, which stores them in the image_sizes table. When grab_file is called on an image that already has a data_object_id, we take any previous crop values from the database and re-apply them, so that we get the same crops even when we reharvest. Previously, reharvesting caused all user-initiated crops to be lost.

4) Unit tests have been implemented for all these scenarios, in test_content_manager.php.

All this has involved a substantial rewrite of the code in ContentManager.php. I have tried to keep the logic exactly the same, and I have conducted limited testing (including running the unit tests). However, I do not have a full EOL setup on which to test this new code, so further testing should probably be done on the EoL development system. One particular case is where resources or datasets are in a tar/zip/gzip files and are unpacked by the grab_file routine. There is no unit test for this and I have commented that one should be added (there are also no unit tests of grab_file() for types "partner" and "dataset"). It might be a useful job for the new developers to check on the logic flow of my new code and add test cases.

There are a few places where I have kept the existing code but I think it should be changed or is redundant. I have marked these with a comment which includes 4 asterisks ****. The most major of these is a suggestion that translating the crop coordinates to percentages should be done by the Ruby front-end, since these calculations are dependent on features of the web page (in particular, the restriction of 540 pixel width for displaying the croppable image). It would both be easier and more logical to do web-page specific calculations in the RoR code than to pollute the php code with web-specific parameters as happens at the moment. As a stop-gap measure I have inserted a function crop_image_pct() which can be called by if the Ruby code is altered to produce x, y, and width parameters as percentages from the 580_360.jpg file.

There are also cases where I have implemented a "hack" to work around the fact that EoL does not store on a database the downloaded file extension or (in the case of images) details of any rotations that have been done to the original. There is a good argument that these should be kept (extension in the data_objects table, rotation details in the image_sizes table.

This is the second attempt at committing, with changes incorporated after discussion at https://github.com/EOL/eol_php_code/pull/111

JRice commented 9 years ago

I'm still getting failures in the tests:

PHP Fatal error:  Call to undefined method php_active_record\ContentManager::cache_path() in /home/vagrant/git/eol_php_code/tests/unit/test_content_manager.php on line 213

...Is this a quick fix?

hyanwong commented 9 years ago

On 5 Dec 2014, at 14:29, Jeremy Rice notifications@github.com wrote:

I'm still getting failures in the tests:

PHP Fatal error: Call to undefined method php_active_record\ContentManager::cache_path() in /home/vagrant/git/eol_php_code/tests/unit/test_content_manager.php on line 213

cache_path() shouldn’t be in there at all any more. Odd. Let me check that I’ve synced properly (and I’ll see to the other thing you emailed about too)=

hyanwong commented 9 years ago

dimensions error now corrected.

hyanwong commented 9 years ago

On 5 Dec 2014, at 14:30, Yan Wong yan@yanwong.me wrote:

On 5 Dec 2014, at 14:29, Jeremy Rice notifications@github.com wrote:

I'm still getting failures in the tests:

PHP Fatal error: Call to undefined method php_active_record\ContentManager::cache_path() in /home/vagrant/git/eol_php_code/tests/unit/test_content_manager.php on line 213

Ah - I see, you’ve merged a previous pull request (https://github.com/EOL/eol_php_code/pull/112) I made which did reference the old cache_path() version. Let me do a merge.

Yan=

JRice commented 9 years ago

Perhaps my git-fu is weak, but I thought I was merging a branch, not a pull... namely your store_crop_location branch.

Is that not what you intended?

On Fri, Dec 5, 2014 at 9:51 AM, hyanwong notifications@github.com wrote:

On 5 Dec 2014, at 14:30, Yan Wong yan@yanwong.me wrote:

On 5 Dec 2014, at 14:29, Jeremy Rice notifications@github.com wrote:

I'm still getting failures in the tests:

PHP Fatal error: Call to undefined method php_active_record\ContentManager::cache_path() in /home/vagrant/git/eol_php_code/tests/unit/test_content_manager.php on line 213

Ah - I see, you’ve merged a previous pull request ( https://github.com/EOL/eol_php_code/pull/112) I made which did reference the old cache_path() version. Let me do a merge.

Yan=

— Reply to this email directly or view it on GitHub https://github.com/EOL/eol_php_code/pull/113#issuecomment-65799418.

hyanwong commented 9 years ago

On 5 Dec 2014, at 14:54, Jeremy Rice notifications@github.com wrote:

Perhaps my git-fu is weak, but I thought I was merging a branch, not a pull... namely your store_crop_location branch.

My git-fu is non-existent, I’m afraid, which might be the root of the issue.

What happened is that I has a small previous pull request in, that just added a few tidy-up lines. After I submitted my much larger pull request (but before you accepted it) you merged this previous pull, which contained some references to the old function name, so that my large pull request was out of sync with master. I’ve now synced it and changed the function name. Hopefully it should all work now.

The reason I didn’t catch the other change was there was no unit test for grab_file(…, “partner”). Perhaps one should be added?=

JRice commented 9 years ago

So, yeah, I merged this after the tests passed, which is ... bold... but I'll test this on our staging server next week before pushing it to production. ...Worst-case, we revert. ...But I'd prefer to just iron things out until it's working.

Thanks!

hyanwong commented 9 years ago

http://en.wikipedia.org/wiki/Wikipedia:Be_bold

If you give me a list of the data_object IDs that have custom crops, I can run my python image processing code to get the crop locations so that you can populate the new DB table with custom crops. I think you said the command was

datos = DataObject.where(id: CuratorActivityLog.where(activity_id: 88).select(:target_id).map(&:target_id).uniq, published: 1)

Yan=

JRice commented 9 years ago

Sure; let's wait until it hits production for that, though. :)

hyanwong commented 9 years ago

On 5 Dec 2014, at 15:13, Jeremy Rice notifications@github.com wrote:

Sure; let's wait until it hits production for that, though. :)

Fine. BTW, if you have a test .zip, .tar, or .gz “resource” file available on the system (something equivalent to WEB_ROOT.'tests/fixtures/files/test_resource.xml’), then I can add a unit test. At the moment, I haven’t tested that bit of the code (but then again, I hope I haven’t changed it either)=