JiscSD / archivematica

Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
http://www.archivematica.org
GNU Affero General Public License v3.0
0 stars 0 forks source link

MCPClient: disable capture of output for compressAIP client script #42

Closed lower29 closed 6 years ago

lower29 commented 6 years ago

The task in #39 is to disable the capture of output specifically for the compressAIP function in compressAIP.py in the MCP client scripts. However, in looking at the code it became apparent that the best place to do this is actually in the executeOrRunSubprocess.py library in archivematicaCommon.

I've therefore added a capture_output parameter to the functions in executeOrRunSubprocess.py. This is set to True by default so existing behaviour is maintained. However the compressAIP script sets capture_output to False, thereby disabling the capture of stdout and stderr for this specific task.

I've also included a py.test unit test to demonstrate and prove the expected functionality.

@jhsimpson and @sevein I'd appreciate your feedback on this approach, in light of the work that is in progress or planned upstream.

sevein commented 6 years ago

@lower29 #37 is now merged, it included a fix for the linting issue that breaks your build.

jrwdunham commented 6 years ago

@lower29 FYI I've done some profiling and it seems that if you pass stdout/stderr to devnull in your Popen call, you'll get significant performance gains.

To replicate, create a Python executable file called bigoutput containing this:

#!/usr/bin/env python
from __future__ import print_function
SIZE = 10 ** 7
print('.' * SIZE)

Then in iPython I timed both the "with-devnull" and "without-devnull" (with_popen) strategies:

In [24]: def with_popen():
    ...:     p = subprocess.Popen(['./bigoutput'])
    ...:     p.communicate()

In [25]: def with_devnull():
    ...:     with open(os.devnull, 'w') as devnull:
    ...:         p = subprocess.Popen(['./bigoutput'], stdout=devnull, stderr=devnull)
    ...:         p.communicate()

In [26]: %time with_popen()
CPU times: user 643 µs, sys: 5.8 ms, total: 6.44 ms
Wall time: 338 ms

In [27]: %time with_devnull()
CPU times: user 589 µs, sys: 3.6 ms, total: 4.19 ms
Wall time: 42.8 ms

I also tried passing various values for the bufsize arg to Popen and that had little effect in terms of improving the bare Popen strategy.

I'm going to be doing something similar to this in the https://github.com/artefactual/archivematica so hopefully we can coordinate efforts.

lower29 commented 6 years ago

Thanks to @jrwdunham and others for the work upstream on https://github.com/artefactual/archivematica/pull/805. I've rebased the dev/jisc-issue-39 branch against latest qa/jisc and have cherry-picked the commits from the upstream PR. I had to make a change when merging https://github.com/artefactual/archivematica/pull/805/commits/cbafeaf3b1814b126eb3fd7ba9c080d417ffc888 because there was an existing change 335e942b37e4e9bf15b052a244cfdba8f1575d9f in this fork that caused a conflict.

@jhsimpson I think this is now ready for review in the context of the Jisc fork too - I realise that it's ahead of vanilla in that this came from 1.8.x, so we've got an impure mix of 1.6.x and other bits on this fork. No doubt this will work itself out in the end!

lower29 commented 6 years ago

@helenst @jrwdunham @jhsimpson I've responded to the comments in Helen's review and made the changes she requested.

I was hoping to be able to run the unit tests to confirm they were working correctly, but couldn't figure out the correct incantation to get them to run (I even talked to @sevein about it but still no luck). I'm fairly certain it's correct, but haven't been able to confirm.

@helenst if you get the chance can you take another look and hopefully approve? @jhsimpson if you know that Helen's not going to be able to do this in a reasonable timeframe can you or @jrwdunham review again instead? It would be good to get this merged before Christmas this year. Thanks 🙂

lower29 commented 6 years ago

Hi @helenst, thanks for the magic incantation for running the test and thanks for spotting the stupid copy-paste error. I've fixed it now so the test runs as expected.

Its a shame we can't get this tested without using the rdss-archivematica compose, but for now it does the job, and the end result is more important than how we got there.

jhsimpson commented 6 years ago

This work has been released upstream in archivematica v1.7.1 - @sevein you suggested closing this PR and merging in the 1.7.1 changes instead, I agree.

sevein commented 6 years ago

https://github.com/JiscRDSS/archivematica/pull/78 replaces this.