eQualityTime / TheOpenVoiceFactory

The Open Voice Factory - open software for people with communication difficulties
http://theopenvoicefactory.org
23 stars 9 forks source link

OVF incompatible with most recent release of python pptx. #101

Closed joereddington-public closed 6 years ago

joereddington-public commented 6 years ago

Running

python test_grab_text.py

with a current (Version: 0.6.13) version of python-pptx, fails four of the current tests.

Ran 16 tests in 72.175s

FAILED (errors=4)

Installing version

 pip install python-pptx==0.6.7 --user

Results in all tests passing.

The fail for python test_grab_text.py ovfTest.test_regession_ck12 starts at 0.6.8, and the output is:


======================================================================
ERROR: test_regession_ck12 (__main__.ovfTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_grab_text.py", line 150, in test_regession_ck12
    regress("regression_tests_size_4/CK12+V2.pptx",4)
  File "test_grab_text.py", line 168, in regress
    compare_json_files(filename,filename+".json", size,bordercolor)
  File "test_grab_text.py", line 173, in compare_json_files
    grids = grab_text.Pageset(pres_loc,"",False).grids
  File "../grab_text.py", line 78, in __init__
    self.extract_grid()
  File "../grab_text.py", line 92, in extract_grid
    self.grids.append(Grid(self.prs, slide, gridSize, self))
  File "../grab_text.py", line 152, in __init__
    self.process_shape(shape)
  File "../grab_text.py", line 178, in process_shape
    click_action = shape.click_action
  File "/Users/joepublic/Library/Python/2.7/lib/python/site-packages/pptx/shapes/group.py", line 23, in click_action
    raise TypeError('a group shape cannot have a click action')
TypeError: a group shape cannot have a click action

----------------------------------------------------------------------
Ran 1 test in 4.509s

We have confirmed that the other fails also happen at this version of python-pptx

This isn't helpful for new developers and we should bring our code up to date.

Plan

joereddington-public commented 6 years ago

python test_grab_text.py ovfTest.test_regession_ck12 is probably the best way to find a fast fail.

That test passes for 0.6.1, 0.6.2 and 0.6.7 - it fails at 0.6.11, 0.6.9 and 0.6.8

Quite pleased with myself, I dug into the python-pptx code and found out a way to deal with with two lines of code. All tests passing again. Will ping Steve at python-pptx to let him know that there was a (very) small backward's compatibility issue.

scanny commented 6 years ago

Hi Joe, I've had a serious think about this and I'm inclined to conclude that the present behavior is the best compromise between competing design interests.

What I do think would improve matters is to add a _BaseShapes.iter_leaf_shapes() method, which would be inherted by SlideShapes, the object present on Slide.shapes. There's an implementation just below you can use without waiting for me to get around to adding it in.

Let me know what you think. You can comment on issue I raised for this in python-pptx repo here:
https://github.com/scanny/python-pptx/issues/435

Here's my rationale in more detail:

The unfortunate aspect of the situation is that code that deals with shape click actions (hyperlinked shapes basically) may need to know that group shapes can't have a click action and have code to guard against attempting access to shape.click_action on group shapes.

Here's why I think trying to remove that inconvenience would degrade the design more than improve it:

  1. If you happen to know there are no group shapes, perhaps because you just wrote the slide, then you can remain blissfully ignorant of this situation. I expect this accounts for 99.9% of use cases when folks who don't do anything with shape hyperlinks are factored in.

  2. If you are dealing with "from the wild" presentations, you need to be dealing with group shapes explicitly anyway, lest you only consider top-level shapes and miss those otherwise "hidden" inside groups.

    In this case, to iterate over all hyperlinkable shapes on a slide, you need something like this:

    from pptx.enum.shapes import MSO_SHAPE_TYPE
    
    def iter_leaf_shapes(shapes):
       for shape in shapes:
           if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
               for s in iter_leaf_shapes(shape.shapes):
                   yield s
           else:
               yield shape
    
    for shape in iter_leaf_shapes(slide.shapes):
       process_shape_I_know_for_sure_isnt_a_group_shape()

    Using a shape-tree iterator like this, you naturally don't get group shapes presenting themselves to be processed.

  3. The alternative would make the get case more streamlined, but would just kick the can down the road for the set case. I've outlined that option at the end here just for the record.

Option I don't like but might be worth preserving for the record

Ease iteration of shape sequence click actions by having shape.click_action always return an ActionSettings-like object, so click_action = shape.click_action would never raise or return None, regardless of the shape.

This would be accomplished by a new NullActionSettings object (maybe with a better name), that has the same interface as today's ActionSettings object, but with the following behaviors:

This would support a protocol something like this:

for shape in shapes:
    if shape.click_action.action == PP_ACTION_NONE:
        continue
    # ... process click action ...

While this NullActionSettings approach might ease the get side of the equation, you would still face the same "need to treat group shapes as special case" challenge when attempting to set a click-action on an unknown shape.

Regarding the option of returning None for GroupShape.click_action, I would consider that only slightly less onerous than the current behavior for get.

Instead of this:

for shape in shapes:
    try:
        shape.click_action
    except TypeError:
        continue
    if shape.click_action.action == PP_ACTION_NONE:
        continue
    # ... process click action ...

You still have two steps, one to skip over group shapes (the only type that would return None) and one to skip over shapes that can but don't have a click action defined:

for shape in shapes:
    if shape.click_action is None:
        continue
    if shape.click_action.action == PP_ACTION_NONE:
        continue
    # ... process click action ...

And it would do nothing for the set side, which would result in None object has no attribute 'action' on attempts to set an action on a group shape.

joereddington-public commented 6 years ago

Thanks for the really well reasoned answer. It's certainly true that we're in an edge case over here. Worth knowing about tho. Yay for open source!