CenterForOpenScience / pydocx

An extendable docx file format parser and converter

ensure that the file pointer is at the beginning #168

Closed: IuryAlves closed this 9 years ago

IuryAlves commented 9 years ago

When an image is referenced more than once, the second call to image_part.stream.read() in the parse_image method returns an empty string, because the first read left the stream position at the end of the data. I fixed this by seeking back to the beginning of the stream before reading it, specifically by adding the following line: image_part.stream.seek(0)

Here is a docx that reproduces this bug: https://drive.google.com/file/d/0B7-HjtaVhQPSamxndVo0Q00tT2M/view?usp=sharing
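A minimal sketch of the bug and the fix, assuming image_part.stream behaves like a standard Python file-like object (an io.BytesIO here). The function parse_image_bytes is a hypothetical stand-in for the relevant part of parse_image, not pydocx's actual code:

```python
import io


def parse_image_bytes(stream):
    """Hypothetical stand-in for reading an image part's stream.

    Rewinds to the start first, so repeated calls return the full
    content instead of an empty bytes object.
    """
    stream.seek(0)  # a previous read() may have left the position at EOF
    return stream.read()


stream = io.BytesIO(b"\x89PNG fake image data")

# Without seeking: the second read() starts at EOF and returns b"".
first = stream.read()
second = stream.read()
print(first, second)  # b'\x89PNG fake image data' b''

# With the seek(0) fix: every call returns the full data.
print(parse_image_bytes(stream) == parse_image_bytes(stream) == first)  # True
```

Seeking immediately before each read keeps the function correct no matter how many times the same part has already been consumed.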

winhamwr commented 9 years ago

Hello Iury,

Thanks so much for the pull request!

To speed up the process of getting this merged in, is there any way you could add a unit test for this?

You can add the document you linked to the fixtures, along with a corresponding .html file containing the output you expect. Then add the file name to cases, so that the .docx file is loaded and its output is compared against the corresponding HTML file.

Thanks! -Wes

IuryAlves commented 9 years ago

@winhamwr Done!

It took me some time to realize that the test HTML is just the contents, not the whole HTML document =D

winhamwr commented 9 years ago

Hi Iury,

Sorry for the slow response. Your test case looks good, and hopefully @kylegibson will be along soon to give it a close review and merge it.

Thanks again! -Wes

IuryAlves commented 9 years ago

@winhamwr Thanks =D

jlward commented 9 years ago

Looks good. Thanks for the PR