CenterForOpenScience / pydocx

An extendable docx file format parser and converter

`TypeError: expected string or buffer` when .doc is converted to .docx with MS Office in Windows #219

Open rejuashes opened 8 years ago

rejuashes commented 8 years ago

Attachment: pydocx_html_windows_error.txt

Hi guys,

I am seeing pydocx.to_html behave differently on the same .doc file depending on how it was converted to .docx.

Scenario 1: the .doc file is converted to .docx using LibreOffice on Linux (saving as Microsoft Word 2007/2010/2013 XML). This works fine.

Scenario 2: the .doc file is converted to .docx using MS Office on Windows. This throws an error:

        return re.match('^\s([^\s]+)\s(.*)$', self.instr)
      File "/usr/lib/python2.7/re.py", line 137, in match
        return _compile(pattern, flags).match(string)
    TypeError: expected string or buffer
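The last frame suggests that `self.instr` was not a string when `re.match` was called; a plausible (unconfirmed) explanation is that it was `None` because a field in the MS Office output has no instruction text. The sketch below, written for Python 3, reproduces that failure mode with the pattern from the traceback and shows one defensive option. `parse_instr` is a hypothetical helper for illustration, not pydocx's API or its actual fix.

```python
import re

# Pattern copied verbatim from the traceback above.
INSTR_PATTERN = r'^\s([^\s]+)\s(.*)$'


def parse_instr(instr):
    """Hypothetical guard: return None instead of raising when the field
    instruction is missing or not a string (e.g. instr is None)."""
    if not isinstance(instr, str):
        return None
    return re.match(INSTR_PATTERN, instr)


if __name__ == "__main__":
    # Reproduce the failure mode: re.match on None raises TypeError.
    # Python 2 reports "expected string or buffer"; Python 3 words it differently.
    try:
        re.match(INSTR_PATTERN, None)
    except TypeError as exc:
        print("TypeError:", exc)

    print(parse_instr(None))  # None, no exception
    match = parse_instr(' HYPERLINK "http://example.com"')
    print(match.groups())     # field type and its arguments
```

Returning `None` pushes the decision to the caller (for example, skip the field and render its visible text), which is usually gentler than aborting the whole conversion over one malformed field.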

Any pointers would be helpful.

Regards,

Rajith

kylegibson commented 8 years ago

Hi,

Thanks for the issue report! Could you attach the .docx file, converted from .doc with MS Office on Windows, that throws the error?

Thanks,

-Kyle

rejuashes commented 8 years ago

Hi Kyle,

Attaching the original source .doc file which was converted to .docx.

Regards,

Rajith

ABC.zip
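Since a .docx file is a ZIP archive, one way to compare the LibreOffice and MS Office outputs is to list the field instructions in `word/document.xml` and look for missing ones. The sketch below assumes, without confirmation, that the error comes from a field whose instruction is absent: a `w:fldSimple` element without a `w:instr` attribute, or an empty `w:instrText` element. It scans only the main document part (headers and footers live in separate parts), and `field_instructions` is a hypothetical helper, not part of pydocx.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FLD_SIMPLE = f"{{{W_NS}}}fldSimple"
INSTR_TEXT = f"{{{W_NS}}}instrText"
INSTR_ATTR = f"{{{W_NS}}}instr"


def field_instructions(docx_source):
    """Return (kind, instruction) pairs in document order.

    docx_source may be a path or a binary file-like object.
    An instruction of None means the field has no instruction text.
    """
    with zipfile.ZipFile(docx_source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    found = []
    for el in root.iter():
        if el.tag == FLD_SIMPLE:
            found.append(("fldSimple", el.get(INSTR_ATTR)))
        elif el.tag == INSTR_TEXT:
            found.append(("instrText", el.text))
    return found


if __name__ == "__main__":
    # Usage: python fields.py libreoffice.docx msoffice.docx
    for path in sys.argv[1:]:
        missing = [kind for kind, instr in field_instructions(path) if instr is None]
        print(f"{path}: {len(missing)} field(s) without instruction text")
```

If only the MS Office output reports fields without instruction text, that would support the `None` hypothesis for `self.instr`.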