dt-woods / word

Concatenate and parse Microsoft Word (.docx) files with style! A Pythonic method for splitting, merging, and styling MS Word docs.

Create a .docx parser #2

Closed dt-woods closed 3 years ago

dt-woods commented 3 years ago

The goal is to read a .docx file and write out individual .docx files, splitting at each paragraph that uses a user-defined breaking style (e.g., splitting a book into one file per chapter). The challenge will be to preserve styles at both the character and paragraph level.
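As a rough illustration of the splitting step only, here is a minimal sketch using just the standard library. It assumes the breaking style is given as a style id (for example `Heading1`, which is how English-language Word documents usually identify "Heading 1") and that paragraphs before the first break form their own leading section. The names `split_on_style`, `paragraph_style`, and `text_of` are hypothetical, not part of this repo. Writing each group back out as its own .docx is not shown; one plausible route is to copy the original package, including `word/styles.xml`, and swap in a trimmed `word/document.xml`, since the run and paragraph properties live in that XML.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraph_style(p):
    """Return the paragraph's style id, or None if no style is set explicitly."""
    style = p.find("w:pPr/w:pStyle", NS)
    return None if style is None else style.get(f"{{{W}}}val")


def text_of(p):
    """Concatenate the text of all runs in a paragraph (formatting ignored)."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))


def split_on_style(docx_bytes, break_style):
    """Group top-level body paragraphs into sections.

    A new section starts at every paragraph whose style id equals
    break_style. Paragraphs before the first break (if any) become a
    leading section. Tables and other non-paragraph body elements are
    skipped in this sketch.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find("w:body", NS)
    sections = []
    for p in body.findall("w:p", NS):
        if paragraph_style(p) == break_style or not sections:
            sections.append([])
        sections[-1].append(p)
    return sections
```

Each returned section is a list of the original `w:p` elements, so the character-level run properties inside them are untouched at this stage.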

The first assumption will be that the original document does not include custom styles. This keeps the parser simple, and we can always run our custom style mapper on the individual files if need be.
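If it helps to check that assumption up front, a possible sketch is below. It relies on the `w:customStyle` attribute that WordprocessingML uses to flag user-defined styles in `word/styles.xml`; the function name `custom_style_ids` is hypothetical, and whether the parser should warn or abort when it finds any is left open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/styles.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def custom_style_ids(docx_bytes):
    """Return the ids of styles flagged as custom in word/styles.xml.

    An empty list means the 'no custom styles' assumption holds for
    this document (or the package has no styles part at all).
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        if "word/styles.xml" not in zf.namelist():
            return []
        root = ET.fromstring(zf.read("word/styles.xml"))
    return [
        s.get(f"{{{W}}}styleId")
        for s in root.iter(f"{{{W}}}style")
        # customStyle is an on/off attribute; these are its truthy spellings
        if s.get(f"{{{W}}}customStyle") in ("1", "true", "on")
    ]
```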