aphtech / send-to-braille

Braille quick translation

Avoid UTF-8 reencoding when appropriate. #7

Closed tmthywynn8 closed 6 years ago

tmthywynn8 commented 6 years ago

Why this pull request is needed:

Pull request #4 was written with only text-based markup languages in mind. It reencoded every input file as UTF-8 when needed, which broke non-text formats such as Word documents (.docx): Pandoc could no longer read those files once their bytes had been changed -- akin to transferring a binary file in ASCII mode instead of binary mode.
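To illustrate the failure mode, here is a minimal Python sketch, not the code from this repository. It assumes the reencoding step reads the input as a single-byte legacy encoding (Latin-1 here) and writes it back as UTF-8; how the Utf8n executable actually detects the source encoding is not covered in this pull request. The function name `reencode_as_utf8` and the sample bytes are hypothetical. A .docx file is a ZIP container, so the sample starts with the ZIP signature `PK\x03\x04`.

```python
def reencode_as_utf8(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Hypothetical stand-in for a text reencoding step.

    Assumption: the input is interpreted as `source_encoding` text
    and then written out again as UTF-8.
    """
    return data.decode(source_encoding).encode("utf-8")


# Pure ASCII text survives unchanged, because ASCII is a subset of UTF-8.
text_like = b"plain ASCII text"

# ZIP signature followed by a few arbitrary header bytes, two of which
# are >= 0x80. Each such byte becomes a two-byte UTF-8 sequence, so the
# container is no longer byte-identical and cannot be parsed as a ZIP.
zip_like = b"PK\x03\x04\x14\x00\x08\x00\xa4\xff"

if __name__ == "__main__":
    print(reencode_as_utf8(text_like) == text_like)  # text is untouched
    print(reencode_as_utf8(zip_like) == zip_like)    # binary is altered
```

Any byte at or above 0x80 grows into two bytes under this assumption, which is enough to invalidate the offsets and checksums inside a ZIP-based document.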

What this pull request does:

  1. Using the Pandoc User’s Guide, made a list of all supported input formats that are neither plain text nor text markup languages. This was straightforward because those formats appear at the end of the guide's list of supported input formats -- "... EPUB, ODT, and Word docx."
  2. Wrote a conditional statement, using De Morgan's laws, that excludes the three file extensions for the formats listed above and stores in a variable whether or not the reencoding process should be run.
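The check described in step 2 can be sketched as follows. This is a Python illustration under stated assumptions, not the batch code in this pull request: the function name `should_reencode` is hypothetical, and the case-insensitive comparison is an assumption rather than something this description confirms.

```python
from pathlib import Path


def should_reencode(filename: str) -> bool:
    """Return True when the file should go through the UTF-8 reencoding step.

    Hypothetical helper. By De Morgan's laws,
        not (ext == A or ext == B or ext == C)
    is equivalent to
        ext != A and ext != B and ext != C
    which is the form used below.
    """
    # Assumption: extensions are compared case-insensitively.
    ext = Path(filename).suffix.lower()
    return ext != ".epub" and ext != ".odt" and ext != ".docx"


if __name__ == "__main__":
    for name in ("notes.txt", "README.md", "report.docx", "book.epub", "essay.odt"):
        target = "reencode, then Pandoc" if should_reencode(name) else "directly to Pandoc"
        print(f"{name}: {target}")
```

Storing the result in a single boolean keeps the later branch simple: text inputs are reencoded first, while the three container formats are handed to Pandoc untouched.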

Testing performed:

  1. Converted a text file into both contracted and uncontracted UEB (using lt.bat and lt1.bat respectively), making sure that the input file was run through the Utf8n executable.
  2. Converted a Word document (.docx) and an EPUB file into both contracted and uncontracted UEB, making sure that the input file went directly to Pandoc.