c-okelly / org_to_anki

Python3 module to convert Txt, Org or LibreOffice files into Anki decks
MIT License

Multi-level lists do not seem to import in Word #25

Closed. Hawley-Griffin closed this issue 4 years ago

Hawley-Griffin commented 5 years ago

Are multi-level lists from .docx files supported? If so, are there any formatting or other limitations? Or some other requirements?

Currently running into an error when I try to import into Anki.

Raw text of the file you tried to upload

Word and HTML files: Anki Addon (Lists to Anki) Error Report.zip

Error report from the popup

The error was 'utf-8' codec can't decode byte 0xb7 in position 53017: invalid start byte.

Error report:

Traceback (most recent call last):
  File "C:\Users\Hawley Griffin\AppData\Roaming\Anki2\addons21\1029306148\__init__.py", line 52, in importNewFile
    parseAndUploadOrgFile(filePath, embedded=True)
  File "C:\Users\Hawley Griffin\AppData\Roaming\Anki2\addons21\1029306148\org_to_anki\main.py", line 29, in parseAndUploadOrgFile
    _parseAndUpload(filePath, embedded)
  File "C:\Users\Hawley Griffin\AppData\Roaming\Anki2\addons21\1029306148\org_to_anki\main.py", line 45, in _parseAndUpload
    deck = parseData.parse(filePath)
  File "C:\Users\Hawley Griffin\AppData\Roaming\Anki2\addons21\1029306148\org_to_anki\org_parser\parseData.py", line 17, in parse
    formatedData = convertBulletPointsDocument(filePath)
  File "C:\Users\Hawley Griffin\AppData\Roaming\Anki2\addons21\1029306148\org_to_anki\converters\BulletPointHtmlConverter.py", line 27, in convertBulletPointsDocument
    documentType = checkDocumentType(filePath)
  File "C:\Users\Hawley Griffin\AppData\Roaming\Anki2\addons21\1029306148\org_to_anki\converters\BulletPointHtmlConverter.py", line 37, in checkDocumentType
    soup = BeautifulSoup(htmlFile, 'html.parser')
  File "lib\site-packages\bs4\__init__.py", line 245, in __init__
  File "C:\Program Files\Python36\lib\codecs.py", line 700, in read
  File "C:\Program Files\Python36\lib\codecs.py", line 503, in read
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 53017: invalid start byte
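The traceback above shows the HTML file being decoded as UTF-8 and failing on byte 0xb7. In Windows-1252 (and Latin-1) that byte is the middle dot "·", a character Word commonly uses for bullet markers, so the file was possibly saved in Windows-1252 rather than UTF-8. That is an assumption, not something confirmed in this thread. A minimal sketch of a tolerant reader follows; `read_html_text` is a hypothetical helper, not part of org_to_anki:

```python
from pathlib import Path


def read_html_text(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: decode an HTML export, trying several encodings.

    Assumes the file is either UTF-8 or Windows-1252 (a common default
    for HTML saved from Word); neither is confirmed for the attached file.
    """
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

The decoded string could then be handed to the HTML parser in place of the open file handle. Trying UTF-8 first matters, because almost any byte sequence decodes under Windows-1252 and would otherwise hide valid UTF-8 text behind mojibake.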

What is your operating system

Windows 10

What was the original file type

Word File (.docx) converted to .html

c-okelly commented 5 years ago

Hey,

I have had a look into this, and there is more than one issue going on.

I will do my best to fix the issues with Word, as I do want to support it in the long term.

If you would like a workaround, I would suggest using LibreOffice instead. It is a free alternative to Word and outputs much more consistent HTML documents, so it is probably more stable.

https://www.libreoffice.org/