c-okelly / org_to_anki

Python3 module to convert Txt, Org or LibreOffice files into Anki decks
MIT License

long lines in the top level of bullets are truncated before they appear on "Front" in Anki #78

Closed EntropyOrSloth closed 4 years ago

EntropyOrSloth commented 4 years ago

Please follow the template below to report your issue.

Please include the following information:

Any general information

I saved a bullet list from MS Word as HTML and imported it using this add-on. There were items like this: "animate noun 2nd declension masculine nominative singular". On import into Anki, the "Fronts" were mostly truncated, for example to "animate noun 2nd declension masculine" instead of the original "animate noun 2nd declension masculine nominative singular" line.

The only way I found to fix this is to edit each card in Anki, which defeats the purpose of this add-on.

These are the input files: russian declensions - nominative, accusative, genitive.zip

These include both the original MS Word file as well as the HTML file created from it which was imported into Anki via this add-on.

Raw text of the file you tried to upload

The PasteBin text from here can probably be used to recreate my file, but it would be better to use the original input files given above: PasteBin URL link.

Error report from the popup

There were no errors reported by Anki.

What is your operating system

MacOS X Catalina 10.15.5

What was the original file type

MS Word .docx file which was then saved as a .html, the latter then being imported into Anki.

c-okelly commented 4 years ago

Hey,

I will have a look into this and get back to you. I'm not sure why that is happening.

I have been meaning to remove all mentions of support for Microsoft Word in general, though. Especially on Windows machines, it is very hard to predict exactly what type of output Word is going to generate.

I would appreciate it if you kept your issue reporting to GitHub rather than the reviews for the add-on, but feel free to write what you think in the reviews one way or another.

EntropyOrSloth commented 4 years ago

Yeah, sorry, I didn't realize this was on GitHub until the very end. I'll delete my other messages if I can.

EntropyOrSloth commented 4 years ago

I have deleted my other comments elsewhere, at least for the time being.

I just downloaded LibreOffice 6.4.5 and installed it on my MacBook Pro. I will open my docx file in LibreOffice, save a new HTML file from it, and import that into Anki to see whether the problem also reproduces with LibreOffice.

c-okelly commented 4 years ago

Thanks for all the info. I'm currently looking into the issue, though it might take a day or two to get to the bottom of it.

c-okelly commented 4 years ago

Hey,

Thanks for sending on all the information, especially the actual HTML files.

The long and short of the issue is that Word just does not produce predictable HTML when you save a Word document. It appears the superscript symbols actually break each line, and the parser stops looking for information after this.
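
To illustrate the failure mode described above, here is a minimal sketch using Python's standard `html.parser`. It is not the add-on's actual parser. The `ListItemText` class, the `extract_items` helper, and the sample markup are all hypothetical, and the sample is far simpler than real Word output. It only shows how a reader that stops at the first nested tag (such as `<sup>`) truncates a bullet, while one that keeps reading until `</li>` does not.

```python
from html.parser import HTMLParser


class ListItemText(HTMLParser):
    """Collect the text of each <li>, optionally stopping at the first nested tag."""

    def __init__(self, stop_at_nested_tag):
        super().__init__()
        self.stop_at_nested_tag = stop_at_nested_tag
        self.items = []        # finished list-item strings
        self._parts = None     # text chunks of the <li> being read, or None
        self._stopped = False  # True once a nested tag has ended collection

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Start collecting text for a new bullet.
            self._parts, self._stopped = [], False
        elif self._parts is not None and self.stop_at_nested_tag:
            # Simulated bug: any tag inside the bullet ends text collection.
            self._stopped = True

    def handle_data(self, data):
        if self._parts is not None and not self._stopped:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._parts is not None:
            # Join the chunks and collapse line breaks and repeated spaces.
            self.items.append(" ".join("".join(self._parts).split()))
            self._parts = None


def extract_items(html, stop_at_nested_tag=False):
    """Return the text of every <li> in the given HTML string."""
    parser = ListItemText(stop_at_nested_tag)
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical markup: a superscript and a hard line break inside one bullet.
SAMPLE = "<ul><li>2<sup>nd</sup> declension\nmasculine</li></ul>"

print(extract_items(SAMPLE, stop_at_nested_tag=True))  # truncated bullet
print(extract_items(SAMPLE))                           # full bullet
```

With the flag set, the bullet is cut off at the superscript; without it, the whole line is recovered with its whitespace normalized.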

Unfortunately after having done a bit of work with the Word HTML files I have found they are actually too difficult to support consistently across systems.

To close this ticket I'm going to do the following:

TLDR Word won't work but there is a simple migration path to fix your error:

  1. Install LibreOffice (free / open-source Word equivalent)
  2. Open the file in LibreOffice
  3. Save as an HTML file
  4. Upload file to Anki
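
For several files, steps 2 and 3 can probably be scripted with LibreOffice's headless converter instead of the GUI. The sketch below assumes the `soffice` binary is on your `PATH` (on macOS it usually lives inside the LibreOffice app bundle, so you may need to pass its full path). The helper names `build_convert_command` and `convert_to_html` are hypothetical. The HTML from a headless conversion has not been checked against this add-on and may differ from what "Save as" in the GUI produces.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one document to HTML."""
    return [
        soffice,
        "--headless",            # run without opening the GUI
        "--convert-to", "html",  # target format
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_html(source, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the HTML file is expected."""
    subprocess.run(build_convert_command(source, out_dir, soffice), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(out_dir) / (Path(source).stem + ".html")
```

Calling `convert_to_html("deck.docx", "out")` (a made-up file name) would then give you the HTML file to upload in step 4.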

c-okelly commented 4 years ago

Let me know if that doesn't work!