djplaner / word-to-canvas-module

A userscript that will create a Canvas Module (including all module items) from a Word document (using special styles)
https://djplaner.github.io/word-to-canvas-module/
MIT License

issues with multiple files in a sequence #36

Open djplaner opened 2 years ago

djplaner commented 2 years ago

Next step

When converting to HTML, check for canvasFileLink problems (below), report the problem, and provide a link to docs explaining how to fix it

Perhaps look at removing the empty lone link
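A minimal sketch of what such a check might look like. It is regex-based so it can run outside the browser, and it assumes flat `<span class="canvasFileLink">` elements with no nested tags (as in the HTML quoted further down). The function name `findFileLinkProblems` and the `docsUrl` parameter are hypothetical, not part of word2canvas.

```javascript
// Sketch only: assumes canvasFileLink spans contain plain text, no nested tags.
const FILE_LINK_RUN =
  /<span class="canvasFileLink">[^<]*<\/span>(?:\s*<span class="canvasFileLink">[^<]*<\/span>)*/g;
const FILE_LINK_SPAN = /<span class="canvasFileLink">([^<]*)<\/span>/g;

// Returns one problem record per issue found in each run of adjacent spans.
// docsUrl is a hypothetical link to whatever page explains the fix.
function findFileLinkProblems(html, docsUrl) {
  const problems = [];
  for (const run of html.matchAll(FILE_LINK_RUN)) {
    // Text content of every span in this run of adjacent spans
    const texts = [...run[0].matchAll(FILE_LINK_SPAN)].map((m) => m[1]);
    const nonEmpty = texts.filter((t) => t.trim() !== '');
    const emptyCount = texts.length - nonEmpty.length;
    if (emptyCount > 0) {
      problems.push({ type: 'empty', count: emptyCount, help: docsUrl });
    }
    if (nonEmpty.length > 1) {
      // Several adjacent spans: probably one file name split into pieces
      problems.push({ type: 'fragmented', texts: nonEmpty, help: docsUrl });
    }
  }
  return problems;
}
```

A run of adjacent spans is reported as "fragmented" even when it is a legitimate sequence of separate file links, so the report can only prompt a human to check the Word document.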

Original problem

5251LAW_3218_GC having issues with multiple files in a row. The last file works, but the first 3 do not.

Topic 11A

The issue appears to be more related to single links ending up as multiple links in what is meant to be one link

<span class="instructure_file_holder link_holder instructure_file_link_holder"><a id="426855" class="inline_disabled preview_in_overlay" href="https://lms.griffith.edu.au/courses/155/files/426855?wrap=1" target="_blank" data-canvas-previewable="true" data-api-endpoint="https://lms.griffith.edu.au/api/v1/courses/155/files/426855" data-api-returntype="File">
                   HSY 1. 
</a><a class="file_download_btn" role="button" download="" style="margin-inline-start: 5px; text-decoration: none;" href="https://lms.griffith.edu.au/courses/155/files/426855/download?download_frd=1">
            <img style="width:16px; height:16px" src="/images/svg-icons/svg_icon_download.svg" alt="" role="presentation">
            <span class="screenreader-only">
              Download 
                   HSY 1.                
    </span>
 </a></span>
djplaner commented 2 years ago

Finding

But the issue arises earlier: there are multiple span class="canvasFileLink" elements

<p>Lecture audio file: theme 1</p><ul><li> 
<span class="canvasFileLink">HSY 1. </span>
<span class="canvasFileLink">Polities.(</span><span class="canvasFileLink">1).m4a</span></li></ul><p>Lecture 1 powerpoint</p><ul><li> <span class="canvasFileLink">HSY_1_Polities.pdf</span></li></ul>

The problem is happening earlier, in the Word/Mammoth conversion. checkHTmlView gives

<span class="canvasFileLink">HSY 1. </span>
<span class="canvasFileLink">Polities.(</span>
<span class="canvasFileLink">1).m4a</span>

Question: Is Mammoth doing this because both P and R styles are configured for canvasFileLink?
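Whatever the cause turns out to be, an after-the-fact repair of the HTML above could look something like this. It is a sketch under the assumption that the spans are flat (plain text only) and that adjacent spans always belong to one file name, which will not hold when several file links are meant to sit side by side. `mergeAdjacentFileLinks` is a hypothetical name.

```javascript
// Sketch only: merges runs of two or more adjacent canvasFileLink spans
// (separated by nothing or by whitespace) into a single span.
function mergeAdjacentFileLinks(html) {
  const run =
    /<span class="canvasFileLink">[^<]*<\/span>(?:\s*<span class="canvasFileLink">[^<]*<\/span>)+/g;
  return html.replace(run, (match) => {
    const text = match
      .replace(/<\/?span[^>]*>/g, '') // drop the span tags, keep the text
      .replace(/\s+/g, ' ') // collapse whitespace left between fragments
      .trim();
    return `<span class="canvasFileLink">${text}</span>`;
  });
}
```

Applied to the three spans above, this should yield a single span containing `HSY 1. Polities.(1).m4a`, while the separate `HSY_1_Polities.pdf` span in the next list is left alone because it is not adjacent to another span.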

djplaner commented 2 years ago

Renee Denham 27/09/2022 4:50 pm

Hi David, think I've found a bug 📷 📷 📷 it's 1 link where the text is 3 words, but is instead becoming a link per word.

Yep, that's a bug alright. A known one. The second one I mentioned in a message in the AEL LMS Migration chat. I've not had the time to fix it or even properly diagnose it (see previous complaints about limited time). What I know about it follows. In fact, I've dug out the GitHub issue on this: a description of the problem, no fix. (But I will add this explanation to help prompt me.)

I remain uncertain about the actual source. It's some combination of

How the CAR process generates the Word document (using a specific Python HTML2Docx module)

How word2canvas converts the Word back into HTML (using Mammoth)

Somewhere in there a single style applied to a sequence is being separated into multiple. Perhaps this happens when the Canvas File Link style is a linked style. Perhaps, for these, spaces are created as separate items by Mammoth.

I can see three possible solutions

  1. Fix any problem in the CAR process - not the soln I think
  2. Configure Mammoth not to have this problem
  3. Fix it after the fact

For an example of #3 see the postConvert function in c2m_wordConverter.js. There's a "remove any links with empty innerText".

A quick fix might be to detect such sequences of links and fix them. The challenge may be when there is meant to be a sequence of such links, suggesting the code would have to try to distinguish what is a meaningful file name and what isn't. Not straightforward, suggesting a need to try a #2 solution, or some combination.
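One possible heuristic for "what is a meaningful file name", sketched here purely as an illustration: keep joining fragments until the joined text ends in something that looks like a file extension. The function name and the extension pattern are assumptions, and the heuristic fails for links that are titles rather than file names (no extension) or for text such as `Version 1.10`.

```javascript
// Sketch only: a dot followed by 2-5 letters/digits at the very end.
const LOOKS_LIKE_FILE_END = /\.[A-Za-z0-9]{2,5}$/;

// Takes the text of consecutive canvasFileLink fragments and returns the
// file names they most plausibly represent.
function groupFileNameFragments(fragments) {
  const names = [];
  let current = '';
  for (const fragment of fragments) {
    current += fragment;
    if (LOOKS_LIKE_FILE_END.test(current.trim())) {
      // Joined text now ends like a file name: close this group
      names.push(current.replace(/\s+/g, ' ').trim());
      current = '';
    }
  }
  // Leftover fragments without an extension are kept as one final name
  if (current.trim() !== '') {
    names.push(current.replace(/\s+/g, ' ').trim());
  }
  return names;
}
```

With this rule, `['HSY 1. ', 'Polities.(', '1).m4a']` is treated as one name, while `['HSY_1_Polities.pdf', 'McNamara.pdf']` stays as two.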

Further exploration

FWIW, I've done a bit more testing (arose out of migration work I'm focusing on). I'll be adding this to the GitHub issue and trying to get to it at some stage. Renee, if you can, could you share a copy of the Word doc you had the problem with? I'm assuming it was taken from the CAR? Your document and the following should help with figuring out a kludge solution, if not an actual solution.

The attached Word doc (created by hand) generates this page - see image below. The two problem links were generated by adding in some text after the initial application of the style (the a and hello).

I then modified the document by copying and pasting the "hello" link and reapplying the Canvas File Link style, resulting in this page, i.e. that re-application fixed the issue.

djplaner commented 2 years ago

Diagnose issue with edits

Step 1

What is the HTML in postConvert?

There are six visible Canvas File Link elements in the Word document. Not all of them are links. Some are just the name:

McNamara.pdf
Visual Analysis of a Photograph
Donna K. Reid: Thinking and Writing About Art History
Donna K Reid.pdf
Donna K Reid.pdf
Donna hello K. Reid: Thinking and Writing About Art History

but the HTML says there are 12 .canvasFileLink elements (showing innerHTML / href):

  1. McNamara.pdf / undefined
  2. Visual Analysis of a Photograph / undefined
  3. Donna K. Reid: Thinking and Writing... / undefined
  4. " " / undefined
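A listing like the one above can be produced with a few lines of JavaScript. This is a sketch, not the code actually used; `describeFileLinks` is a hypothetical helper that accepts any array-like of objects with `innerHTML` and `href` properties, so it works on a `NodeList` in the browser and on plain objects elsewhere. A `<span>` element has no `href` property, which would explain the `undefined` values.

```javascript
// Sketch only: formats each element as "<n>. <innerHTML> / <href>".
// JSON.stringify puts the text in quotes so whitespace-only content is visible.
function describeFileLinks(elements) {
  return Array.from(
    elements,
    (el, i) => `${i + 1}. ${JSON.stringify(el.innerHTML)} / ${el.href}`
  );
}

// In the browser console, something like:
// describeFileLinks(document.querySelectorAll('.canvasFileLink'))
//   .forEach((line) => console.log(line));
```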

There may be hope with the tab, but for now my focus was on the canvasFileLink issue. This image has the HTML generated from Renee's Word doc. It mirrors what I see in mine, revealing two possible cases

The empty canvasFileLink. Random space left over (see the last canvasFileLink below just before the embed). This may be due to the way run styles work in Word when you're manually editing, leaving a space at the end.

The multiple sequential spans breaking up a single file name. This is what appears to happen when you manually do some edits.
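For the first of these two cases, a kludge might simply unwrap any canvasFileLink span that holds nothing but whitespace. The sketch below assumes flat spans and works on an HTML string; `removeEmptyFileLinks` is a hypothetical name and this is not the existing postConvert code.

```javascript
// Sketch only: replaces whitespace-only canvasFileLink spans with the
// whitespace they contained, so surrounding words do not run together.
function removeEmptyFileLinks(html) {
  return html.replace(/<span class="canvasFileLink">(\s*)<\/span>/g, '$1');
}
```

If fragmented spans are also being merged, it probably makes sense to run this step first so that stray whitespace-only spans are not pulled into a neighbouring file name.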