alphapapa / org-web-tools

View, capture, and archive Web pages in Org-mode
GNU General Public License v3.0

handle images in links #48

Open mooseyboots opened 2 years ago

mooseyboots commented 2 years ago

@c1-g i tried out your branch, it works well for relative links to files and images.

i also noticed another issue, which perhaps you could address? (it is in the main branch too, but maybe you have such knowhow?)

html like this (an image that is also a link):

<a href="/vorratsdatenspeicherung" hreflang="de"><img src="/sites/default/files/styles/medium_crop/public/2017-09/fsa-unschuldsvermutung_%20John-Paul_Bader_cc-by-sa2.jpg?itok=CQntlFzw" width="410" height="208" alt="John-Paul Bader, CC BY SA 2.0" loading="lazy" typeof="foaf:Image" class="image-style-medium-crop" /></a>

renders into a kind of hyperactivated org link

[[https://digitalcourage.de/digitale-selbstverteidigung][[[https://digitalcourage.de/sites/default/files/styles/medium_crop/public/2017-09/IMG_20160107_155735159.jpg?h=63c968e9&itok=rkduUPh5]]]]

i.e. it generates two links, one from href= and one from img src=, with mangled square brackets.

c1-g commented 2 years ago

Hi, I've encountered this kind of link as well as an invalid link like this,

[[https://en.wikipedia.org/wiki/File:Oceans_and_continents_coarse.png][]]

I think the issue with the link you gave is caused by the img tag being nested inside the a tag. I managed to fix both problems by splitting the two tags in the HTML. So, for example, your link will become

<a href="/vorratsdatenspeicherung" hreflang="de"></a><img src="/sites/default/files/styles/medium_crop/public/2017-09/fsa-unschuldsvermutung_%20John-Paul_Bader_cc-by-sa2.jpg?itok=CQntlFzw" width="410" height="208" alt="John-Paul Bader, CC BY SA 2.0" loading="lazy" typeof="foaf:Image" class="image-style-medium-crop">

Notice that the </a> that used to enclose the two together has been moved to just before the img tag, so the two are now separated. Pandoc will then convert them to

[[/vorratsdatenspeicherung][]][[/sites/default/files/styles/medium_crop/public/2017-09/fsa-unschuldsvermutung_%20John-Paul_Bader_cc-by-sa2.jpg?itok=CQntlFzw]]

As you can see there are two links, [[/vorratsdatenspeicherung][]] and [[/sites/default/files/styles/medium_crop/public/2017-09/fsa-unschuldsvermutung_%20John-Paul_Bader_cc-by-sa2.jpg?itok=CQntlFzw]]

The first link is invalid because it has an empty description, so Org will interpret it as normal text, while the second, on the other hand, is perfectly normal.
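The tag-splitting step described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the fix-linked-images branch; the name `split_linked_images` and the regular expression are hypothetical, and the sketch assumes the anchor contains nothing but a single img tag and that no attribute value contains a `>` character.

```python
import re

# Hypothetical pattern: an opening <a ...> tag, then a single <img ...> tag,
# then the closing </a>, with optional whitespace in between.
LINKED_IMG = re.compile(
    r'(<a\b[^>]*>)\s*(<img\b[^>]*>)\s*</a>',
    re.IGNORECASE,
)

def split_linked_images(html: str) -> str:
    """Move the closing </a> in front of the <img> so the tags become siblings."""
    return LINKED_IMG.sub(r'\1</a>\2', html)

sample = '<a href="/page" hreflang="de"><img src="/pic.jpg" alt="x" /></a>'
print(split_linked_images(sample))
# prints: <a href="/page" hreflang="de"></a><img src="/pic.jpg" alt="x" />
```

A regular expression is fragile on arbitrary HTML; a real implementation would more likely walk the parsed DOM, but the transformation is the same.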

I found a way to fix the empty description: as you may have guessed, insert a new description into the HTML.

So for the invalid link,

<a href="/vorratsdatenspeicherung" hreflang="de"></a>

Insert the href as the new description,

<a href="/vorratsdatenspeicherung" hreflang="de">/vorratsdatenspeicherung</a>

Then Pandoc will convert it to a proper link

[[/vorratsdatenspeicherung]]
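Filling in the empty description could look something like the sketch below. Again this is illustrative Python rather than the branch's actual code; `fill_empty_descriptions` and its pattern are hypothetical, and it assumes the href value is double-quoted and that attribute values contain no `>` character.

```python
import re

# Hypothetical pattern: an <a> tag with a double-quoted href whose content
# is empty (or whitespace only). Group 1 is the whole opening tag,
# group 2 is the href value.
EMPTY_ANCHOR = re.compile(
    r'(<a\b[^>]*\bhref="([^"]*)"[^>]*>)\s*</a>',
    re.IGNORECASE,
)

def fill_empty_descriptions(html: str) -> str:
    """Use the href itself as the link text of any empty <a> element."""
    return EMPTY_ANCHOR.sub(lambda m: m.group(1) + m.group(2) + '</a>', html)

sample = '<a href="/vorratsdatenspeicherung" hreflang="de"></a>'
print(fill_empty_descriptions(sample))
# prints: <a href="/vorratsdatenspeicherung" hreflang="de">/vorratsdatenspeicherung</a>
```

Anchors that already have text are left untouched, since the pattern only matches when nothing but whitespace sits between the opening and closing tags.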

I've implemented this in the fix-linked-images branch of my fork. I've only tested it for a few days with Wikipedia articles, and it works quite well, but I still need to do more tests.

I'll be making a pull request when it's ready.