JessicaTegner / pypandoc

Thin wrapper for "pandoc" (MIT)
http://pypi.python.org/pypi/pypandoc/

Ignoring Alt Text when converting from docx to txt #364

Open caphefalumi opened 2 months ago

caphefalumi commented 2 months ago

Currently, when I convert from docx to txt, the alt text of images is included in the output alongside the paragraphs, as something like "[ALT TEXT]". How do I exclude the alt text? Here is my code:

pypandoc.convert_file(docx_path, 'plain', extra_args=['--wrap=none'], outputfile='output.txt')

JessicaTegner commented 2 months ago

From the pandoc user guide:

A link immediately preceded by a ! will be treated as an image. The link text will be used as the image’s alt text:
![la lune](lalune.jpg "Voyage to the moon")

![movie reel]

[movie reel]: movie.gif
Extension: implicit_figures
An image with nonempty alt text, occurring by itself in a paragraph, will be rendered as a figure with a caption. The image’s alt text will be used as the caption.
![This is the caption](/url/of/image.png)
[...]
If you just want a regular inline image, just make sure it is not the only thing in the paragraph. One way to do this is to insert a nonbreaking space after the image:
![This image won't be a figure](/url/of/image.png)\
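
One possible way to drop the alt text entirely, not confirmed anywhere in this thread, is to remove the images with a pandoc Lua filter before the plain writer sees them. The sketch below assumes a pandoc version with Lua filter support (2.0 or later). The file name `remove_images.lua` and the helper names `write_image_filter`, `build_extra_args`, and `convert_without_images` are made up for illustration; only `pypandoc.convert_file` and the pandoc options `--wrap` and `--lua-filter` are real.

```python
from pathlib import Path

# Lua filter source: returning an empty list from the Image handler
# deletes the image (and with it the alt text) from the document.
REMOVE_IMAGES_LUA = """\
function Image(el)
  return {}
end
"""


def write_image_filter(directory):
    """Write the Lua filter into `directory` and return its path."""
    filter_path = Path(directory) / "remove_images.lua"  # hypothetical file name
    filter_path.write_text(REMOVE_IMAGES_LUA, encoding="utf-8")
    return filter_path


def build_extra_args(filter_path):
    """Return the pandoc arguments: the original --wrap=none plus the filter."""
    return ["--wrap=none", f"--lua-filter={filter_path}"]


def convert_without_images(docx_path, output_path, filter_dir="."):
    """Convert a docx file to plain text with all images dropped."""
    import pypandoc  # imported here so the helpers above work without it

    filter_path = write_image_filter(filter_dir)
    pypandoc.convert_file(
        str(docx_path),
        "plain",
        extra_args=build_extra_args(filter_path),
        outputfile=str(output_path),
    )
```

Calling `convert_without_images("input.docx", "output.txt")` (both file names are placeholders) would then run the same conversion as in the question with the filter applied. Note that this removes every image, and captions that pandoc stores outside the image element may still appear in the output.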