OliverBalfour / obsidian-pandoc

Pandoc document export plugin for Obsidian (https://obsidian.md)
MIT License
719 stars, 60 forks

Pandoc fails to export pictures in Obsidian: .png: withBinaryFile: does not exist (No such file or directory) #183

Open Hot12345 opened 1 year ago

Hot12345 commented 1 year ago

Hi,

The settings of the plugin are correctly filled in.

I get an error message when exporting the note to Word (.docx) via Ctrl + P → "Pandoc Plugin: Export as Word Document (docx)".

Error message: "Pandoc export failed: pandoc.exe: xxxxxx\Pasted image 20230425122627.png: withBinaryFile: does not exist (No such file or directory)"

I think it is because the picture files are in a sub-directory and the plugin does not pass the correct path to Pandoc. With a blank, normal note, if I paste the picture and convert the note (.md) to a Word document, it works.

But when I put the same .png picture in a subfolder, it doesn't work anymore... So how do I fix this issue?
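One way to check this diagnosis is to list the image embeds in a note and test which of them exist relative to a given folder, such as the vault root or the note's own folder. A minimal standard-library sketch; the helper name `missing_images` is hypothetical, and it assumes embeds are written either as Obsidian wiki-links (`![[name.png]]`, optionally with `|size`) or as Markdown images (`![alt](path)`):

```python
import os
import re
import urllib.parse

# Obsidian wiki-link embeds, e.g. ![[Pasted image 1.png]] or ![[Pasted image 1.png|300]]
WIKI_EMBED = re.compile(r"!\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")
# Standard Markdown images, e.g. ![alt](attachments/Pasted%20image%201.png "title")
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")

def missing_images(note_text, base_dir):
    """Return the image references in note_text that do not exist relative to base_dir."""
    refs = WIKI_EMBED.findall(note_text)
    # Markdown links percent-encode spaces (%20), so decode them before checking the disk.
    refs += [urllib.parse.unquote(url) for url in MD_IMAGE.findall(note_text)]
    return [ref for ref in refs if not os.path.exists(os.path.join(base_dir, ref))]
```

References that come back missing are candidates for the withBinaryFile error, since a bare wiki-link name such as `Pasted image 20230425122627.png` only resolves if the file sits directly in the folder being checked.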

qrospars commented 1 year ago

Hi,

I had a similar issue; here is the solution I found :) To fix this issue, you can create a Python script that acts as a Pandoc filter and rewrites the image URLs in your notes so they point to the folder where the images are stored.

Here's a step-by-step guide on how to use the Python script with Pandoc in Obsidian:

  1. Save the following Python script (adapted to my folder structure) in your Obsidian notes folder (e.g., "Sync/Notes/image_filter.py"). The "Sync" folder is my parent folder for my vault, and all my notes are in the "Notes/" folder.

    import os
    import urllib.parse
    from panflute import Image
    
    # Set a default value for the image directory
    DEFAULT_IMAGE_DIR = "../Files"
    
    def action(elem, doc):
       if isinstance(elem, Image):
           url = elem.url.strip()
    
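       if os.path.isabs(url):
           # Drop the drive letter (e.g. "C:") and use forward slashes; unlike
           # split(':'), splitdrive() leaves macOS/Linux paths intact
           url = os.path.splitdrive(url)[1].replace('\\', '/')
    
       url = urllib.parse.unquote(url)
       # abspath() converts separators for the current OS, so no manual '\\' swap is needed
       abs_path = os.path.abspath(os.path.join(os.getcwd(), url))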
           elem.url = os.path.join(os.path.dirname(abs_path), DEFAULT_IMAGE_DIR, os.path.basename(abs_path))
    
           return elem
    
    if __name__ == '__main__':
       from panflute import run_filter
       run_filter(action)
  2. Ensure that you have the Pandoc plugin installed in Obsidian.

  3. In the Pandoc plugin settings, add the following argument to the "Pandoc Arguments" field:

    --filter=image_filter.py

    This tells Pandoc to use the Python script we created as a filter when converting your notes.

  4. Convert your notes using the Pandoc plugin (Ctrl + P > Export as Word Document). The images should now be rendered correctly in the converted documents, even if they are located in a sub-directory.

The filter script rewrites each image path so that it points into the sub-directory where the images are stored. This way, Pandoc can locate the images and embed them in the converted documents.
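For readers wondering what a filter actually receives: Pandoc runs the filter with the output format as its first argument, writes the document's JSON AST to its standard input, and reads the modified AST back from standard output; panflute wraps this in Python objects such as `Image`. The sketch below does the same job with only the standard library. It assumes the JSON layout used by current Pandoc versions, where an `Image` node's `c` field is `[attr, alt_text, [url, title]]`; `rewrite_images`, `main` and the `attachments/` rule are hypothetical:

```python
import json
import sys

def rewrite_images(node, fix_url):
    """Recursively walk a Pandoc JSON AST and rewrite every Image target in place."""
    if isinstance(node, dict):
        if node.get("t") == "Image":
            target = node["c"][2]  # Image content: [attr, alt-text inlines, [url, title]]
            target[0] = fix_url(target[0])
        for value in node.values():
            rewrite_images(value, fix_url)
    elif isinstance(node, list):
        for item in node:
            rewrite_images(item, fix_url)
    return node

def main():
    """Filter entry point: read the AST from stdin, write the modified AST to stdout."""
    doc = json.load(sys.stdin)
    # Hypothetical rule for illustration: look for every image in an "attachments" folder.
    rewrite_images(doc, lambda url: "attachments/" + url)
    json.dump(doc, sys.stdout)
```

To use it as a real filter, the file would need an `if __name__ == "__main__": main()` guard at the bottom and would then be passed with `--filter` like the panflute version.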

Let me know if that works for you!

Comprehensive-Jason commented 1 year ago

@qrospars I tried your script, but the following error message still appears: (screenshot)

qrospars commented 1 year ago

Surprising. On my side it works as expected. I am on a Windows machine, maybe that's why? @Comprehensive-Jason Are you on a Mac or Linux?

Comprehensive-Jason commented 1 year ago

I'm on Windows. I believe the problem is with the way I store images in my vault. I store all my notes in a 'Resources' folder, and all my images in a separate 'attachments' folder. I think the script fails because 'attachments' is not a subdirectory of 'Resources'.

KebPericles commented 1 year ago

Thanks for this! @Comprehensive-Jason Neither worked for me until I changed DEFAULT_IMAGE_DIR = "../Files" to DEFAULT_IMAGE_DIR = "Files"

qrospars commented 1 year ago

Hi @KebPericles and @Comprehensive-Jason,

Sorry for the delay. I am happy to hear that it worked for some of you! DEFAULT_IMAGE_DIR should be the relative path to your image directory from the location of the script. For example, I placed the script file in my notes directory, so I had to use "../Files" as the relative path.

@Comprehensive-Jason, try changing this value to "attachments" and see if it works.
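To see where a given DEFAULT_IMAGE_DIR value ends up, the path arithmetic from the filter's last line can be replayed on its own. Reading the code, the value is joined onto the folder the image link resolves to (Pandoc's working directory plus any folder in the link) rather than onto the script's location. A sketch using `posixpath` so the output does not depend on the OS; `rewritten_path` is a hypothetical name, `cwd` stands in for Pandoc's working directory, and the final `normpath` is added only for readability:

```python
import posixpath

def rewritten_path(url, cwd, image_dir):
    """Replay the filter's rewrite: the image's folder + image_dir + the image's file name."""
    abs_path = posixpath.normpath(posixpath.join(cwd, url))
    new_path = posixpath.join(posixpath.dirname(abs_path), image_dir,
                              posixpath.basename(abs_path))
    # Collapse ".." so the final location is easy to read (the filter itself skips this step)
    return posixpath.normpath(new_path)
```

So with Pandoc running in /vault/Notes, "../Files" points at /vault/Files, while the same value used from /vault points outside the vault; that is one possible explanation for why "Files" worked for some people and "../Files" for others.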

Comprehensive-Jason commented 1 year ago

@qrospars After the change, the script works on most of my files. I did get this error when testing some of my longer documents with more images, though: (screenshot)

Also, none of my LaTeX equations show up in the .docx export.

qrospars commented 1 year ago

This happens because the image is embedded as base64 data instead of linking to a file. I didn't think of this case since I never paste my images in this format :/

Try changing the code to this (and change the DEFAULT_IMAGE_DIR back to the value that worked for you):

import os
import urllib.parse
from panflute import Image
import re

# Set a default value for the image directory
DEFAULT_IMAGE_DIR = "../Files"

def action(elem, doc):
    if isinstance(elem, Image):
        url = elem.url.strip()

        # Check if the URL is a base64 image
        if re.match(r'^data:image\/[a-zA-Z]*;base64', url):
            # It's a base64 image, leave it as is
            return elem

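        if os.path.isabs(url):
            # Drop the drive letter (e.g. "C:") and use forward slashes; unlike
            # split(':'), splitdrive() leaves macOS/Linux paths intact
            url = os.path.splitdrive(url)[1].replace('\\', '/')

        url = urllib.parse.unquote(url)
        # abspath() converts separators for the current OS, so no manual '\\' swap is needed
        abs_path = os.path.abspath(os.path.join(os.getcwd(), url))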
        elem.url = os.path.join(os.path.dirname(abs_path), DEFAULT_IMAGE_DIR, os.path.basename(abs_path))

        return elem

if __name__ == '__main__':
    from panflute import run_filter
    run_filter(action)
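The regular expression in this version only matches MIME types made of letters, so an embedded SVG (`data:image/svg+xml;base64,...`) would slip past it and be rewritten like a file path. A sketch of a broader check that treats any `data:` URL as already embedded; `is_embedded_image` is a hypothetical helper that could stand in for the `re.match` line:

```python
import re

# Pattern used in the filter above; "[a-zA-Z]*" stops at the "+" in "svg+xml".
ORIGINAL = re.compile(r'^data:image\/[a-zA-Z]*;base64')

def is_embedded_image(url):
    """Treat any data: URL as already embedded, whatever its MIME type or encoding."""
    return url.strip().lower().startswith("data:")
```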
Comprehensive-Jason commented 1 year ago

@qrospars Thanks so much! I can export images without errors now. Unfortunately, I keep running into persistent issues exporting equations using the plugin. In HTML mode, equations simply don't show up. In Markdown mode, equations often have the wrong formatting. I've decided to switch to the Obsidian Enhancing Export plugin, which does the same thing but is currently being updated and has fewer issues. Can I adapt your script for that plugin?

StrikS commented 1 year ago

For those users who aren't very experienced, I'll add some details: I use relative paths with a subfolder for attachments. So my attachments folder is located at the path:

P:\Sync\Obsidian\Edu\Tech\BD\Practic\attachments

So what I specified:

DEFAULT_IMAGE_DIR = "Edu/Tech/BD/Practic/attachments"

Specifying just the file name did not work for me:

--filter=image_filter.py

So I specified the full path:

--filter=P:\Sync\Obsidian\image_filter.py

Of course, you have to install Python. Apart from that, you also need to install the library used by the script:

pip install panflute

After that, I got a document with pictures in the output. @qrospars Thanks for your script; now I can write everything in Obsidian with pictures and convert it to the formats I want to export.
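The bare file name probably failed because Pandoc resolves a filter path given on the command line against its own working directory before falling back to the `filters` folder in its user data directory and then PATH, and that working directory is not necessarily where the script lives. To try a filter outside Obsidian, a sketch like this runs Pandoc directly from Python with absolute paths. It assumes `pandoc` is on PATH; `pandoc_command` and `export_docx` are hypothetical helper names:

```python
import subprocess
from pathlib import Path

def pandoc_command(note, output, filter_script, cwd):
    """Build a pandoc command line with every path made absolute relative to cwd."""
    # Path(cwd, p) leaves p untouched when p is already absolute.
    note_path = Path(cwd, note).resolve()
    output_path = Path(cwd, output).resolve()
    filter_path = Path(cwd, filter_script).resolve()
    return ["pandoc", str(note_path), "-o", str(output_path), "--filter=" + str(filter_path)]

def export_docx(note, output, filter_script, cwd):
    """Run pandoc from cwd, since the filters in this thread rely on os.getcwd()."""
    command = pandoc_command(note, output, filter_script, cwd)
    subprocess.run(command, cwd=cwd, check=True)  # raises CalledProcessError on failure
```

For example, `export_docx("Note.md", "Note.docx", r"P:\Sync\Obsidian\image_filter.py", r"P:\Sync\Obsidian")` would mimic running Pandoc from the vault root on Windows.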

Hot12345 commented 1 year ago

Sorry for the late response.

It can produce a .docx file when I take a screenshot with Greenshot, paste it into Obsidian, and the .png file is not placed in a subfolder. This is crucial; otherwise Pandoc cannot find the .png file.

So when the .png file is in the root (not in any folder), I CAN export a .docx file, but when I place the .png file in a subfolder (pngfolder), the export fails with: Pandoc export failed: pandoc.exe: C:\Users\AygulKorkmazlar\Documents\Github\Workvault\Pasted Image 20230xxxxxx.png: withBinaryFile: does not exist (No such file or directory)

So clearly the Pandoc plugin cannot find the picture when the .png file is in a subfolder. I also checked the Pandoc Plugin settings, and there is no setting that tells it to look for images in a nested folder instead of the root folder.

I hope this makes sense and that this bug can be fixed; it is still not fixed.

What can be done so that the plugin also searches sub-directories?

qrospars commented 1 year ago

Hi everyone,

I refactored the script to search for all files inside the Vault instead of a specific folder. This new version should resolve the issues @Hot12345 was having with files in nested directories not being found.

The script now traverses the entire Vault directory to find the images, regardless of where they're located. To make it work for your specific Vault, simply set the VAULT_NAME at the top of the script to match the name of your Vault.

The script uses Python's os module to handle file paths in a platform-independent manner, which means it should work correctly on macOS, Linux, and Windows. However, please note that the VAULT_NAME needs to match exactly, including any uppercase and lowercase letters, for the script to find your files correctly.

Here is the final version of the script:

import os
import re
from panflute import Image

# Set your vault name here
VAULT_NAME = "YourVaultName"

def find_file(filename, search_path):
    """
    Given a filename and a directory search path, this function will return the 
    relative path of the file if it exists within the search path, otherwise None.

    Args:
        filename (str): The name of the file.
        search_path (str): The directory path where the search will be performed.

    Returns:
        str: The relative path of the file if found, else None.
    """
    for root, dirs, files in os.walk(search_path):
        if filename in files:
            return os.path.relpath(os.path.join(root, filename), search_path)
    return None

def action(elem, doc):
    if isinstance(elem, Image):
        url = elem.url.strip()

        # Check if the URL is a base64 image
        if re.match(r'^data:image\/[a-zA-Z]*;base64', url):
            # It's a base64 image, leave it as is
            return elem

        # Get the absolute path to the current directory
        current_dir = os.path.abspath(os.getcwd())

        # Find the index of the vault name in the path
        vault_index = current_dir.index(VAULT_NAME)

        # Construct the vault path
        vault_path = current_dir[:vault_index + len(VAULT_NAME)]

        # Search for the image file in the vault
        image_path = find_file(os.path.basename(url), vault_path)

        if image_path is not None:
            # Replace the URL of the image with the new path
            elem.url = image_path

        return elem

if __name__ == '__main__':
    from panflute import run_filter
    run_filter(action)

Please test it out and let me know if there are any issues. As always, make sure to back up your data before testing new scripts. Thanks for your patience and your help in improving this script!
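Two weak spots in the VAULT_NAME lookup are worth knowing about: `current_dir.index(VAULT_NAME)` raises a ValueError if Pandoc's working directory does not contain the vault name, and it matches the first occurrence, so a parent folder whose name contains the vault name would cut the path too early. A different approach, sketched here as an assumption and not part of the script, is to walk up from the working directory until a folder containing Obsidian's `.obsidian` settings directory appears; `find_vault_root` is a hypothetical name:

```python
import os

def find_vault_root(start):
    """Walk up from start until a folder containing .obsidian is found; None if none is."""
    current = os.path.abspath(start)
    while True:
        if os.path.isdir(os.path.join(current, ".obsidian")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without finding a vault
            return None
        current = parent
```

Inside `action`, `vault_path = find_vault_root(os.getcwd())` could replace the two VAULT_NAME lines, with a `None` check in case Pandoc runs outside the vault.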

Oskiator commented 1 year ago

Hi qrospars, many thanks for your work. I just tried the final version of your script. Unfortunately, it didn't seem to work for me.

I get the same Pandoc error: (screenshot)

I've installed Python and the panflute package: (screenshots)

I added the argument in the Pandoc plugin settings: (screenshot)

I've set the vault folder: (screenshot)

I've put the script in the folder "...\Leader du changement\3. 🗃️ Notes Permanentes", which is one of my main folders (but I use several folders for notes).

I tried to export a note that was in this folder "...\Leader du changement\3. 🗃️ Notes Permanentes", and the image is in the same folder. But it seems that Pandoc still looks in the root folder (Leader du changement), as shown in the error notification.

I would be really happy to have a working solution if you can help me.

All the best.

P.S. Thanks also to StrikS for giving more explanations for "users who aren't very experienced". I'm one of them 😁 I even had to learn how to use "pip install panflute".

Hot12345 commented 1 year ago

Still not working... As @Oskiator mentioned, I get the same error; pictures in nested folders still seem to be the problem. And yes, I'm also one of those who are not experienced with Python.

shriar commented 1 year ago

If your images are in a subfolder relative to the file, you can use this slightly modified code. In my case the subfolder is named attachments; change sub_image_folder to match yours.

import os
import urllib.parse
from panflute import Image
import re

sub_image_folder = 'attachments'
current_dir = os.getcwd()
DEFAULT_IMAGE_DIR = os.path.join(current_dir, sub_image_folder)

def action(elem, doc):
    if isinstance(elem, Image):
        url = elem.url.strip()

        # Check if the URL is a base64 image
        if re.match(r'^data:image\/[a-zA-Z]*;base64', url):
            # It's a base64 image, leave it as is
            return elem

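        if os.path.isabs(url):
            # Drop the drive letter (e.g. "C:") and use forward slashes; unlike
            # split(':'), splitdrive() leaves macOS/Linux paths intact
            url = os.path.splitdrive(url)[1].replace('\\', '/')

        url = urllib.parse.unquote(url)
        # abspath() converts separators for the current OS, so no manual '\\' swap is needed
        abs_path = os.path.abspath(os.path.join(os.getcwd(), url))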
        elem.url = os.path.join(os.path.dirname(abs_path), DEFAULT_IMAGE_DIR, os.path.basename(abs_path))

        return elem

if __name__ == '__main__':
    from panflute import run_filter
    run_filter(action)
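This variant works regardless of the folder part of the link because DEFAULT_IMAGE_DIR is now an absolute path, and `os.path.join` discards every component that comes before an absolute one. A small demonstration with `posixpath` (Windows' `ntpath` follows the same rule); the example paths are made up:

```python
import posixpath

cwd = "/vault/Notes"                                         # stand-in for os.getcwd()
image_dir = posixpath.join(cwd, "attachments")               # absolute: /vault/Notes/attachments
abs_path = posixpath.join(cwd, "some/other/folder/pic.png")  # where the link seemed to point

# The dirname of abs_path is thrown away because image_dir is absolute.
new_url = posixpath.join(posixpath.dirname(abs_path), image_dir,
                         posixpath.basename(abs_path))
print(new_url)  # /vault/Notes/attachments/pic.png
```

The flip side is that only images sitting directly inside the working directory's attachments folder will be found.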
fieri commented 8 months ago

import os
import re
from panflute import Image

# Set your vault name here
VAULT_NAME = "Vault"

def find_file(filename, search_path):
    """
    Given a filename and a directory search path, this function will return the
    relative path of the file if it exists within the search path, otherwise None.

    Args:
        filename (str): The name of the file.
        search_path (str): The directory path where the search will be performed.

    Returns:
        str: The relative path of the file if found, else None.
    """
    # Find the index of the vault name in the path
    vault_index = search_path.index(VAULT_NAME)

    # Construct the vault path
    vault_path = search_path[:vault_index + len(VAULT_NAME)]

    for root, dirs, files in os.walk(vault_path):
        if filename in files:
            return os.path.relpath(os.path.join(root, filename), search_path)
    return None

def action(elem, doc):
    if isinstance(elem, Image):
        url = elem.url.strip()

        # Check if the URL is a base64 image
        if re.match(r'^data:image\/[a-zA-Z]*;base64', url):
            # It's a base64 image, leave it as is
            return elem

        # Get the absolute path to the current directory
        current_dir = os.path.abspath(os.getcwd())

        # Search for the image file in the vault
        # image_path = find_file(os.path.basename(url), vault_path)
        image_path = find_file(os.path.basename(url), current_dir)

        if image_path is not None:
            # Replace the URL of the image with the new path
            elem.url = image_path

        return elem

if __name__ == '__main__':
    from panflute import run_filter
    run_filter(action)

mikaeljagelid commented 7 months ago

My Obsidian is setup to have images placed in subfolders under the current folder:

MyDoc.md
attachments/image.png

Since I use a lot of folders in Obsidian and want to use Pandoc wherever the document may live, I need to use the full path for the images. To make the filter that @qrospars created in https://github.com/OliverBalfour/obsidian-pandoc/issues/183#issuecomment-1595792931 above work in my setup, I did the following:

  1. Put the filter in my Pandoc default folder at ~/.local/share/pandoc/filters
  2. Added --filter image_filter.py to the Obsidian Pandoc Extra Arguments
  3. Added this line of code in image_filter.py before elem.url = image_path: image_path = os.path.join(vault_path, image_path)
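Step 3 matters because `find_file` returns a path relative to the vault root, which only resolves if Pandoc happens to run from the vault root; joining it with `vault_path` makes it absolute. A sketch of that lookup on its own, with one extra assumption not in the original script: the link is percent-decoded first, since Markdown links encode spaces as `%20`. `resolve_in_vault` is a hypothetical name:

```python
import os
import urllib.parse

def resolve_in_vault(url, vault_path):
    """Return the path of the first file in the vault named like url, else None."""
    filename = os.path.basename(urllib.parse.unquote(url.strip()))
    for root, _dirs, files in os.walk(vault_path):
        if filename in files:
            # Same result as os.path.join(vault_path, <relative path>) from step 3;
            # absolute whenever vault_path is absolute.
            return os.path.join(root, filename)
    return None
```

In the filter, `elem.url = resolve_in_vault(url, vault_path) or elem.url` would keep the original link when nothing is found.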
Hot12345 commented 7 months ago

I gave up on this... there is no solution to the original problem.

Hot12345 commented 7 months ago

What works for me right now is pasting all the pictures I take into the root folder, so they are not in any subfolders.

mikaeljagelid commented 7 months ago

Yeah, I've given up on it so many times, but I stumbled upon this thread the other day, decided to give it one more try, and finally got it working in my configuration.

I also had errors like withBinaryFile: does not exist (No such file or directory) and got rid of them when I changed the Export Folder in the Obsidian Pandoc settings to the default "same as target".

BahneGork commented 2 months ago

I found a solution without the Python filter: install the Link Converter plugin, and in its options set "converted link format" to "Relative path". Then either use the plugin to convert all links in the vault to Markdown links, or open the note you want to export, select everything, and convert the editor selection's links to Markdown.
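What that conversion does can be sketched in a few lines: each `![[name.png]]` embed becomes a standard Markdown image whose path is relative to the note, with spaces percent-encoded, so the link no longer depends on Obsidian's name-based lookup. This illustrates the idea and is not the Link Converter plugin's code; `to_markdown_embeds` is hypothetical, and the `locations` dictionary stands in for Obsidian's own link resolution:

```python
import os
import re
import urllib.parse

# ![[name]] or ![[name|size]]; any |size suffix is dropped in the output
EMBED = re.compile(r"!\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def to_markdown_embeds(text, note_dir, locations):
    """Replace ![[name]] with ![](relative/path); locations maps file names to full paths."""
    def replace(match):
        name = match.group(1).strip()
        target = locations.get(name)
        if target is None:
            return match.group(0)  # unknown file: leave the embed untouched
        rel = os.path.relpath(target, note_dir).replace(os.sep, "/")
        return "![](" + urllib.parse.quote(rel) + ")"
    return EMBED.sub(replace, text)
```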

BenSmithLight commented 2 months ago

Thank you for the update! You think the same way I do. When I first tried the 1.0 version, I found that relative paths didn't work in my vault, so I changed it to use abs_path. Then I found that abs_path didn't work with all folders, so I added a function that searches all "Attachments" folders, and that works. I'm sharing it here in the hope that it helps someone else.

By the way, my path structure is:

import os
import urllib.parse
from panflute import Image

# Base directory to start search
BASE_DIR = r"D:\OneDrive\文档\Vault of Obsidian"

def find_image_in_attachments(image_name):
    for root, dirs, files in os.walk(BASE_DIR):
        if 'Attachments' in dirs:
            attachments_dir = os.path.join(root, 'Attachments')
            if image_name in os.listdir(attachments_dir):
                return os.path.join(attachments_dir, image_name)
    return None

def action(elem, doc):
    if isinstance(elem, Image):
        url = elem.url.strip()
        url = urllib.parse.unquote(url)
        image_name = os.path.basename(url)

        image_path = find_image_in_attachments(image_name)
        if image_path:
            elem.url = image_path

        return elem

if __name__ == '__main__':
    from panflute import run_filter
    run_filter(action)
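One possible refinement, sketched as an assumption rather than a tested replacement: the script above walks the whole vault again for every image, which can get slow in a large vault. Walking once and building a file-name index keeps the same "any folder named Attachments" rule. `build_attachment_index` is a hypothetical name, and like the original it takes the first match when two attachments share a name:

```python
import os

def build_attachment_index(base_dir, folder_name="Attachments"):
    """Map each file name found in any `folder_name` folder under base_dir to its full path."""
    index = {}
    for root, dirs, _files in os.walk(base_dir):
        if folder_name in dirs:
            attachments_dir = os.path.join(root, folder_name)
            for name in os.listdir(attachments_dir):
                # setdefault keeps the first match, mirroring the search order of the script above
                index.setdefault(name, os.path.join(attachments_dir, name))
    return index
```

Inside the filter, the index would be built once at module level (`INDEX = build_attachment_index(BASE_DIR)`) and `action` would look up `INDEX.get(image_name)` instead of calling `find_image_in_attachments`.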