Release-Candidate / Obs2Org

Converts Obsidian-style markdown files to Org-Mode files using pandoc. Converts the internal Wiki-links to valid Org-Mode links.
GNU General Public License v3.0

Link conversion not working for me #1

Closed tonygrr closed 1 year ago

tonygrr commented 1 year ago

For some reason the main function of this tool is not working for me. After converting I get regular wiki-links [[note]] instead of org-links [[file:note][note]].

But, it is worth noting that the rest of the transformations are just fine.

Release-Candidate commented 1 year ago

Thanks for your bug report. And for letting me know that somebody is still using this program.

I've just noticed that my tests now fail too; some update has broken the link correction.

Release-Candidate commented 1 year ago

No, that problem is just that Pandoc has changed some whitespace in the output.

Do I understand you correctly, that you have a Markdown link like [[Headline]] that links to a file Headline.md with the title Headline (Org-Mode [[file:Headline.org::#headline][Headline]])? Because as of now, I'm only converting Markdown links like [[note#Some Title]] to Org-Mode [[file:note.org::#some-title][Some Title]]. I haven't thought about these links, as I don't use them myself, but I'm adding them.

If this is not the problem you are experiencing, please post an example of a Markdown link that fails to be converted.
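The two link forms discussed above can be sketched as follows. This is a minimal illustration, not obs2org's actual code: the names `WIKILINK`, `heading_anchor` and `convert_wikilinks` are hypothetical, and the anchor rule (lowercase, whitespace to hyphens) is an assumption taken from the `[[note#Some Title]]` example. The real program also opens the linked file to check that the heading exists, which this sketch skips.

```python
import re

# Matches [[note]] and [[note#Some Title]]. Links with an alias (|), a URL
# scheme (:) or a block reference (#^...) are deliberately left untouched.
WIKILINK = re.compile(r"\[\[([^\]\[#|:]+)(?:#(?!\^)([^\]\[|]+))?\]\]")


def heading_anchor(heading: str) -> str:
    """Assumed anchor rule: lowercase, runs of whitespace become hyphens."""
    return re.sub(r"\s+", "-", heading.strip().lower())


def _to_org_link(match: re.Match) -> str:
    note, heading = match.group(1), match.group(2)
    if heading is None:
        # File-only link: [[note]] -> [[file:note.org][note]]
        return f"[[file:{note}.org][{note}]]"
    # Heading link: [[note#Some Title]] -> [[file:note.org::#some-title][Some Title]]
    return f"[[file:{note}.org::#{heading_anchor(heading)}][{heading}]]"


def convert_wikilinks(text: str) -> str:
    """Rewrite plain and heading wiki-links in `text` as Org-Mode file links."""
    return WIKILINK.sub(_to_org_link, text)
```

Handling both forms in a single pass avoids a second regular expression accidentally matching the output of the first.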

Release-Candidate commented 1 year ago

Could you please check if the newest version 1.1.0 works for you?

(sudo) pip install obs2org -U

tonygrr commented 1 year ago

Do I understand you correctly, that you have a Markdown link like...

Yes, I did not immediately pay attention to the conversion logic. Most of my base followed a cleaner syntax with links solely for files, and I expected them to be converted to the org-mode link format as well. All links like [[note#title]] in my database were translated correctly.

tonygrr commented 1 year ago

Could you please check if the newest version 1.1.0 works for you?

(sudo) pip install obs2org -U

In this version, some of the links to notes have been converted to the org-mode format. However, for some reason this happened in only a small portion of the files. After the conversion finished, the command line showed the following:

Correcting links, tags, ... in file '000 Home.org'
Error, linked file 'C:\Users\Mojo\Downloads\test2\Inbox.org' has not been found, link to section 'Inbox' won't work!
Error: heading 010 Radio Engineering not found in file C:\Users\Mojo\Downloads\test2\010 Radio Engineering.org
Error, linked file 'C:\Users\Mojo\Downloads\test2\020 Завод.org' has not been found, link to section '020 Завод' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\030 Личное.org' has not been found, link to section '030 Личное' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\040 Проекты.org' has not been found, link to section '040 Проекты' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\050 Люди.org' has not been found, link to section '050 Люди' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\060 Музыка.org' has not been found, link to section '060 Музыка' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\070 Литература.org' has not been found, link to section '070 Литература' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\080 Знания.org' has not been found, link to section '080 Знания' won't work!
OK

Correcting links, tags, ... in file '010 Radio Engineering.org'
Error, linked file 'C:\Users\Mojo\Downloads\test2\Математические методы прикладной электродинамики.org' has not been found, link to section 'Математические методы прикладной электродинамики' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\Классическая электродинамика.org' has not been found, link to section 'Классическая электродинамика' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\Радиотехнические системы передачи информации.org' has not been found, link to section 'Радиотехнические системы передачи информации' won't work!
Error: heading Электропреобразовательные устройства (ЭПУ) not found in file C:\Users\Mojo\Downloads\test2\Электропреобразовательные устройства (ЭПУ).org
Error, linked file 'C:\Users\Mojo\Downloads\test2\Номер студенческого билета.org' has not been found, link to section 'Номер студенческого билета' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\Курс --- 4, семестр --- 2.org' has not been found, link to section 'Курс --- 4, семестр --- 2' won't work!
Error, linked file 'C:\Users\Mojo\Downloads\test2\Курс --- 4, семестр --- 1.org' has not been found, link to section 'Курс --- 4, семестр --- 1' won't work!
OK

Correcting links, tags, ... in file '090 Spaced Repetition Flashcards.org'
OK

Correcting links, tags, ... in file 'Anki РПрУ.org'
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\__main__.py", line 34, in <module>
    run(main.main())
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\main.py", line 77, in main
    await _convert_files(cmd_line_args=cmd_line_args, cmd_line_parser=cmd_line_parser)
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\main.py", line 194, in _convert_files
    await _do_convert_files(pandoc_path=pandoc_path, list_of_files=list_of_files)
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\main.py", line 396, in _do_convert_files
    correct_org_mode(correct_file.out_file)
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\convert.py", line 116, in correct_org_mode
    new_text = correct_org_mode_file(file_text, file_path.parent)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\parse_org_mode.py", line 76, in correct_org_mode_file
    corrected_links = _correct_org_mode_links(text=corrected_dates, directory=directory)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\parse_org_mode.py", line 174, in _correct_org_mode_links
    return _internal_wikilink_regexp_file_only.sub(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\parse_org_mode.py", line 175, in <lambda>
    repl=lambda match_obj: _link_replace_func(
                           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\parse_org_mode.py", line 237, in _link_replace_func
    header_link, heading_name = _parse_linkedfile(
                                ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\site-packages\obs2org\parse_org_mode.py", line 276, in _parse_linkedfile
    with file_name.open(mode="r", encoding="utf-8") as f_d:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mojo\AppData\Local\Programs\Python\Python311\Lib\pathlib.py", line 1044, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument: 'Pasted image 20210117153748.png|400.org'

I'm not very good at this, however, perhaps the last error prevented the process of correcting links in other files from continuing.

tonygrr commented 1 year ago

@Release-Candidate I think I've found a pattern. Link correction works fine until a link like [[Note|notes]] is encountered. After that, the link correction stops. I deleted the file that contained [[Pasted image 20210117153748.png|400]] and re-converted. Now I got the following error: OSError: [Errno 22] Invalid argument: 'Полевой транзистор|ПТ.org'. At the same time, more files now have correct org-mode links (these are the files that were processed before the note in which the syntax [[Полевой транзистор|ПТ]] was encountered).

Release-Candidate commented 1 year ago

Thanks for the details and your help. Currently the program interprets the filename and the display name together as the filename, and of course there is no file with the name Pasted image 20210117153748.png|400.org or Полевой транзистор|ПТ.org. Would you mind giving me a copy of the files 000 Home.md, 010 Radio Engineering.md and Anki РПрУ.md? Preferably by email (the address is in my profile) or, if you really don't mind, by uploading them publicly in this issue.

tonygrr commented 1 year ago

@Release-Candidate, I will get to my computer tonight and send you those files (sorry for the delay).

I would like to ask you if there is a way not to stop correcting references if the program encounters similar syntax? Is there an option to skip such files and continue correcting links?

Release-Candidate commented 1 year ago

Thank you, that's fast enough. I need the files to test the program with 'real data' after I have fixed the errors and tested with some example links of my own, just so you have a bit less work testing the program.

The intended behavior of the program is to keep on converting links even when encountering errors. The stanza

Correcting links, tags, ... in file 'Anki РПрУ.org'
Traceback (most recent call last):
...
OSError: [Errno 22] Invalid argument: 'Pasted image 20210117153748.png|400.org'

means that an error has occurred which I didn't catch.
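A minimal sketch of catching that error so the run can continue, assuming the linked file is opened the way the traceback shows (`file_name.open(mode="r", encoding="utf-8")`). The function name `read_linked_file` and the wording of the message are hypothetical, not taken from obs2org.

```python
from pathlib import Path
from typing import Optional


def read_linked_file(directory: Path, link_target: str) -> Optional[str]:
    """Return the text of the linked Org file, or None if it cannot be read."""
    file_name = directory / f"{link_target}.org"
    try:
        with file_name.open(mode="r", encoding="utf-8") as f_d:
            return f_d.read()
    except OSError as exc:
        # Covers both a missing file (FileNotFoundError is a subclass of
        # OSError) and names the operating system rejects, such as
        # 'Pasted image 20210117153748.png|400.org' on Windows.
        print(f"Error, linked file '{file_name}' could not be read: {exc}")
        return None
```

A caller that receives `None` can leave the link as it is and move on to the next one instead of aborting the whole conversion.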

Let me explain what the program does in detail:

tonygrr commented 1 year ago

@Release-Candidate I sent you a test vault that I tried to convert.

Let me explain what the program does in detail:

Thank you for the explanation. Let me ask about one detail. If I understand correctly, one of the tools your migration program uses is regular expressions. Can you tell me if it is possible to add a fix for links using the [[Note|alias]] syntax? For example, a regular expression could remove everything after |, turning a [[Note|alias]] link into a [[Note]] link, which is then fixed for the org-mode format.

The point is that in my native Russian, the declension of words is quite complex, so I often had to use this syntax to make references within the context.

For example, to write a word in the plural (for example, the word "tree"), in English it is enough to add one letter ("treeS"), but in Russian the singular looks like "деревО" and the plural looks like "деревЬЯ". Therefore, I assume that a significant part of the links in my knowledge base will be lost in the conversion.

I'd rather get working links with wrong word declension during migration than invalid links.

tonygrr commented 1 year ago

The ideal would be to convert [[Note|alias]] to [[file:Note.org][alias]]. This would then be an ideal converter for converting from Obsidian/Roam to Emacs, allowing 100% of the knowledge base to be converted. At least for me. I don't know how difficult that would be to implement.

Release-Candidate commented 1 year ago

to convert [[Note|alias]] to [[file:Note.org][alias]]

That is my plan too. I just need to be careful with hyperlinks like [[http://www.url.com|Caption]] and [[some_file.pdf|Caption]].
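One way the alias conversion could distinguish these three cases is sketched below. Everything here is illustrative: the names are hypothetical, and the set of attachment extensions and the output form for URLs and PDFs are assumptions, not obs2org's documented behavior.

```python
import os
import re

# [[target|caption]]; targets containing '#' (heading or block references)
# are not handled by this sketch.
ALIAS_LINK = re.compile(r"\[\[([^\]\[|#]+)\|([^\]\[]+)\]\]")
URL_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")
# Assumed list of non-note file types; a real vault may need more entries.
ATTACHMENT_EXTENSIONS = {".pdf", ".png", ".jpg"}


def _alias_to_org_link(match: re.Match) -> str:
    target, caption = match.group(1).strip(), match.group(2).strip()
    if URL_SCHEME.match(target):
        # Web link: keep the URL, do not append '.org'.
        return f"[[{target}][{caption}]]"
    extension = os.path.splitext(target)[1].lower()
    if extension in ATTACHMENT_EXTENSIONS:
        # Attachment: link to the file itself.
        return f"[[file:{target}][{caption}]]"
    # Note: [[Note|alias]] -> [[file:Note.org][alias]]
    return f"[[file:{target}.org][{caption}]]"


def convert_alias_links(text: str) -> str:
    """Rewrite [[target|caption]] wiki-links in `text` as Org-Mode links."""
    return ALIAS_LINK.sub(_alias_to_org_link, text)
```

Checking for a URL scheme and a known extension before appending `.org` is what keeps web links and attachments from being treated as notes.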

I would need to know what to do with [[cite:@Name]] links, and with links like [[#^8587c9|(1)]].

But the problem I am facing is that on my macOS machine the Cyrillic filenames are garbled because of a wrong encoding, so not a single link works for me. The text is perfectly fine (well, it is UTF-8), but the filenames are not. You are on a Windows machine, right?

Release-Candidate commented 1 year ago

OK, got it working now. The problem was macOS's zip program. With another one (The Unarchiver) the filenames are fine now.

tonygrr commented 1 year ago

That is my plan too.

I'm very happy about that.

[[cite:@Name]]

Sorry, I may have missed or forgotten something, what are these links?

[[#^8587c9|(1)]]

I almost never used block references because it is a very specific syntax that works exclusively in Obsidian. I think it is impossible to convert them correctly, because in org-mode (and in org-roam) you cannot reference a paragraph of text.

I may have experimented with references in this format somewhere (I think for formula numbering, since Obsidian, which uses MathJax, had problems with it), but I don't think most people in their right mind have used this syntax.

As for links like [[Note#^abcdef]] or [[Note#^abcdef|alias]] I think the best solution would be to convert them to [[Note]]. If it's possible.
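A minimal sketch of that reduction, assuming it runs on the wiki-links before any other link correction. The names `BLOCK_REF` and `drop_block_references` are hypothetical.

```python
import re

# [[Note#^abcdef]] or [[Note#^abcdef|alias]]; the note name must be non-empty,
# so same-file block references such as [[#^8587c9|(1)]] are not matched.
BLOCK_REF = re.compile(r"\[\[([^\]\[|#]+)#\^[^\]\[|]+(?:\|[^\]\[]+)?\]\]")


def drop_block_references(text: str) -> str:
    """Reduce links to a block inside a note to plain links to that note."""
    return BLOCK_REF.sub(r"[[\1]]", text)
```

The alias is dropped together with the block id, as proposed above; keeping the alias would be an equally small change to the replacement string.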

You are on a Windows machine, right?

Yeah.

Release-Candidate commented 1 year ago

Pandoc converts links like [[@StrukturaProgrammRavesli2016]] to [[cite:@StrukturaProgrammRavesli2016]] file: test-vault/+c Function.md

Talking about this kind of link: [[Расчёт и конструирование РПрУ, Екимов.pdf#page=217|Екимова, стр. 215]] is being converted to [[Расчёт и конструирование РПрУ, Екимов.pdf#page=217][Екимова, стр. 215]]. Does that work for you? My Emacs does not like that sort of link for opening a PDF on a specific page. I have found a link of the form [[pdfview:Расчёт и конструирование РПрУ, Екимов.pdf::217][Екимова, стр. 215]], which uses org-pdfview. My question is whether I can do anything sensible with the page number #page=N, or should just throw it away.

tonygrr commented 1 year ago

Pandoc converts links like [[@StrukturaProgrammRavesli2016]] to [[cite:@StrukturaProgrammRavesli2016]] file: test-vault/+c Function.md

That's the conversion I would have preferred.

About the links to the pages in the PDF: I think they can be thrown out. This syntax was also Obsidian-specific. I gave it up pretty quickly and started using Zotero.
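Throwing the page number away could look like the sketch below. It only removes the `#page=N` suffix and leaves the rest of the link as Pandoc produced it; the names are hypothetical.

```python
import re

# '#page=N' directly after a '.pdf' file name.
PDF_PAGE = re.compile(r"(\.pdf)#page=\d+", re.IGNORECASE)


def drop_pdf_page(text: str) -> str:
    """Remove Obsidian's '#page=N' suffix from links to PDF files."""
    return PDF_PAGE.sub(r"\1", text)
```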

tonygrr commented 1 year ago

@Release-Candidate let me ask you one more detail. Is it possible to add a property block for a file as a whole based on its name? I'll clarify it with an example. Suppose I have a note "000 Home.md" with the following content:

---
title: 000 Home
date created: 2021.10.27, 20:39
date modified: 2022.10.06, 10:04
aliases: []
tags: [moc]
---

# 000 Home

- [[Inbox]]

Is it possible to convert it to the following format (000 Home.org):

:PROPERTIES:
:ID: home
:END:
#+title: 000 Home

* 000 Home
:PROPERTIES:
:CUSTOM_ID: home
:END:
- [[file:Inbox.org][Inbox]]

I think most people who want to use your knowledge base conversion program plan to use the org-roam v2 package in Emacs, which is one of the best implementations for Zettelkasten note taking. In order for the converted notes to work with org-roam, they must have an ID property (e.g., for the note to appear in search (C-c n f)). I think this would be a great solution for converting from Obsidian/Roam Research to Org Roam using your tool. I assume that if there are no notes with the same name in the knowledge base (which is not possible when using Obsidian), this solution should not lead to conflicts. Even if you later create a file with the same name, org-roam will generate an identifier like eb8e241b-b54c-474e-83b9-090470ec2fae for it, which will be very different from the home identifier.

Release-Candidate commented 1 year ago

Adding that header to every file should be quite easy. But changing the id (filename) from 000-home to home is not possible because the shortened name could lead to name collisions. Using 000-home as id should work; that is the filename in lowercase with whitespace changed to hyphens.
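The id rule described here (filename in lowercase, whitespace changed to hyphens) can be sketched like this. The function names are hypothetical, and the header layout follows the :PROPERTIES: example posted earlier in the thread.

```python
import re
from pathlib import Path


def file_id(path: Path) -> str:
    """'000 Home.md' -> '000-home': lowercase stem, whitespace to hyphens."""
    return re.sub(r"\s+", "-", path.stem.strip().lower())


def add_file_header(org_text: str, path: Path) -> str:
    """Prepend a file-level property drawer carrying the id."""
    return f":PROPERTIES:\n:ID: {file_id(path)}\n:END:\n{org_text}"
```

Because the whole file name is kept, two notes can only end up with the same id if their names differ solely in case or whitespace.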

tonygrr commented 1 year ago

But changing the id (filename) from 000-home to home

Yes, thank you for that clarification. I think that's correct. I simply converted the file "000 Home.md", which contained the header # 000 Home, for which :CUSTOM_ID: home was created. I took that identifier from under that header, replacing :CUSTOM_ID: with :ID:. I don't know if it worked correctly, because I also expected :CUSTOM_ID: 000-home to be created for that header.

Release-Candidate commented 1 year ago

The new version works for me with your test files. Could you please check if the newest version 1.2.0 works for you?

(sudo) pip install obs2org -U

If you fix the two files that Pandoc can't convert, as I wrote in the email, there should not be any errors caused by the conversion any more. The six errors left are 'real' errors in the files; the rest are missing files and block references.

If the conversion works for you, I can add a command line switch to add these headers to all files.

Release-Candidate commented 1 year ago

Aside from using the filename as id, I could also always generate a UUID (a unique id) which looks like this: 16fd2706-8baf-433b-82eb-8c7fada847da, like Org-Roam does.
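A sketch of the UUID variant using Python's standard `uuid` module. Using a random (version 4) UUID is an assumption made for this illustration, and `uuid_file_header` is a hypothetical name.

```python
import uuid


def uuid_file_header() -> str:
    """Build a file-level property drawer with a freshly generated UUID."""
    return f":PROPERTIES:\n:ID: {uuid.uuid4()}\n:END:\n"
```

Each call returns a different id, so links between notes would have to be written only after every file has been assigned its id.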

tonygrr commented 1 year ago

I tested the new version on a more current copy of my data and it works fine for me. All of the links I clicked were working! This is awesome!

If the conversion works for you, I can add a command line switch to add these headers to all files.

Do you mean adding properties for each file with a :ID:?

Aside from using the filename as id, I could also always generate a UUID (a unique id) which looks like this: 16fd2706-8baf-433b-82eb-8c7fada847da, like Org-Roam does.

I think it would be a great idea to maintain the overall style. It seems to me that when org-roam reindexes the new identifiers, it will not generate a duplicate identifier (in terms of probability theory, the chance of such a collision is very small).

Release-Candidate commented 1 year ago

I tested the new version on a more current copy of my data and it works fine for me. All of the links I clicked were working! This is awesome!

Great to hear!

If the conversion works for you, I can add a command line switch to add these headers to all files.

Do you mean adding properties for each file with a :ID:?

Yes, exactly.

tonygrr commented 1 year ago

The only thing is that I may have misunderstood a bit when we discussed links like [[@DiagrammaVolpertaSmita2022]] that convert to links like [[cite:@DiagrammaVolpertaSmita2022]]. I thought it was about the cite prefix in the name. Let me clarify. In my knowledge base, these are regular files. This must have been my mistake, since I didn't add the folder with these notes to the archive for testing. They contain links to Zotero and to the web page. An example of such a note:

---
title: @DiagrammaVolpertaSmita2022
date-created: 2023.01.07, 12:10
tags: [zotero]
authors: 
year: 2022
---

- Zotero-URI:: [Zotero](zotero://select/items/@DiagrammaVolpertaSmita2022)
- URL:: [Web](https://ru.wikipedia.org/w/index.php?title=%D0%94%D0%B8%D0%B0%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%B0_%D0%92%D0%BE%D0%BB%D1%8C%D0%BF%D0%B5%D1%80%D1%82%D0%B0_%E2%80%94_%D0%A1%D0%BC%D0%B8%D1%82%D0%B0&oldid=121062971)

---

# Диаграмма Вольперта — Смита

Is it possible to correct these links in the same way as the others?

I haven't yet explored org-mode's citation capabilities, so maybe I'm missing something. However, I would be happy if the links to these notes were retained, as they allow you to quickly go to the source of the information.

Release-Candidate commented 1 year ago

The problem is that Pandoc converts these links to citations. Because the @ is Pandoc markdown syntax for a citation. But I could of course add an option to remove the cite: prefix and treat them as normal links.

tonygrr commented 1 year ago

Thanks for the clarification. Perhaps, in that case, I myself need to remove the "@" prefix in these links using mass substitution, so that other people who have used this syntax as intended won't encounter incorrect conversions (if I understand it correctly).

Release-Candidate commented 1 year ago

Thanks for the clarification. Perhaps, in that case, I myself need to remove the "@" prefix in these links using mass substitution, so that other people who have used this syntax as intended won't encounter incorrect conversions (if I understand it correctly).

No, that means that I cannot do this by default, as people may want to use that feature, but I will make it an option that can be activated by a command-line flag (like --no-cite).
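What such an option might do to Pandoc's output is sketched below. Only the flag name --no-cite comes from this discussion; the function, its `no_cite` parameter and the regular expression are hypothetical.

```python
import re

# [[cite:@Key]] as Pandoc writes it for a wiki-link that starts with '@'.
CITE_LINK = re.compile(r"\[\[cite:(@[^\]\[]+)\]\]")


def strip_cite_prefix(text: str, no_cite: bool) -> str:
    """With no_cite set, turn [[cite:@Key]] back into the plain link [[@Key]]."""
    if not no_cite:
        # Default: keep Pandoc's citations untouched.
        return text
    return CITE_LINK.sub(r"[[\1]]", text)
```

The resulting [[@Key]] can then go through the same correction as any other link to a note.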

tonygrr commented 1 year ago

In that case, I would be glad to have this option.

Release-Candidate commented 1 year ago

The newest version does that all now.

To update:

(sudo) pip install obs2org -U

Command line arguments to get UUIDs and no Pandoc citations: add -n (or --no-cite) and -u (or --uuid):

python -m obs2org -n -u ...

tonygrr commented 1 year ago

That's just great! It works perfectly for me. Thank you so much for your work, it's great <3. I converted the entire knowledge base, which I can now work with in Emacs.

Release-Candidate commented 1 year ago

Great to hear!

And thank you, you helped get the program feature-complete. Without you and your test documents that would not have happened that fast, or maybe at all!

tonygrr commented 1 year ago

I'm glad to be a part of it!

In the Russian-speaking part of Telegram, there are quite large communities on the topic of taking notes using the Zettelkasten method and on the topic of Obsidian. These communities often talk about Emacs and Org Roam in particular. People share their experiences. And, more than once I've met people who would like to try Org Roam instead of Obsidian. And even after having gone through the first circle of hell, consisting of figuring out Emacs and doing its initial setup, many have run into the second circle of hell, consisting of the inability to convert notes from markdown format to org format. This is no problem now! I would happily recommend your tool to everyone to solve this problem.

Release-Candidate commented 1 year ago

Thank you for your kind words, I'm flattered.

By the way, I've also made a browser extension for Chrome, Edge and Firefox that saves the current link and a description in Org-Mode format: Notoy Browser Extensions. And a PWA (an installable web app) that can be used on smartphones (and on desktops, but that's not really useful): Notoy PWA.

tonygrr commented 1 year ago

Thank you, I'll look into it. I think it could be useful, given that there are some difficulties with working with .org files on mobile devices.