argenos / zotero-mdnotes

A Zotero plugin to export item metadata and notes as markdown files
https://argenos.github.io/zotero-mdnotes/
GNU General Public License v3.0

Presence of HTML tags inside extracted annotations (Zotero 6) #185

Closed. Klemet closed this issue 1 month ago.

Klemet commented 2 years ago

Describe the bug

When extracting annotations from a PDF with Zotero 6's Add note from annotations feature, and exporting the resulting note with the Export to Markdown function of mdnotes, the resulting Markdown file contains HTML code, which sometimes makes it difficult to read and edit.

To Reproduce

Steps to reproduce the behavior:

  1. Make annotations in a PDF file in Zotero
  2. Export those annotations to a note with Add note from annotations in Zotero
  3. Use Export to Markdown option from MDNotes
  4. Open the resulting file

Expected behavior

The resulting Markdown file should not contain HTML tags, and should remain in Markdown format.

Screenshots

[screenshot]


argenos commented 2 years ago

That is the way Zotero 6 exports the annotations from their DB. The exports from mdnotes explicitly allow for spans, since they can be used to format exported annotations if you use an external PDF viewer and the Zotfile workflow.

Could you try to modify your note template in Zotero? I'm not sure this will remove the span, but it's worth a try.

Klemet commented 2 years ago

Hello, @argenos!

Thanks a lot for your answer, and for the wonder that is mdnotes in general 😄!

The reason makes sense, and tweaking the note templates of Zotero looks like a great idea. However, I can't find any setting that controls the presence of these <span> tags. The only editable values don't mention them:

[screenshot]

If somebody finds a way to change that, I'm all in!

kcudding commented 2 years ago

So, I'm trying to understand exactly what happened; I haven't used mdnotes for a while. After the Zotero update to 6.0.8, I could not figure out how to replicate my previous workflow, which produced .md notes like the attached photo.

I've updated Better BibTeX, ZotFile and mdnotes, but the behaviour is exactly the same as before the update. I can only extract annotations if I use the Zotero PDF viewer, and those annotations produce the same HTML mess as the commenter above.

So... does that mean the note extraction functionality is entirely gone now, unless someone finds a way to modify the templates in a useful way?

[Screen Shot 2022-05-30 at 5 47 07 PM]

macOS, Zotero 6.0.8, mdnotes 0.2.3, ZotFile 5.1.1, Better BibTeX 6.7.1

[Attached example: baroneUnmetNeedsAnalyzing2017 - Extracted Annotations (2021-05-02, 92555 a.m.) The biologists in this study see training as the most important factor .md]

kcudding commented 2 years ago

Okay, after a hard reboot, I CAN use an external PDF viewer to produce an .md file of annotations with mdnotes. However, the file does not contain Zotero links (and, annoyingly, has two Annotations headers). It was generated by running mdnotes on the annotations folder, and it is correctly named.

[Screen Shot 2022-05-30 at 6 08 47 PM]

Or I can use the native Zotero export to .md to produce a note file from the annotations folder that contains links but is incorrectly named.

Is this still the expected behaviour, with a possible resolution through a template modification?

[Screen Shot 2022-05-30 at 6 12 24 PM]

So I guess this is why people are talking about the citations add-on? But can it be resolved just by using mdnotes?

argenos commented 2 years ago

@kcudding I recommend switching to Zotero Integrator if this is an issue. As I mentioned above, it's unlikely I will address the issue any time soon, since it comes from the way Zotero exports their internal annotations to HTML. If you wish, you can play around with the templates used by Zotero 6 to do that export, or open a PR to fix this.

cjpoor commented 2 years ago

It is possible to strip the HTML tags from the text. This will remove the link to the page of the PDF in Zotero, but a link to the PDF can be added back in using Zutilo's "Copy select item links".
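Stripping the tags can be scripted. The sketch below is a hypothetical helper (not part of mdnotes or Zotero) based on regular expressions. It assumes the simple, well-formed markup Zotero 6 writes into annotation notes (span.highlight and span.citation inside paragraphs, with the data-annotation value URI-encoded so it contains no ">" characters). It is not a general HTML parser, and the Zotero page links are lost, as noted above.

```javascript
// Hypothetical helper (not part of mdnotes or Zotero): strips HTML tags from
// a Zotero note fragment. Regex-based, so it only suits simple, well-formed
// markup whose attribute values contain no ">" characters.
function stripHtmlTags(html) {
  return html
    .replace(/<br\s*\/?>/gi, "\n")        // explicit line breaks -> newline
    .replace(/<\/(p|h[1-6]|li)>/gi, "\n") // end of a block element -> newline
    .replace(/<[^>]*>/g, "")              // drop every remaining tag
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")               // decode &amp; last to avoid double-decoding
    .replace(/\n{3,}/g, "\n\n")           // collapse runs of blank lines
    .trim();
}

// Sample markup modeled on a Zotero 6 annotation note (assumed structure).
const sample =
  '<p><span class="highlight" data-annotation="%7B%7D">"Training matters"</span> ' +
  '<span class="citation">(<span class="citation-item">Author, 2017, p. 3</span>)</span></p>';
console.log(stripHtmlTags(sample)); // "Training matters" (Author, 2017, p. 3)
```

Decoding &amp; last means a literal "&amp;lt;" in the note becomes "&lt;" rather than "<".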

However, it is quicker to export the note with both the Zotero export and mdnotes, then copy the tags, related items, and anything else you want from the mdnotes-exported file into the Zotero-exported file. You can also copy tags to the clipboard using Zutilo.

If you use the unique ID method to make links between notes, e.g.:

Note ID: 20220601115332
Related: [[20220601112158]]

then the link in square brackets is recognised by Zettelkasten apps like Zettlr.
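The IDs in that example look like YYYYMMDDHHmmss timestamps. Assuming that format, a small helper could generate them along with the Related: line; both function names are made up for illustration and are not part of Zettlr or mdnotes.

```javascript
// Hypothetical helper: builds a YYYYMMDDHHmmss timestamp ID (assumed format,
// matching e.g. 20220601115332) from a Date, using local time.
function zettelId(date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    String(date.getFullYear()) +
    pad(date.getMonth() + 1) + // getMonth() is zero-based
    pad(date.getDate()) +
    pad(date.getHours()) +
    pad(date.getMinutes()) +
    pad(date.getSeconds())
  );
}

// Hypothetical helper: the note's own ID plus [[wiki-links]] to related IDs.
function zettelHeader(noteId, relatedIds) {
  const related = relatedIds.map((id) => `[[${id}]]`).join(" ");
  return `Note ID: ${noteId}\nRelated: ${related}`;
}

const id = zettelId(new Date(2022, 5, 1, 11, 53, 32)); // month 5 = June
console.log(id); // 20220601115332
console.log(zettelHeader(id, ["20220601112158"]));
// Note ID: 20220601115332
// Related: [[20220601112158]]
```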

If someone with coding knowledge can think of a way of automating any or all of this, I will buy them a 🍺

huyz commented 2 years ago

I'm a first-time user of Zotero and mdnotes, trying to come up with a workflow so that my annotations import well into Obsidian. I'm naturally using Zotero 6 since I'm new.

Right now, it looks like using Zotero 6's Add Note from Annotations, then selecting the note and doing Export Note with Include Zotero Links, ends up with a pretty good result, except that you lose all the colors.

So what is OP trying to do that built-in Zotero 6 functionality doesn't give you? What's missing?

Klemet commented 2 years ago


Nice catch, @huyz! Indeed, it seems that the problem is not present when using the built-in Export Note function of Zotero.

To answer your question, it's just a matter of practicality: Zotero's Export Note works well, but mdnotes has a lot of customization functions that make everything quicker (like properly naming the file you're exporting, offering the right export folder by default, etc.). I agree that it's a matter of convenience, not a necessity.

kcudding commented 2 years ago

The deal-breaker for me is that, while I can use the native Zotero export to .md to produce a note file from the annotations folder that contains links, that file is incorrectly named. You end up with a file labelled Annotations, which overwrites the previous file unless you have renamed it, and which is not identifiable. That's not just a convenience issue unless you only use the feature occasionally.

I and others have raised the file name issue with Zotero, but no motion yet.

huyz commented 2 years ago

You end up with a file labelled Annotations, which overwrites the previous file unless you have renamed it

One workaround is to change the Annotations text at the top of the note and paste in a more descriptive name (e.g., copied from the item title). If you then do Export Note..., you get a unique and more descriptive file name.

kcudding commented 2 years ago

mdnotes automatically names the file after the Better BibTeX citation key, which I found invaluable for organization. Doing it manually, you have to get the key (you can set up Cmd+Shift+C to do this) and then paste it in. So instead of processing annotations to a .md file with one click as before, it's now 4 actions to get the same job done.

Unless someone has new ideas about how to automate this? As far as I know, there is still no option to automatically set the title field in the Zotero-generated annotation note.
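One way to cut down those manual steps could be a small post-processing script run on the file Zotero exports. The sketch below is hypothetical (neither Zotero nor mdnotes provides it). It assumes the exported note starts with a Markdown heading such as "# Annotations" and that you already have the item's Better BibTeX citation key, and it returns a citekey-based file name plus the retitled content. Reading and writing the file on disk is left out.

```javascript
// Hypothetical post-processing step, not a Zotero or mdnotes feature.
// Assumes the exported note's first line is a Markdown heading
// (e.g. "# Annotations") and that the citation key is already known.
function retitleExportedNote(markdown, citekey) {
  const lines = markdown.split("\n");
  if (/^#{1,6}\s/.test(lines[0])) {
    // Keep the heading level; append the citekey to make the title unique.
    // A replacer function avoids "$" in the key being treated as a pattern.
    lines[0] = lines[0].replace(/^(#{1,6}\s+).*$/, (_, hashes) => `${hashes}Annotations: ${citekey}`);
  }
  return {
    fileName: `${citekey}.md`, // one file per item, so nothing is overwritten
    content: lines.join("\n"),
  };
}

const result = retitleExportedNote("# Annotations\n\nSome text", "baroneUnmetNeedsAnalyzing2017");
console.log(result.fileName); // baroneUnmetNeedsAnalyzing2017.md
console.log(result.content.split("\n")[0]); // # Annotations: baroneUnmetNeedsAnalyzing2017
```

Notes without a leading heading are returned unchanged apart from the new file name.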

gjimenezUCM commented 2 years ago


I think I found a solution for this issue: a span.highlight element (a span element whose class is highlight) represents an annotation in Zotero, and its data-annotation attribute holds URI-encoded JSON with information about the attachment, the annotation, and its position in the attachment. For example, after decoding the data-annotation value, we can create the URI by transforming:

{
    "attachmentURI":"http://zotero.org/users/8528213/items/L3YWES9Q",
    "annotationKey":"AT3Q9HXT",
    "color":"#ffff00",
    "pageLabel":"3",
    "position":{
        "pageIndex":21,
        "rects":[[51.744,124.472,391.283,136.526],[51.744,111.522,391.283,123.576],[51.744,98.572,391.283,110.626],[51.744,85.622,115.667,97.676]]
    },
    "citationItem":{
        "uris":["http://zotero.org/users/8528213/items/5P58CWR2"],
        "locator":"3"
    }
}

into

zotero://open-pdf/library/items/L3YWES9Q?page=22&annotation=AT3Q9HXT
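The transformation above can be sketched as a standalone function, which makes it easy to check outside Zotero. The function name is hypothetical. It assumes the item lives in the personal library (the library/ path segment in the example) and that the URI's page parameter is 1-based, which is why pageIndex 21 becomes page=22.

```javascript
// Hypothetical standalone version of the transformation: decode the
// URI-encoded JSON from data-annotation and build a zotero://open-pdf link.
// Assumes a personal library ("library/") and a 1-based page parameter.
function annotationToZoteroUri(encodedAnnotation) {
  const data = JSON.parse(decodeURIComponent(encodedAnnotation));
  // The attachment key is the last path segment of attachmentURI.
  const itemKey = data.attachmentURI.split("/").pop();
  // pageIndex is zero-based.
  const page = data.position.pageIndex + 1;
  return `zotero://open-pdf/library/items/${itemKey}?page=${page}&annotation=${data.annotationKey}`;
}

// Trimmed-down version of the example object, encoded as in the attribute.
const encoded = encodeURIComponent(JSON.stringify({
  attachmentURI: "http://zotero.org/users/8528213/items/L3YWES9Q",
  annotationKey: "AT3Q9HXT",
  pageLabel: "3",
  position: { pageIndex: 21 },
}));
console.log(annotationToZoteroUri(encoded));
// zotero://open-pdf/library/items/L3YWES9Q?page=22&annotation=AT3Q9HXT
```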

This URL can be used to turn the content of the span.citation element into a link. To do that, you can add a new rule to the getConverter function in markdown-utils.js. Something like this:

converter.addRule('annotation-link', {
    // Only handle span.citation elements
    filter: function (node, options) {
        return (
            node.nodeName === 'SPAN' &&
            node.getAttribute('class') === 'citation'
        );
    },
    replacement: function (content, node) {
        // The matching span.highlight is the previous element sibling of span.citation
        const sibling = node.previousElementSibling;
        // Fall back to the plain citation text when no annotation data is found
        if (!sibling || sibling.getAttribute('class') !== 'highlight') {
            return content;
        }
        const encoded = sibling.getAttribute('data-annotation');
        if (!encoded) {
            return content;
        }
        // data-annotation holds URI-encoded JSON
        const data = JSON.parse(decodeURIComponent(encoded));
        // Attachment (item) key = last path segment of attachmentURI.
        // pop() instead of at(-1): Array.prototype.at may be missing on older
        // platforms such as the Firefox 60 base of Zotero 6.
        const itemKey = data.attachmentURI.split('/').pop();
        // pageIndex is zero-based; the URI's page parameter is assumed to be 1-based
        const page = data.position.pageIndex + 1;
        const url = `zotero://open-pdf/library/items/${itemKey}?page=${page}&annotation=${data.annotationKey}`;
        return `[${content}](${url})`;
    }
});

I have not tried it in zotero-mdnotes yet but I have tested it.