Kovah / LinkAce

LinkAce is a self-hosted archive to collect links of your favorite websites.
https://www.linkace.org
GNU General Public License v3.0

Import: Tags get messed up #578

Open piegamesde opened 1 year ago

piegamesde commented 1 year ago

Bug Description

I just tried to import a bookmarks file and it went fairly well; it even recognized the tags properly. However, there seems to be a bug: some links got tags that were not present in the HTML file. For example, the "bookmarks" tag got added to all or almost all bookmarks. I did a few checks and could not find any missing tags, only added ones.

How to reproduce

TODO. I'll try to create a small reproducer file soon.

Expected behavior

Tags are properly imported. (Alternatively, tags are not imported at all.)

Logs

No response

Screenshots

No response

LinkAce version

v10.5

Setup Method

PHP

Operating System

Linux (Ubuntu, CentOS,...)

Client details

Arch Linux, Chromium

Kovah commented 1 year ago

Unfortunately, tags are a real mess. Browser vendors do their own thing, so tags are not really consistent. For example, nested folders are sometimes specified as multiple tags and so on. Sharing a sample file will definitely help.

Also see

piegamesde commented 1 year ago

Here is a snippet of my bookmarks that I can share:

<!DOCTYPE NETSCAPE-Bookmark-file-1>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>

<DL><p>
    <DT><H3 ADD_DATE="1669307552" LAST_MODIFIED="1669307552" PERSONAL_TOOLBAR_FOLDER="true">buku bookmarks</H3>
    <DL><p>
        <DT><A HREF="https://github.com/adacta-io/adacta" ADD_DATE="1669307552" LAST_MODIFIED="1669307552" TAGS="github,archive,archiving">Personal Document Archiving</A>
        <DT><A HREF="https://archivebox.io/" ADD_DATE="1669307552" LAST_MODIFIED="1669307552" TAGS="archive,archiving">ArchiveBox</A>
        <DD>🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more…
        <DT><A HREF="https://github.com/cnwangjie/better-onetab" ADD_DATE="1669307552" LAST_MODIFIED="1669307552" TAGS="onetab,tabs,bookmarks,archive,archiving">cnwangjie/better-onetab</A>
        <DD>A better OneTab for Chrome Temporarily removed from firefox without maintaining in a period & any cooperative purpose are welcome
        <DT><A HREF="https://github.com/awesome-selfhosted/awesome-selfhosted#bookmarks-and-link-sharing" ADD_DATE="1669307552" LAST_MODIFIED="1669307552" TAGS="awesome,bookmarks,links,archiving">awesome-selfhosted/awesome-selfhosted: A list of Free Software network services and web applications which can be hosted on your own servers</A>
        <DD>Includes link management list
        <DT><A HREF="https://github.com/jarun/buku" ADD_DATE="1669307552" LAST_MODIFIED="1669307552" TAGS="bookmarks,links,archiving,buku">buku: Personal mini-web in text</A>
        <DD>Simple & beautiful bookmark manager
    </DL><p>
</DL><p>

I still have to check whether this snippet actually reproduces the issue, and how deterministic it is in the first place.
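For that kind of check, here is a minimal sketch of how the expected tags could be pulled out of an export, so they can be compared by hand with what shows up after the import. It only uses Python's standard `html.parser` and reads the `TAGS` attribute of each link plus the names of the enclosing `<H3>` folders. The names `BookmarkTagParser`, `expected_tags` and `SAMPLE` are made up for illustration and are not part of LinkAce; the sketch says nothing about how LinkAce itself parses the file.

```python
from html.parser import HTMLParser

# Shortened sample in the same shape as the export above (illustration only).
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><H3 ADD_DATE="1669307552">buku bookmarks</H3>
    <DL><p>
        <DT><A HREF="https://archivebox.io/" TAGS="archive,archiving">ArchiveBox</A>
        <DT><A HREF="https://github.com/jarun/buku" TAGS="bookmarks,links,archiving,buku">buku</A>
    </DL><p>
</DL><p>
"""


class BookmarkTagParser(HTMLParser):
    """Collect the TAGS attribute and enclosing folder names of every <A> entry."""

    def __init__(self):
        super().__init__()
        self.links = {}          # href -> {"tags": [...], "folders": [...]}
        self._folders = []       # names of the currently open folders
        self._pending = None     # folder name read from <H3>, waiting for its <DL>
        self._dl_is_folder = []  # for each open <DL>: did it open a named folder?
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "h3":
            self._in_h3 = True
            self._pending = ""
        elif tag == "dl":
            is_folder = self._pending is not None
            if is_folder:
                self._folders.append(self._pending.strip())
                self._pending = None
            self._dl_is_folder.append(is_folder)
        elif tag == "a" and attrs.get("href"):
            raw = attrs.get("tags") or ""
            tags = [t.strip() for t in raw.split(",") if t.strip()]
            self.links[attrs["href"]] = {
                "tags": tags,
                "folders": list(self._folders),
            }

    def handle_data(self, data):
        if self._in_h3:
            self._pending += data

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "dl" and self._dl_is_folder:
            if self._dl_is_folder.pop():
                self._folders.pop()


def expected_tags(html_text):
    """Return a dict mapping each link URL to its tags and enclosing folders."""
    parser = BookmarkTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, info in expected_tags(SAMPLE).items():
        print(href, info["tags"], info["folders"])
```

Any tag that appears on a link after the import but is missing from that link's `tags` list here would be one of the "added" tags described above. The folder names are recorded separately so it is easy to see whether the extra tags have anything to do with the folder a link sits in.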