atlas-engineer / nyxt

Nyxt - the hacker's browser.
https://nyxt-browser.com/

Search engine dynamic suggestions (alias "webjumps") #79

Closed Ambrevar closed 3 years ago

Ambrevar commented 6 years ago

I think it's important to discuss the bookmark format at an early stage.

  1. A binary format is probably not a good idea. It's not versionable and you are going to lose a huge user base who likes to keep their bookmarks in a git repo. A few years back, another power-browser transitioned from a text format to SQLite and went defunct within the following few months... (I think it was Luakit.) Also see https://github.com/qutebrowser/qutebrowser/issues/882#issuecomment-136807011.

  2. What do you think about using Emacs eww's format? That would allow the two to share the same bookmarks transparently, a big bonus for Emacs users I think.

  3. Qutebrowser is currently discussing the merge of bookmarks, quickmarks and search engines. I think it's a neat idea. Basically every bookmark can be recalled with a simple keybind (if specified) or used as a search engine (again, if specified). See: https://github.com/qutebrowser/qutebrowser/issues/882.

jmercouris commented 6 years ago

The area of bookmarks is a huge area of contention for me. Currently the bookmarks are already in a SQLite database, as is the history; they are stored in ~/.local/share/next/bookmark.db and ~/.local/share/next/history.db respectively. The schema is very simple, just a single column for the URL.

The reason I would like to stick with sqlite is because I am going to add some "smart" word matching, text analysis, and document tagging capabilities in the future (as that is one of my specialties as a computer scientist).

In terms of importing from other browsers and formats, something to do that would be greatly appreciated. If you are interested in working on a parser for EWW bookmarks in Common Lisp, or on some mechanism to add them, that would be great!

In regards to bullet 3: I have no idea what quick marks are, I looked through the thread and I wasn't able to really figure it out. If you could provide a brief explanation, that would be very useful.

Finally, it is currently really easy to combine multiple sources to do something like search history, bookmarks, current tabs etc. So implementing these kinds of "go-to-anything" functions should be relatively trivial :)

Ambrevar commented 6 years ago

I understand the need for a database, and Qutebrowser does that too. The thing is, you don't need to store the bookmarks as a SQL database, you can simply dump them as plain text on save, and create the database on startup.
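
To make the "plain text on save, database on startup" idea concrete, here is a minimal sketch in Python (not Next's Common Lisp). The tab-separated line format, the table layout and the function names are all assumptions made for illustration; they are not Next's or Qutebrowser's actual implementation.

```python
import sqlite3

def load_bookmarks(lines):
    """Build an in-memory SQLite database from text lines of the form URL<TAB>TITLE.

    The line format is a made-up stand-in for whatever text format is chosen.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE bookmarks (url TEXT PRIMARY KEY, title TEXT)")
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        url, _, title = line.partition("\t")
        db.execute("INSERT OR REPLACE INTO bookmarks VALUES (?, ?)", (url, title))
    return db

def dump_bookmarks(db):
    """Serialize the database back to sorted text lines, so diffs stay small."""
    rows = db.execute("SELECT url, title FROM bookmarks ORDER BY url")
    return [f"{url}\t{title}\n" for url, title in rows]

# Hypothetical content of the versioned text file.
text = [
    "https://www.example.org\tExample\n",
    "https://duckduckgo.com/\tDuckDuckGo\n",
]
db = load_bookmarks(text)    # on startup
dumped = dump_bookmarks(db)  # on save
```

The text file stays the source of truth and can live in a git repo, while queries still run against SQLite.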

This is a must, and I insist: bookmarks significantly lose their value if they are not human readable / versionable. (That would personally deter me from using Next Browser altogether.)

Qutebrowser has opted for storing the history as a SQL binary though, because nobody was really interested in versioning it. This could still be left as an option since it's a trivial setting after the job has been done for bookmarks.

EWW: no need for a parser, the bookmarks are directly stored in plain Lisp! :)

((:url "https://www.example.org" :title "Example" :time "Mon Feb 12 12:43:29 2018")
 (:url "https://itch.io/jam/lisp-game-jam-2018" :title "Lisp Game Jam 2018 - itch.io" :time "Mon Feb 12 11:40:25 2018"))

Quickmarks: Many power-browsers feature those special bookmarks which can be summoned via a simple keychord. The point taken by the linked discussion is that quickmarks are intrinsically just bookmarks with a binding, so why not add the key directly to the bookmark structure?

Same for search engines.

Bookmarks also need tags, a lot of users like that. It's a good thing, I think, if your bookmark structure can accept arbitrary fields (and keep them), even if Next Browser does not use them.

In the end, the structure (and the file!) could look like this:

((:url "https://www.example.org" :title "Example" :time "Mon Feb 12 12:43:29 2018" :tags (web dev foo) :key "e")
 (:url "https://duckduckgo.com/" :title "DuckDuckGo" :time "Mon Feb 12 11:40:25 2018" :search "https://duckduckgo.com/?q=%s" :my-random-tag foo-bar))
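
As a rough illustration of the merged structure, the sketch below models the two entries above as Python dictionaries (Python rather than Lisp, purely for illustration). The helper names `find_by_key` and `set_title` are hypothetical; the point is that a quickmark is just a bookmark with a key, and that fields the browser does not understand survive an update.

```python
def find_by_key(bookmarks, key):
    """Return the first bookmark bound to `key` (a quickmark), or None."""
    for bookmark in bookmarks:
        if bookmark.get("key") == key:
            return bookmark
    return None

def set_title(bookmark, title):
    """Return an updated copy; fields the browser does not know about are kept."""
    return {**bookmark, "title": title}

# The two entries from the proposed file, transcribed by hand.
bookmarks = [
    {"url": "https://www.example.org", "title": "Example",
     "time": "Mon Feb 12 12:43:29 2018", "tags": ["web", "dev", "foo"], "key": "e"},
    {"url": "https://duckduckgo.com/", "title": "DuckDuckGo",
     "time": "Mon Feb 12 11:40:25 2018",
     "search": "https://duckduckgo.com/?q=%s", "my-random-tag": "foo-bar"},
]

quickmark = find_by_key(bookmarks, "e")
renamed = set_title(bookmarks[1], "DDG")
```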

Ambrevar commented 6 years ago

The qutebrowser discussion is a bit long I reckon, but definitely worth a skim!

j3ky commented 6 years ago

what about a bookmarks.json file as Falkon does?

Anyway, I've seen somewhere a Python script that converts bookmark.html to bookmark.db...

Could not find it anymore!

Ambrevar commented 6 years ago

JSON is obviously less lispy (JSON was initially made for non-Lisp languages -- Lisp implementations are only there for communicating with other languages).

That said, if we have programmable import/export functions, then everything is possible: the user would be free to store their bookmarks as JSON.

I'd advocate for a Lisp format by default though.

Why did you mention .html to .db conversion?

j3ky commented 6 years ago

In terms of importing from other browsers and formats, something to do that would be greatly appreciated.

Why did you mention .html to .db conversion?

Ambrevar commented 6 years ago

By bookmark.html, you meant bookmarks from Firefox or Chrome? Sure, that would be useful. We could probably provide it directly as an importer function. For instance, if bookmark.html is found, Next would automatically import the bookmarks.

j3ky commented 6 years ago

Ain't it!

I hope that Next becomes a real thing!

I mean, Firefox is great and all, but I want a real GNU Emacs experience, an extensible browser!

4t0m commented 5 years ago

Alright, I'm interested in working on improving bookmarks. Is there a roadmap for what features to add next?

The thing that sticks out for me is the lack of folders, but maybe there are other issues that are more pressing/exciting?

jmercouris commented 5 years ago

The first thing to do would be to remove the dependency on Sqlite and save SEXP to files containing a list of the bookmarks. That would be the first step in a journey :D

Perhaps you could use an alist? Or some lists of lists to simulate folders?

Ambrevar commented 5 years ago

I've added some bullet points of what needs to be done to improve bookmarks:

https://github.com/atlas-engineer/next/blob/master/documents/CHANGELOG.org#extend-bookmark-support

I think folders are not the right approach though: folders don't compose, for instance if you have two folders "foo" and "bar", what about that bookmark that belongs to both?

I believe a better approach is to use tags: users can add arbitrary tags to any bookmark and then filter bookmarks by tags. For instance, the user can query something like "all bookmarks that have tags foo1 and foo2 but not bar123".
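
A query such as "foo1 and foo2 but not bar123" boils down to two set tests. The following is a small Python sketch under that assumption; the function names and the sample data are made up for illustration and say nothing about how Next implements it.

```python
def matches(bookmark, include=(), exclude=()):
    """True if the bookmark has every tag in `include` and none in `exclude`."""
    tags = set(bookmark.get("tags", ()))
    return set(include) <= tags and tags.isdisjoint(exclude)

def filter_by_tags(bookmarks, include=(), exclude=()):
    """Return the bookmarks that satisfy the tag query, in their original order."""
    return [b for b in bookmarks if matches(b, include, exclude)]

# Hypothetical bookmarks; a bookmark can carry any number of tags.
bookmarks = [
    {"url": "https://a.example", "tags": ["foo1", "foo2"]},
    {"url": "https://b.example", "tags": ["foo1", "foo2", "bar123"]},
    {"url": "https://c.example", "tags": ["foo1"]},
    {"url": "https://d.example"},
]

hits = filter_by_tags(bookmarks, include=["foo1", "foo2"], exclude=["bar123"])
```

Unlike folders, nothing here forces a bookmark into a single place: the same entry can match several unrelated queries.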

What do you think?

Thanks for looking into this, don't hesitate to ask a lot of questions :) I'm excited to see overpowered bookmarks come to Next!

ajgrf commented 5 years ago

I currently manage my bookmarks in an Org document, so I hope Next settles on something flexible enough to integrate with that.

Ambrevar commented 5 years ago

No worries about that!

I plan to add import/export hooks so that the user can customize how the bookmarks are saved / loaded.

There is the cl-org library which can parse Org. It might need a bit of work but it's definitely doable.

cbbrowne commented 5 years ago

I'd see the matters of the model for representation of bookmarks and the physical representation of that as being two very distinct things. Shifting away from SQLite isn't necessarily an improvement, particularly if the model does not deeply improve. I don't see why a good model couldn't be implemented atop SQLite perfectly well.

There are some pretty useful notions in the things mentioned thus far...

Ambrevar commented 5 years ago

You are right, the physical representation is distinct, which is what I intend to implement: the persistence of bookmarks is done via arbitrary export/import functions which can be configured by the user.

Indeed, we are not really "shifting away from SQLite", we simply won't be using it by default.

Besides, now the SQLite queries are tangled within the fuzzy-search feature which is very annoying, so I want to separate both.

  • The ability to organize bookmarks in a composable way would be a good thing. Using tags to do this is a pretty good idea, and that is what I have been shifting towards with my Firefox instances. Nevertheless, I'm finding it somewhat painful to get bookmarks tagged; it's a painful amount of work.

What do you find painful exactly with tagging?

Tags will be optional of course. When you query a URL / bookmark, there will be a binding to "edit the tags". This brings up another fuzzy-searchable minibuffer with the list of tags to add / remove. Multiple tags can be added / removed at once.

  • However, composability of search may be at odds with trying to have a document that comprises all the bookmarks. A URL that has 3 tags needs to associate with all three tags, which leaves questions open as to how it sorts/renders

I'm not sure I understand what you mean. There are multiple ways to perform queries by tag:

To implement the above easily, we could add a Lisp syntax to the query language. (Is there a library for that already?) For instance

Besides, tag queries could be intermingled with regular fuzzy-matching on URL / Title:

  • It is desirable to be able to get data in and out in a number of useful formats that, it seems to me, includes HTML, Org Mode, perhaps some XML or JSON schema used by other browsers. Highly desirable to support both export and import (with some amount of idempotency to the import).

Absolutely. We will support the s-exp format by default, but it would be nice to add a Chrome / Firefox importer at least, and more in the future. Contributions are welcome!

  • It would almost immediately make nEXT a "killer app" if it was better at mass editing bookmarks than existing web browsers.

Oh, it will!

tviti commented 5 years ago

I admittedly haven't read this entire thread, but to be clear, it sounds like in the new db format(s), it will be easy for users to add and access their own custom fields in the db?

I've been experimenting with using bookmark db files as "project files" in next, and it sounds like this ticket is already addressing a lot of the thoughts I had towards this. So far, I have in my init.lisp cmds for quickly adding/switching active db files, and then pulling/adding/committing/pushing db files to a git repo for quick swapping and syncing without leaving the next UI. My next thought was to add annotations so you can take quick notes in the minibuffer as you add bookmarks, but with the proposed feature set I'm envisioning a cmd that opens up an entire emacs org buffer, and then saves the buffer contents alongside an export to html directly in the bookmark db, so that when you are reviewing your notes, you can have a cmd that pops up a new next window with the exported html!

Ambrevar commented 5 years ago

Thanks for the feedback!

The bookmarking commands have hooks, for instance it's possible to add a "git sync" handler to the bookmark-url / bookmark-delete hooks so that it automatically syncs the DB. Would that suit your use case?

Maybe we should expose a hook that's run on any database access, to avoid the need to add handlers to 3-4 hooks.
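
The single-hook idea could look roughly like this. It is a Python sketch with a made-up `BookmarkStore` class and `access_hook` list, not Next's hook API, and it only covers write access (add and delete).

```python
class BookmarkStore:
    """Toy bookmark store with one hook list shared by every modifying command."""

    def __init__(self):
        self.urls = []
        self.access_hook = []  # handlers called as handler(action, url)

    def _run_hook(self, action, url):
        for handler in self.access_hook:
            handler(action, url)

    def add(self, url):
        if url not in self.urls:
            self.urls.append(url)
            self._run_hook("add", url)

    def delete(self, url):
        if url in self.urls:
            self.urls.remove(url)
            self._run_hook("delete", url)

# A "git sync" handler would go here; this stand-in only records the calls.
log = []
store = BookmarkStore()
store.access_hook.append(lambda action, url: log.append((action, url)))
store.add("https://www.example.org")
store.delete("https://www.example.org")
```

With one shared hook, a sync handler is registered once instead of once per command.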

Annotations: Do you mean a :annotation key where the user is free to add any text to a bookmark?

Org/HTML: So would the Org buffer be saved in the database, or as the database? What about a minibuffer command which displays the annotations (or other details) of the selected bookmarks?

tviti commented 5 years ago

> The bookmarking commands have hooks, for instance it's possible to add a "git sync" handler to the bookmark-url / bookmark-delete hooks so that it automatically syncs the DB. Would that suit your use case?

That's a good idea! Although in retrospect, I think using a git repo for syncing sqlite files was probably the wrong way to go about doing things (might have made more sense if the bookmark file format was some sort of plaintext). I'm thinking of just using syncthing to keep them synchronized between machines now, and if I go that route then there'd be no need to use hooks.

> Annotations: Do you mean a :annotation key where the user is free to add any text to a bookmark?

Yes, I think something like that is basically what I'm thinking of. I've honestly never bothered with bookmarks in other browsers before, but the ease with which you can do fuzzy searches in next has gotten me using them a lot more now. What I'm experimenting with is having separate bookmark.db files for a project I am working on, and then filling it up with resources that seem useful as I come across them.

Of course, now I find myself with a growing list of websites, and not necessarily remembering why I wanted to save them in the first place, so what would be cool is to call a handler on bookmark add, that brings up the minibuffer so I can jot down my thoughts as I add bookmarks to the db.

Then, when you do a set-url-from-bookmark, the left hand side of the minibuffer would show all the URLs, and the right hand side would have the text that you saved in ":annotation", so that it's easy to see what's in there and why.

The idea to go balls-to-the-wall and bring up an org-mode buffer is probably overkill (the idea was to have the org buffer saved in the db, not as the db), since I don't even necessarily see that being super useful for myself. BUT, if the db interface is easily extensible, then if somebody was so inclined, they could save some markdown, org-mode, or hell maybe even javascript (for more fancy annotations that a next-package would render in the page, like sticky-notes or text-highlights, although the potentially volatile nature of a web page would imo make the utility of something that complicated questionable; may as well just save the html locally and THEN annotate it).

Ambrevar commented 5 years ago

The SQLite default will be dropped, I'd like to have s-exp-based files from now on so that users can easily edit them and version them.

By default, the minibuffer bookmark candidates will be "URL TITLE" as for the rest. If you'd rather have "URL TITLE ANNOTATION" or just "URL ANNOTATION", you'll easily be able to customize the bookmark object-string.

Actually... Would there be a difference between TITLE and ANNOTATION in your use case?

Finally, the exporter will be customizable / fully programmable, which means that the user can also call a combination of exporters and export the bookmarks to multiple formats in one go!
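
Calling "a combination of exporters" can be as simple as wrapping several functions into one. The sketch below is Python with hypothetical exporter names (`export_json`, `export_url_list`, `combine`); it returns strings instead of writing files so that it stays self-contained.

```python
import json

def export_json(bookmarks):
    """Render the bookmarks as a JSON document."""
    return json.dumps(bookmarks, indent=2, sort_keys=True)

def export_url_list(bookmarks):
    """Render one URL per line, convenient for line-based Unix tools."""
    return "\n".join(b["url"] for b in bookmarks)

def combine(*exporters):
    """Build a single exporter that runs each given exporter in order."""
    def combined(bookmarks):
        return [exporter(bookmarks) for exporter in exporters]
    return combined

export_all = combine(export_json, export_url_list)
as_json, as_list = export_all([{"url": "https://www.example.org", "title": "Example"}])
```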

4t0m commented 5 years ago

I'm excited about the idea of having a lightly structured org file (or several such files) as the ground truth for my bookmarks, making it easy to add/remove/annotate urls from within emacs. In my case I would want some separation between TITLE and ANNOTATION, where TITLES are displayed ~in full with some padding, and annotations displayed separately and probably truncated. It should be easy to scan the list of titles.

We could even limit fuzzy searches to separate bookmark files/databases, making it easy to search for urls/titles/annotations I have in org/notes/* or org/notes/economics.org, etc. Then there's basically no separation between my browser and my notes which frequently include urls. (Why would I want to search for the url within emacs and then send it to my browser? That's crazy!)

Does it seem like it would be too inefficient to have bookmarks stored-in / read-from / searched-on "human readable" org files with substantial text? Perhaps all headers with :bookmark: and their children with :annotation: are searchable, and the rest of the text is ignored?

(I imagine something like this would be a ways in the future, but it's something I'd want to take a stab at building for myself if it seems workable.)

(Related to emacs integration. #125 )

Ambrevar commented 5 years ago

Org integration is planned, but I don't think it will be the default. The main drawback of Org is that the bookmark files would not be line-based, which somewhat hinders the use of "Unix tools."

It's possible to parse Org files with cl-org in Common Lisp. I haven't tested it yet, but I suspect that going through a bunch of Org files and scanning their sections + properties would not take long. This can be done lazily, i.e. only parse a file if its date is more recent than the last time it was parsed by Next.
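
The lazy approach can be sketched by caching each file's modification time next to its parsed result. This is Python, not cl-org; `LazyParser` is a made-up name, and the parse function passed in is a placeholder for a real Org parser. The sketch re-parses whenever the timestamp differs from the cached one, which also covers files that were replaced by an older copy.

```python
import os

class LazyParser:
    """Re-parse a file only when its modification time has changed."""

    def __init__(self, parse):
        self.parse = parse  # function: file contents (str) -> parsed result
        self.cache = {}     # path -> (mtime_ns at parse time, parsed result)

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self.cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # file unchanged since the last parse
        with open(path, encoding="utf-8") as f:
            result = self.parse(f.read())
        self.cache[path] = (mtime, result)
        return result

# Usage (hypothetical path): parser = LazyParser(str.splitlines)
#                            headings = parser.load("notes/economics.org")
```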

Next can provide some importer / exporter examples in case it's not the default.

Exciting contributions ahead, anyone?! :)

Regarding annotations: yes we can fuzzy-match them, but within a minibuffer it amounts to very long lines (URL + title + annotation!), which will display over multiple lines (once we support it...). We really need a vertical minibuffer at this point!

tviti commented 5 years ago

> If you'd rather have "URL TITLE ANNOTATION" or just "URL ANNOTATION", you'll easily be able to customize the bookmark object-string. Actually... Would there be a difference between TITLE and ANNOTATION in your use case?

Awesome! In my use case, I think for many entries, the TITLE will be sufficient for identification, but there's at least one entry in my bookmark.db where something like the following comes to mind:

TITLE: DWS 2017 report for Keauhou aquifer ANNOTATION: Contains population growth statistics (including yearly growth rate), and aquifer boundaries for Keauhou and Kiholo.

jbmestelan commented 5 years ago

Regarding the model of the prospective bookmark: would you consider adding a 'completer' function, specifying a Web request to provide automatic completion?

This was a feature of the Conkeror browser (here a link to its Webjump class, which effectively merged the notion of bookmark and search engine).

jmercouris commented 5 years ago

All completion in the mini buffer is provided via a completion function, this can be tuned to do literally whatever you want to produce the list of candidates! :-)

Ambrevar commented 5 years ago

I think what Jean-Baptiste suggests needs a bit more work. We would have to save this "webjump" in the bookmarks. Then we could type this in the minibuffer:

  wiki duck

and the minibuffer would complete with

Duck
Duck family (Disney)
DuckTales
...

The change is that the bookmark / history completion function needs to load the appropriate webjump and change the candidates depending on that.

From the Lisp core, it's a small change. Then we can offer a bunch of webjumps by default.
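
To illustrate the dispatch described above, here is a Python sketch (not the Lisp core). The `complete` function, the `webjumps` table and the stand-in completer `fake_wiki` are all hypothetical: the first word of the input selects a webjump, and the rest of the input is handed to that webjump's completer.

```python
def complete(user_input, webjumps, default_candidates):
    """Return minibuffer candidates for `user_input`.

    If the first word names a webjump and a query follows, the webjump's
    completer supplies the candidates; otherwise fall back to plain substring
    matching on the regular bookmark / history candidates.
    """
    name, _, query = user_input.partition(" ")
    completer = webjumps.get(name)
    if completer is not None and query:
        return completer(query)
    needle = user_input.lower()
    return [c for c in default_candidates if needle in c.lower()]

def fake_wiki(query):
    """Stand-in completer; a real one would ask the site for suggestions."""
    titles = ["Duck", "Duck family (Disney)", "DuckTales", "Goose"]
    return [t for t in titles if t.lower().startswith(query.lower())]

webjumps = {"wiki": fake_wiki}
defaults = ["https://www.example.org Example"]
candidates = complete("wiki duck", webjumps, defaults)
```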

jbmestelan commented 5 years ago

Yes, it is as Pierre has it: the proposed completion would depend on the search engine which provides it. For Wikipedia, the URL of the completion service is https://en.wikipedia.org/w/api.php?action=opensearch&search=.
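
For reference, an OpenSearch suggestion reply is a JSON array whose second element is the list of completions. The Python sketch below builds the request URL from the endpoint quoted above and extracts the titles; the function names are hypothetical and the sample reply is hand-written for illustration, not captured from Wikipedia.

```python
import json
from urllib.parse import quote

# Endpoint as quoted in the comment above.
OPENSEARCH_URL = "https://en.wikipedia.org/w/api.php?action=opensearch&search="

def suggestion_url(term):
    """Build the completion request URL for a search term."""
    return OPENSEARCH_URL + quote(term)

def parse_suggestions(body):
    """Extract suggested titles from an OpenSearch JSON reply.

    The reply has the shape [query, [titles], [descriptions], [urls]].
    """
    data = json.loads(body)
    return list(data[1])

# Hand-written sample reply in the OpenSearch suggestion format.
sample = '["duck", ["Duck", "Duck family (Disney)", "DuckTales"], ["", "", ""], []]'
titles = parse_suggestions(sample)
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.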

(Other than that, I think Conkeror's webjump = Next's merged bookmark.) This would be super-handy, offering the same completion from the mini-buffer as you would get from the site's search box.

Ambrevar commented 5 years ago

Agreed, I'll work on it!

jbmestelan commented 5 years ago

This is great, thanks.

May I point out one last nicety of Conkeror's bookmark/webjump implementation to which I linked above? It concerns the behaviour of the bookmark when no search term is provided:

This literally merges bookmarks and search engines, allowing the user to define a single bookmark for the two usages (with and without a search term).
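
A minimal sketch of that dual behaviour, in Python and with a hypothetical `resolve` helper: it assumes a bookmark that carries both a plain URL and a search template with a `%s` placeholder, as in the structure proposed earlier in the thread. It is not Conkeror's implementation.

```python
from urllib.parse import quote_plus

def resolve(bookmark, term=""):
    """Return the URL to visit for a bookmark, with or without a search term."""
    template = bookmark.get("search")
    if term and template:
        # Used as a search engine: substitute the encoded term.
        return template.replace("%s", quote_plus(term))
    # Used as a plain bookmark.
    return bookmark["url"]

ddg = {"url": "https://duckduckgo.com/", "search": "https://duckduckgo.com/?q=%s"}
plain = resolve(ddg)
searched = resolve(ddg, "lisp game jam")
```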

Ambrevar commented 5 years ago

Great news: the new s-exp based bookmarks are now merged on master! They should automatically import the old SQLite database on first query.

What's implemented:

What remains to be implemented:

Let me know if I forgot something! Happy bookmarking!

4t0m commented 5 years ago

I'm trying to figure out how org integration should work. Here are my current thoughts:

Questions:

Does any of that seem superfluous or bad? What's missing from the above features?

tviti commented 5 years ago

Disclaimer: I'm spitballing here before a cup of coffee.

I've never used org for bookmark keeping tbh, but since the Emacs org integration is already so extensive, I'd wonder if it'd be possible to just use Emacs as a sort of org-mode-language-server for everything, and completely cut out the s-expr bookmark format from the pipeline.

Then, instead of having to come up with a feature set for the org-bookmark interaction, you'd just expose the org api to the next client (although I guess you'd have to decide still which commands to create next bindings for). Then it becomes up to the user to decide how it'd work, in the same way that it's kinda up to a user to decide how =org-mode= should work for them. For more involved database management tasks (like moving bookmarks between files), a single open-db cmd could just bring the whole thing up in emacs (if your format is already .org, then no sense trying to re-invent a UI for non-trivial management tasks).

Ambrevar commented 5 years ago

There is the cl-org library for parsing Org files, so we could use that. If it happens to be too limited for our use, then we can do as @tviti suggested: call emacs --eval ... to return an s-exp tree of the Org file.

intuser commented 4 years ago

Sorry, but I'm a bit confused.

Webjumps are for me one of the greatest features of conkeror. (And it's hard for me to imagine to change to a browser which doesn't provide something very near to webjumps.)

Now, the discussion on this issue is in part about the possibility of implementing something like webjumps. And I read the "Good news" of @Ambrevar on October 1 as saying that there is now something like webjumps for next. I'm testing now next 1.3.4 but I didn't find any way how to set a bookmark for, let's say wikipedia (let's say w, as in conkeror), and then search for next on wikipedia typing "w next" in the minibuffer. (That's how webjumps work in conkeror.)

First: Is this possible now in next? Second: If it is possible, could somebody provide a simple recipe?

Ambrevar commented 4 years ago

Sorry I'm afraid the "Good news" only concerned other bookmarking features, but not the webjumps :/

This is why this issue is still open: Help and pull requests to implement webjumps are welcome :)

tviti commented 4 years ago

I've never used Conkeror before, but is this not just:

https://github.com/atlas-engineer/next/blob/master/documents/MANUAL.org#searching-via-search-engine

?

Ambrevar commented 4 years ago

The difference is that webjumps complete from the website suggestions.

Thovthe commented 4 years ago

> The bookmarking commands have hooks, for instance it's possible to add a "git sync" handler to the bookmark-url / bookmark-delete hooks so that it automatically syncs the DB. Would that suit your use case?

Does this mean I could integrate bookmarks with a site archiving system?

jmercouris commented 4 years ago

I believe yes, you could use it with an archival system