ahyatt / ekg

The emacs knowledge graph, app for notes and structured data.
GNU General Public License v3.0

Ideas #9

Open gnusupport opened 1 year ago

gnusupport commented 1 year ago

In my opinion, the `ekg' package is not opinionated; rather, it is human-friendly. Personally, I don't see it as a substitute for other applications; I see it as a unique program.

So I propose changing your introduction: "The ekg module is a simple but opinionated note taking application. It is a substitute for such other emacs applications such as org-roam or denote. ekg stands for emacs knowledge graph."

There are a few core ideas driving the design of ekg. The first is that a title and a tag are the same thing.

If I may say so: I deal with many tags, and I consider a tag to be a class of attributes or properties of an object.

I have elementary objects. They have types and subtypes; types can be actionable or not; there may be additional properties; there are relations, and so on. All of those are properties in some sense.

Tags are properties that cut across everything, and they are important in themselves. I also use the notion of tag types:

 1          Default
 2          Skill
 3          Topic
 4          Language
 5          Action
 6          Place
 7          Sales
 8          Computer

I also hold that the user should be able to define new tag types.
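Below is a minimal sketch of user-definable tag types. It is written in Python with the built-in sqlite3 module purely for illustration; gnusupport's own system runs on PostgreSQL. The tags_tagtypes column follows the tags table shown further down. The tagtypes table and the helper functions are hypothetical. Because tag types are ordinary rows, a user can add a new one without any schema change.

```python
import sqlite3

# The eight types listed above, seeded with the same IDs.
BUILTIN_TYPES = ["Default", "Skill", "Topic", "Language",
                 "Action", "Place", "Sales", "Computer"]

def make_db() -> sqlite3.Connection:
    """In-memory database with a user-extensible tag type table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tagtypes (
            tagtypes_id   INTEGER PRIMARY KEY,
            tagtypes_name TEXT NOT NULL UNIQUE
        );
        -- every tag has a type; 1 ("Default") when none is chosen
        CREATE TABLE tags (
            tags_id       INTEGER PRIMARY KEY,
            tags_name     TEXT NOT NULL,
            tags_tagtypes INTEGER NOT NULL DEFAULT 1
                          REFERENCES tagtypes (tagtypes_id)
        );
    """)
    db.executemany("INSERT INTO tagtypes (tagtypes_id, tagtypes_name) "
                   "VALUES (?, ?)", enumerate(BUILTIN_TYPES, start=1))
    return db

def add_tag_type(db, name: str) -> int:
    """Let the user define a new tag type; returns its new ID."""
    cur = db.execute("INSERT INTO tagtypes (tagtypes_name) VALUES (?)", (name,))
    return cur.lastrowid

def add_tag(db, name: str, type_name: str = "Default") -> int:
    """Create a tag of the given (existing) type; returns the tag ID."""
    row = db.execute("SELECT tagtypes_id FROM tagtypes "
                     "WHERE tagtypes_name = ?", (type_name,)).fetchone()
    if row is None:
        raise ValueError(f"unknown tag type: {type_name!r}")
    cur = db.execute("INSERT INTO tags (tags_name, tags_tagtypes) "
                     "VALUES (?, ?)", (name, row[0]))
    return cur.lastrowid

def tags_of_type(db, type_name: str) -> list[str]:
    """Narrow the tag list to one type, sorted by name."""
    rows = db.execute("""
        SELECT t.tags_name FROM tags t
        JOIN tagtypes y ON y.tagtypes_id = t.tags_tagtypes
        WHERE y.tagtypes_name = ? ORDER BY t.tags_name""", (type_name,))
    return [r[0] for r in rows]
```

For instance, after add_tag_type(db, "Idea") the user can file tags under "Idea" and list only those with tags_of_type(db, "Idea").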

What we like in databases is the use of intersections. To me this implies that the more distinct sets of properties there are, the more precisely we can pinpoint what we want.

In practice, it means we can more easily find the note we are looking for.

We can also more easily find relationships between objects.

Tags, as properties separate from the object name, are useful because they create more intersections. Eliminating tags eliminates that usefulness.

Back to searching: I have mentioned intersections. That is the basic principle behind database-based searches. Databases define different tables, columns, types, and classes.

When searching, we can then design functions such as:

  1. searching by tags only
  2. searching by tags and including name words as tags
  3. etc.
  4. etc. various combinations
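Modes 1 and 2 can both be sketched as set intersections. The Python illustration below is hypothetical; it is not code from ekg or from gnusupport's system. Each object has a name and a set of tags, and a query matches only the objects that satisfy every term. In mode 1 a term must equal an attached tag. In mode 2 the words of the object's name also count as tags.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    name: str
    tags: set[str] = field(default_factory=set)

def features(obj: Obj, include_name_words: bool) -> set[str]:
    """Everything a query term may match, case-folded."""
    result = {t.casefold() for t in obj.tags}
    if include_name_words:  # mode 2: name words act as extra tags
        result |= {w.casefold() for w in obj.name.split()}
    return result

def search(objs, terms, include_name_words=False):
    """Names of objects matching ALL terms (an intersection)."""
    wanted = {t.casefold() for t in terms}
    return [o.name for o in objs
            if wanted <= features(o, include_name_words)]
```

Take an object named "ekg notes" and tagged "Emacs". The query ["Emacs", "ekg"] finds nothing in mode 1 but finds it in mode 2. A "Technical School" tagged "Optician" can be found by "Optician" only through its tag, since that word is not in its name.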

As I understand your idea, you are saying that searching through both tags and object names is helpful. Sure it is.

But it is not the only way of making intersections: there are options 1 and 4 above, and any number of other ways to form intersections.

If we want to merge various properties into "one" to get better search results, then I recommend full-text search; one reference is here:

Hyperscope full text search with PostgreSQL: https://hyperscope.link/3/6/7/6/8/Hyperscope-full-text-search-with-PostgreSQL-36768.html

With full-text search, I can extend the tokens to include the tag names, the object name, description, text, language, country, related currency, names of related people, and so on.
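The idea of folding tag names and other properties into one set of search tokens can be sketched as a tiny inverted index in Python. This is only a stand-in for real full-text search such as PostgreSQL's tsvector/tsquery or SQLite's FTS5. It has no stemming, ranking, or stop-word handling, and the field names in the usage example are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")

def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; a crude stand-in for a text parser."""
    return {t.lower() for t in TOKEN.findall(text)}

class Index:
    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of object IDs

    def add(self, obj_id: int, **fields: str) -> None:
        # Every field (name, tags, description, currency, ...) feeds
        # the same token set for this object.
        for value in fields.values():
            for tok in tokens(value):
                self.postings[tok].add(obj_id)

    def query(self, text: str) -> set[int]:
        """IDs of objects containing every query token (AND logic)."""
        toks = tokens(text)
        if not toks:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in toks))
```

For example, after idx.add(1, name="Invoice for hosting", tags="Sales USD"), the query "hosting usd" finds object 1 even though "USD" appears only as a tag.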

I would not say that excluding tags is beneficial, but I can say that including tags benefits various search functions.

I have given you an example in PostgreSQL. I thought full-text search did not exist in SQLite, but it does, through the FTS5 extension (linked below). Even so, building on SQLite is less productive for the future, as it is a single-user database.

Build products that are multi-user and collaboration-based.

This isn’t unique to ekg, other tools such as Logseq also consider tags to be equivalent to pages of the same name, although this functionality is limited since tags can only be just one word.

In my work I have a table called "tags":

                                              Table "public.tags"
┌───────────────────┬──────────────────────────┬───────────┬──────────┬───────────────────────────────────────┐
│      Column       │           Type           │ Collation │ Nullable │                Default                │
├───────────────────┼──────────────────────────┼───────────┼──────────┼───────────────────────────────────────┤
│ tags_id           │ integer                  │           │ not null │ nextval('tags_tags_id_seq'::regclass) │
│ tags_datecreated  │ timestamp with time zone │           │ not null │ CURRENT_TIMESTAMP                     │
│ tags_datemodified │ timestamp with time zone │           │          │                                       │
│ tags_usercreated  │ text                     │           │ not null │ CURRENT_USER                          │
│ tags_usermodified │ text                     │           │ not null │ CURRENT_USER                          │
│ tags_name         │ text                     │           │ not null │                                       │
│ tags_description  │ text                     │           │          │                                       │
│ tags_languages    │ integer                  │           │ not null │ 1                                     │
│ tags_tag1         │ integer                  │           │          │                                       │
│ tags_tag2         │ integer                  │           │          │                                       │
│ tags_tag3         │ integer                  │           │          │                                       │
│ tags_tagtypes     │ integer                  │           │ not null │ 1                                     │
│ tags_hidden       │ boolean                  │           │ not null │ false                                 │
│ tags_people       │ integer                  │           │          │                                       │
│ tags_rank         │ integer                  │           │ not null │ 0                                     │
└───────────────────┴──────────────────────────┴───────────┴──────────┴───────────────────────────────────────┘

You may notice that a tag has its own properties. It has a name, and because the tag is addressed by its unique ID, the name can contain spaces, be long, and contain any characters. It can have a description. It may be in a different language.

It may itself be tagged by up to three other tags (tags_tag1, tags_tag2, tags_tag3). Why is that useful? Tag an object once, and the computer adds the additional tags. If I tag something with "Video", the computer tags it with both "Video" and "Media", as that is probably what the user wants. I do not have to keep adding relevant synonyms or related tags; I can add just one of them. If that feature becomes very useful, I would simply provide full tagging of tags.
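Tagging "Video" and getting "Media" for free amounts to following the tag-to-tag references (tags_tag1 to tags_tag3) from each assigned tag. The hypothetical Python sketch below holds those references in a dict keyed by tag name, not by integer ID, for readability. Following the links transitively (Video, then Media, then a further hypothetical parent "Content") is an assumption about how such a feature might behave. The check against tags already collected keeps cyclic links from looping forever.

```python
def expand_tags(assigned, parents):
    """Assigned tags plus every tag reachable through tag-to-tag links."""
    result = set()
    stack = list(assigned)
    while stack:
        tag = stack.pop()
        if tag in result:
            continue  # already expanded; this also breaks cycles
        result.add(tag)
        stack.extend(parents.get(tag, ()))
    return result

# Up to three parent tags per tag, as in the tags table (sample data).
PARENTS = {
    "Video": ("Media",),
    "Podcast": ("Media", "Audio"),
    "Media": ("Content",),
}
```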

Because tags are addressed by an integer ID and not by name, I can rename a tag without losing it from any object. All objects then appear with the renamed tag!
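The effect of referencing tags by integer ID can be shown with Python's sqlite3 module and cut-down versions of the tags and tagging tables from this thread. Most columns are omitted, the objects table is reduced to an ID and a name, and the data is made up. A single UPDATE on the tags row changes the name that every tagged object displays, because the tagging rows store only the ID.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tags      (tags_id INTEGER PRIMARY KEY, tags_name TEXT NOT NULL);
    CREATE TABLE hyobjects (hyobjects_id INTEGER PRIMARY KEY, hyobjects_name TEXT);
    CREATE TABLE tagging   (tagging_id INTEGER PRIMARY KEY,
                            tagging_tags INTEGER NOT NULL REFERENCES tags,
                            tagging_hyobjects INTEGER REFERENCES hyobjects);
    INSERT INTO tags VALUES (1, 'US dollar');
    INSERT INTO hyobjects VALUES (10, 'Invoice 2023-01'), (11, 'Hosting bill');
    INSERT INTO tagging (tagging_tags, tagging_hyobjects) VALUES (1, 10), (1, 11);
""")

def tags_of(obj_id):
    """Tag names currently shown for an object, resolved through IDs."""
    rows = db.execute("""SELECT t.tags_name FROM tagging g
                         JOIN tags t ON t.tags_id = g.tagging_tags
                         WHERE g.tagging_hyobjects = ?""", (obj_id,))
    return [r[0] for r in rows]

before = tags_of(10)  # ['US dollar']
# Rename once; the tagging rows still hold ID 1, so nothing else changes.
db.execute("UPDATE tags SET tags_name = 'USD' WHERE tags_id = 1")
```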

Then there are tag types. Very useful.

Suppose there is an elementary object, such as a note, about a "Technical School". The fact that the school teaches the skill "Optician" cannot be derived from the name alone. That is one example among many of why tags are useful.

It is, however, useful to concatenate tags with the name and search among them; that is one of the intersections that are helpful to humans.

In org-roam, a tag is just a tag, so you can have a note called “emacs” and a tag called “emacs”, but these are not related.

Okay

ekg takes the idea a step further: there are (mostly) no titles, only tags. So, instead of writing text in a note called “emacs”, just write a note and tag it with “emacs”. There is no “title”, only tags.

In my opinion that design is backwards; it does not help.

If you search by name, or split the name into words and search by those words, then what you are using is not a "tag".

A tag is a property separate from the object. Its name may differ from anything in the object: I can have "USD" as a tag for US dollar.

Tags are properties of an object that belong to no special group or class.

It is useful to have tags that exist without being related to any object, because this makes the work easier for humans. Suppose you have tags for currencies such as "USD", "EUR", and "GBP", and you wish to tag an object with one of them, but "GBP" has never been used yet. Rather than typing it every single time, you can prepare the tag list in advance so the user can simply select it.

Elementary Objects: https://www.dougengelbart.org/content/view/110/460/#2a1a

My elementary objects have a "Currency" property attached directly; whatever is more important becomes directly attached to the object. Tags exist as detached properties that may belong to any object.

Separate functions can be written to relate objects to tags automatically, for example by deriving tags from the name, which seems to be what you designed.

For example, an object named "NonGNU Emacs Lisp Package Archive" may be tagged automatically with tags such as "Emacs" and "Archive".
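Such a function could be as simple as the hypothetical Python sketch below. It proposes only tags that already exist, matching each one case-insensitively against whole words of the object's name. Whole-word matching is an assumption: it keeps "GNU" from matching inside "NonGNU", but it would miss multi-word tags.

```python
import re

def suggest_tags(name: str, existing_tags: list[str]) -> list[str]:
    """Existing tags that appear as a whole word in the object name."""
    words = {w.casefold() for w in re.findall(r"\w+", name)}
    return [t for t in existing_tags if t.casefold() in words]
```

With the tags ["Emacs", "Archive", "Video", "GNU"], the name "NonGNU Emacs Lisp Package Archive" yields ["Emacs", "Archive"].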

If you write another note about emacs, also tag it “emacs”, and maybe something else too. Or tag it something more involved, like an idea: “emacs’s power derives from putting all data in buffers, and making all commands deal with buffers.” That’s a perfectly fine tag, and if you notice a connecting idea, you can tag it with this as well.

That is right and good. My tags may also be of arbitrary length.

Still, tags are useful because they express simple ideas, not complex ones.

Their usefulness is derived from their combination, not from their quality.

Their meaning should be elementary, not complex.

Their purpose should be to generate intersections.

A tag like "Emacs" alone is less useful, as it finds anything about Emacs. "Emacs" combined with "ekg" is more useful, isn't it?

Using the name of an object to generate tags is useful, but eliminating tags as such is not.

The advantage of this method is that it solves something that has bothered me for a while about the recent suite of tools like org-roam: backlinks are non-symmetrical. If you enter a note in your org-roam daily about emacs, and link it to the emacs note, then when you go to the emacs note, you have to explicitly enable the backlinks buffer to see the daily entry where you first entered it. Systems such as Logseq and the original Roam have backlinks alongside normal content, but this doesn’t seem possible in emacs, where a buffer of a file is expected show the file, and tricks with overlays can’t solve the issue. Even if it could, I want a system in which it doesn’t matter where you enter the data, it shows up in the original place the same as everywhere else it is linked to, not as a backlink, but just as part of the content. Having notes with no title, only tags, makes this possible, because there is no longer a difference between linking and writing in the context in, both are denoted by tags.

In my opinion your solution is only one of many. It does not have to be done this way; it can be implemented in various ways.

As a consequence of this design, notes can be small, because to add another note to a subject, you don’t need to append to an existing note, you can create another note.

That is right, multiple notes shall be available for any possible relation.
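The symmetry described in the quoted README can be illustrated with a small Python model. If notes carry only tags, the "page" for a tag is simply every note that carries it, wherever the note was written, so no separate backlink list is needed. This is a hypothetical illustration of the idea, not ekg's implementation.

```python
from collections import defaultdict

notes = {}                 # note ID -> (text, set of tags)
by_tag = defaultdict(set)  # tag -> IDs of notes carrying it

def add_note(note_id, text, tags):
    """Store a small, untitled note; its tags are its only links."""
    notes[note_id] = (text, set(tags))
    for tag in tags:
        by_tag[tag].add(note_id)

def page(tag):
    """Everything about a tag: all notes carrying it, sorted by text."""
    return sorted(notes[i][0] for i in by_tag.get(tag, ()))
```

A daily entry tagged both "date/2023-05-01" and "emacs" then appears on both pages in exactly the same way.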

Additionally, ekg has another key difference: it uses sqlite instead of the filesystem. When notes are small and do not have titles, files don’t make a lot of sense anymore.

The file system is one way of "tagging" and sorting files. It has directories, filenames, a hierarchical structure, and access by a variety of means.

It is up to the user to "sort" their stuff and relate items to each other. It is not as flexible as a database.

Additionally, the filesystem is limited. Even in org-roam, which uses it, it needs to be augmented with sqlite anyway to enable fast querying of tags and other operations. The sqlite-only approach also means it is much easier to make certain kinds of changes, since they only involve changing the database and not the text as well. In general, text and data are separated as much as possible here, so there’s no need or desire for the text to have to store data as well, we leave that completely to the database.

In my opinion your introduction is difficult to comprehend. It would help to make a video or screenshots.

Prefixed tags

Another concept, loosely applied in ekg is that of tags with standard prefixes. By default, date tags are prefixed with “date/”. This is a way to distinguish date tags from other kinds of tags. Most tags shouldn’t need it, but it often is useful to have prefixes to group tags in some way. For instance, perhaps all idea tags should be prefixed with “idea/”. In my ekg repository I use in my company, I have “person/” as a tag prefix for my coworker’s username.

Here we are speaking of tag types. Consider implementing them. If you implement a "Default" tag type and let the user add any other tag type, you will have a useful system. Users can then decide that a tag belongs to the tag type "Idea", "Skill", or maybe "Time".

The benefit of this is that it’s now possible to narrow in on just tags of a certain type if necessary.

Here we are basically speaking of tags for tags, or properties of tags, and I find that a very good notion for the future.

There are a few other types of prefixes commonly used for tags. One is that titled resources have default tags that are prefixed with “doc/”, followed by the name of the document. Removed tags are prefixed with “trash/”, but these are normally invisible to the user. There’s a section on these trash tags below which goes into more detail.

I would not do it that way, because it invites human mistakes: a person has to type a string with a slash. I would rather use tag types.
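The difference between the two approaches can be sketched in Python with hypothetical tags. With prefixes, the type is part of the tag string, so a typo such as "ideas/" silently produces a tag that the "idea" filter never finds. With a separate tag-type property, the type is checked against a known set when it is assigned.

```python
# Prefix style: the type is encoded inside the tag string.
TAGS = ["date/2023-05-01", "idea/buffers", "emacs",
        "idea/tags-are-titles", "ideas/typo"]

def by_prefix(tags, prefix):
    return [t for t in tags if t.startswith(prefix + "/")]

# Tag-type style: the type is a separate, validated property.
TAG_TYPES = {"Default", "Date", "Idea"}
TYPED = {"2023-05-01": "Date", "buffers": "Idea",
         "emacs": "Default", "tags are titles": "Idea"}

def set_type(typed, tag, tag_type):
    """Assign a tag type, rejecting types that were never defined."""
    if tag_type not in TAG_TYPES:
        raise ValueError(f"unknown tag type: {tag_type!r}")
    typed[tag] = tag_type

def by_type(typed, tag_type):
    return [t for t, ty in typed.items() if ty == tag_type]
```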

I have a table called "tagging", which links a tag to an elementary object or to a person. If a tag is removed from an object, the row is removed from the tagging table, but the tag itself remains in the database, where it stays useful for future selection of tags.

In this case a tag can be related to a people object or a document object. There are hundreds of other tables, but for those two, tags are the most useful.

                                                    Table "public.tagging"
┌──────────────────────┬──────────────────────────┬───────────┬──────────┬───────────────────────────────────────────────────┐
│        Column        │           Type           │ Collation │ Nullable │                      Default                      │
├──────────────────────┼──────────────────────────┼───────────┼──────────┼───────────────────────────────────────────────────┤
│ tagging_id           │ integer                  │           │ not null │ nextval('peopletags_peopletags_id_seq'::regclass) │
│ tagging_datecreated  │ timestamp with time zone │           │ not null │ CURRENT_TIMESTAMP                                 │
│ tagging_datemodified │ timestamp with time zone │           │          │                                                   │
│ tagging_usercreated  │ text                     │           │ not null │ CURRENT_USER                                      │
│ tagging_usermodified │ text                     │           │ not null │ CURRENT_USER                                      │
│ tagging_tags         │ integer                  │           │ not null │                                                   │
│ tagging_people       │ integer                  │           │          │                                                   │
│ tagging_hyobjects    │ integer                  │           │          │                                                   │
│ tagging_description  │ text                     │           │          │                                                   │
└──────────────────────┴──────────────────────────┴───────────┴──────────┴───────────────────────────────────────────────────┘
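Here is a minimal sqlite3 sketch in Python of the behaviour described above. It uses the tags and tagging tables cut down to a few columns, with made-up data. Removing a tag from an object deletes only the tagging row, so the tag stays available as a completion candidate, just like "GBP", which was never used at all.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tags    (tags_id INTEGER PRIMARY KEY, tags_name TEXT NOT NULL);
    CREATE TABLE tagging (tagging_id INTEGER PRIMARY KEY,
                          tagging_tags INTEGER NOT NULL,
                          tagging_people INTEGER,
                          tagging_hyobjects INTEGER);
    INSERT INTO tags VALUES (1, 'USD'), (2, 'EUR'), (3, 'GBP');
    INSERT INTO tagging (tagging_tags, tagging_hyobjects) VALUES (1, 10), (2, 10);
""")

def untag(tag_id, obj_id):
    """Detach a tag from an object; the tags row itself is kept."""
    db.execute("DELETE FROM tagging WHERE tagging_tags = ? "
               "AND tagging_hyobjects = ?", (tag_id, obj_id))

def completion_candidates():
    """All known tags, used or not, e.g. to offer for selection."""
    return [r[0] for r in db.execute(
        "SELECT tags_name FROM tags ORDER BY tags_name")]

def tags_in_use():
    """Tags currently attached to at least one object or person."""
    return [r[0] for r in db.execute("""
        SELECT DISTINCT t.tags_name FROM tags t
        JOIN tagging g ON g.tagging_tags = t.tags_id ORDER BY 1""")]
```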

My tagging lets me tag with existing tags or newly created ones. Tag searching uses the function completing-read-multiple, which lets me select several tags at once.

ahyatt commented 1 year ago

Thank you very much for all these excellent notes.

Let me respond to some of this:

First, I agree that we really should have full-text search. That would be the responsibility of the triples package, which I also maintain. I need to try this out; I hope that the Emacs 29 sqlite support has the necessary capabilities. Anyway, I completely agree that it would contain tags and text.

About tags with tags, yes, right now ekg is very close to being able to do that. It just needs the ability to have tags have notes, which is very natural IMHO and fits in with how at least I want to organize things. Since the UI is note-driven, once we have notes, the user can tag the tag with whatever tags they want. Whether this has any interesting effects is something I'll have to think about, your experience on this is interesting though.

I also do have a video I've recorded, but I don't want to do any sort of advertisement of this package until it is in a repository. I'll see if I can make the text a bit more understandable, though.

gnusupport commented 1 year ago

In PostgreSQL it is easy to provide full-text search. I strongly suggest switching to PostgreSQL and moving away from the single-user, single-computer model. Collaboration with multiple users is so much more useful.

I have handled searches with a function like the following:

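(defun rcd-sql-search-snippet-for-and-column (column query &optional operator logic)
  "Return an SQL condition matching COLUMN against each word of QUERY.
OPERATOR defaults to \"~*\" (PostgreSQL case-insensitive regex match);
LOGIC, \"AND\" or \"OR\", defaults to \"AND\"."
  (let* ((words (split-string query nil t (rx (any whitespace))))
         (operator (or operator "~*"))
         (logic (or logic "AND")))
    ;; `sql-escape-string' must return WORD as a quoted SQL literal,
    ;; such as E'word' in the output below.
    (mapconcat (lambda (word)
                 (concat " " column " " operator " "
                         (sql-escape-string word) " "))
               words logic)))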

It builds an SQL condition that combines all the words with AND or OR logic:

(rcd-sql-search-snippet-for-and-column "my_db_column" "query words I am searching for") ➜ " my_db_column ~* E'query' AND my_db_column ~* E'words' AND my_db_column ~* E'I' AND my_db_column ~* E'am' AND my_db_column ~* E'searching' AND my_db_column ~* E'for' "

and

(rcd-sql-search-snippet-for-and-column "my_db_column" "query words I am searching for" "LIKE" "OR") ➜ " my_db_column LIKE E'query' OR my_db_column LIKE E'words' OR my_db_column LIKE E'I' OR my_db_column LIKE E'am' OR my_db_column LIKE E'searching' OR my_db_column LIKE E'for' "

It helps in constructing SQL queries to search all sorts of things. On my side, this is the function I use most.
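For comparison, here is a hypothetical Python analogue of rcd-sql-search-snippet-for-and-column. It returns the condition together with bound parameters instead of escaping each word into the SQL text. It targets SQLite, which has no ~* operator, so it always uses LIKE and wraps each word in % wildcards. Without wildcards, as in the LIKE example above, PostgreSQL's LIKE only matches a column whose whole value equals the word. The column name is still interpolated, so it must come from trusted code.

```python
import sqlite3

def search_snippet(column: str, query: str, logic: str = "AND"):
    """Return (condition, params) matching COLUMN against every word.

    COLUMN is interpolated and must be trusted; words are bound as
    parameters. % and _ inside a word still act as LIKE wildcards."""
    if logic not in ("AND", "OR"):
        raise ValueError("logic must be 'AND' or 'OR'")
    words = query.split()
    if not words:
        return "1", []  # no words: do not filter at all
    condition = f" {logic} ".join(f"{column} LIKE ?" for _ in words)
    params = [f"%{w}%" for w in words]  # match the word anywhere
    return condition, params

# Usage against an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (body TEXT)")
db.executemany("INSERT INTO notes VALUES (?)",
               [("Emacs knowledge graph",), ("Graph databases",),
                ("Emacs init file",)])
where, params = search_snippet("body", "emacs graph")
rows = db.execute(f"SELECT body FROM notes WHERE {where}", params).fetchall()
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so "emacs" matches "Emacs".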

gnusupport commented 1 year ago

For PostgreSQL:

Mastering PostgreSQL Tools: Full-Text Search and Phrase Search - Compose Articles: https://compose.com/articles/mastering-postgresql-tools-full-text-search-and-phrase-search/

PostgreSQL: Documentation: 15: Chapter 12. Full Text Search: https://www.postgresql.org/docs/15/textsearch.html

For SQLite:

SQLite FTS5 Extension: https://www.sqlite.org/fts5.html

gnusupport commented 1 year ago

I suggest looking into the following design:

TECHNOLOGY TEMPLATE PROJECT OHS Framework : https://www.dougengelbart.org/content/view/110/460/

Objects are basic content packets of an arbitrary, user and developer extensible nature. Types of elementary objects could contain:

I strongly suggest making it possible to record any kind of object. That does not mean saving video into the database. If a video is on the file system, it means indexing the video from the file system, or placing it in a file system structure devised by the program, with a note of type "video".
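Below is a minimal Python sketch of that approach, assuming a hypothetical ElementaryObject record. Each object has a type and, for media, a path to the file; the file stays on the file system and is only indexed. The field names and type strings are illustrative, not Engelbart's or gnusupport's schema.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

@dataclass
class ElementaryObject:
    name: str
    obj_type: str                # e.g. "note", "video", "pdf"
    path: Optional[Path] = None  # the file stays on disk; only referenced
    tags: set[str] = field(default_factory=set)

def index_directory(root: Path, obj_type: str, suffixes: set[str]):
    """One object per matching file under ROOT; no file data is copied."""
    wanted = {s.lower() for s in suffixes}
    return [ElementaryObject(name=p.stem, obj_type=obj_type, path=p)
            for p in sorted(root.rglob("*"))
            if p.is_file() and p.suffix.lower() in wanted]
```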

I use that approach. See the linked page for more about Open Hyperdocument Systems (OHS); following that design makes programs most useful.