arvinpan commented 6 years ago

I'm wondering whether creating an ID for an org-brain file would be a good way to manage entries' relationships.

For example, something like:

#+TITLE: File Subject

#+ID: XXXX-XX-...

Here are a few benefits:

~the .org-id-locations file can be removed, and no need to run the org-brain-update-id-locations command anymore~(my bad, I thought the locations file was a collection of autogenerated IDs for all file entries, which in fact contains IDs inside those org files)
no need to worry about issues caused by renaming files outside the org-brain system, like #114
easier to implement feature #48
when the filename is the entry ID, the program has to handle special cases in file names carefully when adding relationships, e.g. project/long folder name with special chars/xxxx
...

As the README says, the org-brain project provides a wiki and a mind map. Currently, if I rename my "wiki" files, I need to modify my "mind map" entry IDs accordingly to make sure all relationships still work. Alternatively I can use the org-brain-rename-file mechanism, which is not handy enough when restructuring multiple "wiki" folders and files. If we introduced an ID for the file entry, maybe the "wiki" and the "mind map" could be decoupled, making org-brain easier to use?

Also, I know #+ID may not be a good choice as it's not defined in org-mode, and there must be reasons why filenames are preferred. Glad to hear your opinions :)

Kungsgeten commented 6 years ago

I don't think removal of org-id-locations would be a benefit, since that still would be needed for headline entries. Other than that, I think the idea has merit.

The downside is that handling of files will be a bit slower. Each entry in org-brain has an ID, and for files the ID is the relative file name, so if we have an entry called index we just open index.org inside org-brain-path. If the file instead had the ID OB:289 (or similar), we'd have to scan all org-brain files and check their IDs until we find the right one. This could probably be cached in a dictionary though.
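A minimal sketch of what such a dictionary cache could look like, assuming a hash table keyed by ID. The names `my/brain-id-cache` and `my/brain-id-lookup` are hypothetical and not part of org-brain, and the `id-of-file` argument stands in for however the ID would actually be read from a file:

```elisp
;; Sketch only: these names are hypothetical, not org-brain API.
(require 'seq)

(defvar my/brain-id-cache (make-hash-table :test #'equal)
  "Hypothetical cache mapping a file ID string to a relative file name.")

(defun my/brain-id-lookup (id files id-of-file)
  "Return the entry in FILES whose ID is ID, or nil if there is none.
ID-OF-FILE is a function returning the ID stored in a file; it is a
placeholder for the real scan of the file's keywords."
  (let ((cached (gethash id my/brain-id-cache)))
    (if (and cached (equal (funcall id-of-file cached) id))
        cached                          ; cache hit: only one file is checked
      ;; Cache miss or stale entry: fall back to scanning every file.
      (let ((found (seq-find (lambda (f) (equal (funcall id-of-file f) id))
                             files)))
        (if found
            (puthash id found my/brain-id-cache)
          (remhash id my/brain-id-cache))
        found))))

;; Usage with an in-memory stand-in for the files on disk:
(let* ((ids '(("index" . "OB:289") ("notes/emacs" . "OB:290")))
       (id-of (lambda (f) (cdr (assoc f ids)))))
  (my/brain-id-lookup "OB:290" (mapcar #'car ids) id-of))
```

With this shape the full scan only happens on the first lookup of an ID, or after a file has been renamed, moved, or deleted.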

arvinpan commented 6 years ago

Thank you for your quick response; I understand it much better now. Basically, the current design prioritizes performance.

I've removed the first "benefit" :)

Kungsgeten commented 6 years ago

Update: I've started working on this and it works okay so far. Since it changes quite a lot about how things work, I'll try to test it some more before committing anything.

arvinpan commented 6 years ago

Awesome!!! If you need a tester, just leave a comment, anytime :D

Whil- commented 6 years ago

Thumbs up from me too 👍

namdnguyen commented 6 years ago

Thanks for looking into this @Kungsgeten! I've had some performance issues with growing my notes directory and updating ID locations, so having a cache would be wonderful.

Do you mind sharing how you are implementing the cache? Will org-id still be the sub-system for assigning and keeping track of ID properties or are you moving away from using org-id? Will the cache be a file that has a dictionary to keep track of the IDs and relationships that org-brain references that is updated whenever org-brain-update-id-locations is called or a new relationship is made?

I'm mainly just curious how you would solve this problem to help me improve my own programming knowledge. I also built some helper functions for my particular workflow on top of org-brain, although I don't think how you plan to do the cache will impact it.

Kungsgeten commented 5 years ago

@namdnguyen Hi! I haven't worked on the feature since June unfortunately, but hopefully I'll get around to it again soon. I have a variable called org-brain-file-ids which stores an alist mapping each ID to the file it represents. The file's unique ID is stored at the top of each file as a #+BRAIN_ID: keyword. This variable is saved to a file with org-brain-save-data, similar to how pinned entries work.

(defvar org-brain-file-ids nil)

(defvar org-brain-file-id-regex "^OB:[-a-z0-9]+$")

(defun org-brain-file-entry-id (entry &optional create)
  "Return id of ENTRY, if it exists.
If CREATE is t, an id will be created for the file (if it doesn't
already have one)."
  (if (string-match-p org-brain-file-id-regex entry)
      entry
    (or (cdr (assoc "BRAIN_ID" (org-brain-keywords entry)))
        (when create
          (let ((id (org-id-new "OB")))
            (with-current-buffer (find-file-noselect (org-brain-entry-path entry))
              (goto-char (point-min))
              (insert (concat "#+BRAIN_ID: " id "\n\n"))
              (save-buffer)
              (push (cons id entry) org-brain-file-ids)
              (org-brain-save-data))
            id)))))

The idea is that an ID is created automatically for all new files, if they are created via the org-brain-visualize interface. I'll probably also make it add IDs automatically to files that lack them, so that every file is required to have an ID (easier when implementing new features, fewer variants to support).

Then I have a function which tries to find a file, given its ID.

(defun org-brain-file-from-id (id)
  "Get the relative file path of a file entry from ID.
If a file with that ID doesn't exist, return nil.
This function modifies `org-brain-file-ids'."
  (let ((file (cdr (assoc id org-brain-file-ids))))
    (if (and file (file-exists-p (org-brain-entry-path file)))
        file
      (if-let ((found-file (seq-find (lambda (x) (equal (org-brain-file-entry-id x) id))
                                     (org-brain-files t))))
          (progn
            (if (assoc id org-brain-file-ids)
                (setf (cdr (assoc id org-brain-file-ids)) found-file)
              (push (cons id found-file) org-brain-file-ids))
            (org-brain-save-data)
            found-file)
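        ;; The ID doesn't exist, so remove it from org-brain-file-ids.
        ;; IDs are strings, so compare keys with `equal'; `assq-delete-all'
        ;; uses `eq' and would not match them.
        (setq org-brain-file-ids (assoc-delete-all id org-brain-file-ids))
        (org-brain-save-data)
        nil))))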

The idea here is that if you've renamed the file, or moved it, or removed it, it will scan all your files and see if it finds the ID. In that case it updates org-brain-file-ids.

If I remember correctly the hardest part was transforming an old configuration to the new one, since I'd have to write to a lot of files. I had to modify a lot of existing functions, but the new "master function" is this:

(defun org-brain-create-file-ids ()
  "Create file ids for all files in `org-brain-path' and update all relationships."
  (interactive)
  (dolist (file-entry (org-brain-files t))
    (let ((children (org-brain--linked-property-entries file-entry "BRAIN_CHILDREN"))
          (parents (org-brain--linked-property-entries file-entry "BRAIN_PARENTS"))
          (friends (org-brain-friends file-entry)))
      (org-brain--remove-relationships file-entry)
      (org-save-all-org-buffers)
      (org-brain-update-id-locations)
      (dolist (child children)
        (org-brain-add-relationship file-entry child))
      (dolist (parent parents)
        (org-brain-add-relationship parent file-entry))
      (dolist (friend friends)
        (org-brain--internal-add-friendship file-entry friend))))
  (message "Ids have been created for all files"))

I do not think that this new system will make org-brain faster (but I may be wrong). Having a cache for the headlines too might be faster though, not sure.

namdnguyen commented 5 years ago

Hey @Kungsgeten! Thanks for sharing the details. If you have this feature being worked out on a local branch and are willing to share it on GitHub, I would love to test it out and tinker with it. Thanks again for this package. It's been so useful for my research projects!

For me, renaming, moving, or removing files is rare, so it would likely help speed things up. If it just adds the generated ID to a cache for a new note (which org-brain-file-entry-id above appears to do) and then uses the cache when adding relationships to the new file, it would be much faster.

Currently, if you add a relationship to any note, is it running org-brain-update-id-locations each time? It seems like it is in my case, so I wasn't quite sure when it's triggered.

To add some more context for why it's slow for me, I work with Emacs and org-brain on a Windows 10 laptop and a Linux one. I have about 215 notes currently, and it is WAY slower to update IDs on the Windows machine than on the Linux one, despite the Windows one being newer and having an SSD. From what I found, this is partly because Windows doesn't handle many small files as efficiently as Linux, but I'm not really sure why it's slower.

Whil- commented 5 years ago

Hi (again)!

Just wanted to say that I'm trying to introduce a document-level property drawer into core org-mode. In my mind that would help org-brain with file-based entries. It would mean the same syntax and commands used today for outline nodes would also work for files. In the end, org-brain would not have to create custom keywords for IDs to resolve this issue, and it wouldn't need custom keywords for links either!

Link to post on the org-mode mailing-list: Proposal for new document-level syntax

Kungsgeten commented 5 years ago

@Whil- Nice idea! I haven't looked at your repository containing the changes, but there's a lot of code in org-brain to cover both file entries and headline entries. It would be nice if some of that could be removed :)

Kungsgeten / org-brain

What if we create an ID for an org-brain file and save it inside the file? #118