Open arvinpan opened 6 years ago
I don't think removal of `org-id-locations` would be a benefit, since that still would be needed for headline entries. Other than that, I think the idea has merit.
The downside is that handling of files will be a bit slower. Each entry in `org-brain` has an ID, and for files the ID is the relative file name, so if we have an entry called `index` we'll just open `index.org` inside the `org-brain-path`. If instead the file had the ID `OB:289` (or similar), we'd have to scan all `org-brain` files and search for their ID until we find the correct one. This could probably be cached in a dictionary though.
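To make the "cached in a dictionary" idea concrete, here is a minimal sketch using a hash table. The names `my/brain-id-cache` and `my/brain-file-for-id` are hypothetical and not part of org-brain, and the slow scan over all files is passed in as a function rather than implemented here.

```emacs-lisp
;; Hypothetical names throughout; none of these exist in org-brain.
(defvar my/brain-id-cache (make-hash-table :test #'equal)
  "Map a file ID string to the relative file name it was last seen in.")

(defun my/brain-file-for-id (id scan-fn)
  "Return the file for ID, consulting the cache before calling SCAN-FN.
SCAN-FN takes ID, performs the slow search over all files, and
returns a file name or nil."
  (or (gethash id my/brain-id-cache)
      (let ((file (funcall scan-fn id)))
        ;; Only remember successful lookups.
        (when file
          (puthash id file my/brain-id-cache))
        file)))
```

With this shape, the full scan would only run the first time an ID is requested (or when it cannot be found at all); a renamed file would still need some way to invalidate its stale cache entry.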
Thank you for your quick response, and now I understand much better. Basically, the current design cares more about the performance.
I've removed the first "benefit" :)
Update: I've started working on this and it works okay so far. Since it changes quite a lot about how things work, I'll try to test it some more before committing anything.
Awesome!!! If you need a tester, just leave a comment, anytime :D
Thumbs up from me too 👍
Thanks for looking into this @Kungsgeten! I've had some performance issues with growing my notes directory and updating ID locations, so having a cache would be wonderful.
Do you mind sharing how you are implementing the cache? Will `org-id` still be the sub-system for assigning and keeping track of ID properties, or are you moving away from using `org-id`? Will the cache be a file that has a dictionary to keep track of the IDs and relationships that org-brain references, updated whenever `org-brain-update-id-locations` is called or a new relationship is made?
I'm mainly just curious how you would solve this problem to help me improve my own programming knowledge. I also built some helper functions for my particular workflow on top of org-brain, although I don't think how you plan to do the cache will impact it.
@namdnguyen Hi! I haven't worked on the feature since June unfortunately, but hopefully I'll get around to it again soon. I have a variable called `org-brain-file-ids` which stores an alist of IDs and the files they represent. The file's unique ID is stored at the top of each file as a `#+BRAIN_ID:` keyword. This variable is saved to a file with `org-brain-save-data`, in a similar way to how pinned entries work.
(defvar org-brain-file-ids nil)
(defvar org-brain-file-id-regex "^OB:[-a-z0-9]+$")
(defun org-brain-file-entry-id (entry &optional create)
  "Return id of ENTRY, if it exists.
If CREATE is t, an id will be created for the file (if it doesn't
already have one)."
  (if (string-match-p org-brain-file-id-regex entry)
      entry
    (or (cdr (assoc "BRAIN_ID" (org-brain-keywords entry)))
        (when create
          (let ((id (org-id-new "OB")))
            (with-current-buffer (find-file-noselect (org-brain-entry-path entry))
              (goto-char (point-min))
              (insert (concat "#+BRAIN_ID: " id "\n\n"))
              (save-buffer)
              (push (cons id entry) org-brain-file-ids)
              (org-brain-save-data))
            id)))))
The idea is that an ID is created automatically for all new files, if done via the `org-brain-visualize` interface. I'll probably also make it so that IDs are added automatically in case they're not there, so that having an ID is a requirement for every file (easier when implementing new features, fewer variants to support).
Then I have a function which tries to get a file when you have its ID.
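As a rough illustration of "add the ID if it's not there", the following sketch works on the current buffer only and does not touch org-brain's own data. Both function names are hypothetical, and the ID generator is passed in as an argument (in the code above that role is played by `(org-id-new "OB")`).

```emacs-lisp
;; Hypothetical helpers; not part of org-brain.
(defun my/brain-id-in-buffer ()
  "Return the value of the first #+BRAIN_ID: keyword, or nil.
Searches the current buffer from the beginning."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search t))
      (when (re-search-forward "^#\\+BRAIN_ID:[ \t]*\\(.+?\\)[ \t]*$" nil t)
        (match-string-no-properties 1)))))

(defun my/brain-ensure-id-in-buffer (make-id)
  "Return the buffer's BRAIN_ID, inserting one from MAKE-ID if missing.
MAKE-ID is a function of no arguments returning a new ID string."
  (or (my/brain-id-in-buffer)
      (let ((id (funcall make-id)))
        (save-excursion
          ;; Same placement as above: at the very top of the file.
          (goto-char (point-min))
          (insert "#+BRAIN_ID: " id "\n\n"))
        id)))
```

Calling `my/brain-ensure-id-in-buffer` twice on the same buffer returns the same ID, since the second call finds the keyword inserted by the first.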
(defun org-brain-file-from-id (id)
  "Get the relative file path of a file entry from ID.
If a file with that ID doesn't exist, return nil.
This function modifies `org-brain-file-ids'."
  (let ((file (cdr (assoc id org-brain-file-ids))))
    (if (and file (file-exists-p (org-brain-entry-path file)))
        file
      (if-let ((found-file (seq-find (lambda (x) (equal (org-brain-file-entry-id x) id))
                                     (org-brain-files t))))
          (progn
            (if (assoc id org-brain-file-ids)
                (setf (cdr (assoc id org-brain-file-ids)) found-file)
              (push (cons id found-file) org-brain-file-ids))
            (org-brain-save-data)
            found-file)
        ;; The ID doesn't exist, so remove it from org-brain-file-ids.
        ;; IDs are strings, so the key must be compared with `equal'
        ;; (`assoc-delete-all', in newer Emacs versions); `assq-delete-all'
        ;; compares with `eq' and would not match a string key.
        (setq org-brain-file-ids (assoc-delete-all id org-brain-file-ids))
        (org-brain-save-data)
        ;; Return nil explicitly, as the docstring promises.
        nil))))
The idea here is that if you've renamed the file, or moved it, or removed it, it will scan all your files and see if it finds the ID. In that case it updates `org-brain-file-ids`.
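Since the lookup above saves the alist every time it changes, it may help to see what persisting such an alist can look like. This is a generic sketch and not how `org-brain-save-data` is actually implemented (the thread does not show that); the function names and the idea of a dedicated file are assumptions.

```emacs-lisp
;; Hypothetical helpers; not part of org-brain.
(defun my/brain-write-alist (alist file)
  "Write ALIST to FILE in a form that `read' can parse back."
  (with-temp-file file
    ;; Make sure long or deeply nested data is never abbreviated.
    (let ((print-length nil)
          (print-level nil))
      (prin1 alist (current-buffer)))))

(defun my/brain-read-alist (file)
  "Return the alist stored in FILE, or nil if FILE does not exist."
  (when (file-exists-p file)
    (with-temp-buffer
      (insert-file-contents file)
      (read (current-buffer)))))
```

Writing the whole alist on every change is simple but means one file write per update; batching saves would be one way to reduce that cost if it ever mattered.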
If I remember correctly the hardest part was transforming an old configuration to the new one, since I'd have to write to a lot of files. I had to modify a lot of existing functions, but the new "master function" is this:
(defun org-brain-create-file-ids ()
  "Create file ids for all files in `org-brain-path' and update all relationships."
  (interactive)
  (dolist (file-entry (org-brain-files t))
    (let ((children (org-brain--linked-property-entries file-entry "BRAIN_CHILDREN"))
          (parents (org-brain--linked-property-entries file-entry "BRAIN_PARENTS"))
          (friends (org-brain-friends file-entry)))
      (org-brain--remove-relationships file-entry)
      (org-save-all-org-buffers)
      (org-brain-update-id-locations)
      (dolist (child children)
        (org-brain-add-relationship file-entry child))
      (dolist (parent parents)
        (org-brain-add-relationship parent file-entry))
      (dolist (friend friends)
        (org-brain--internal-add-friendship file-entry friend))))
  (message "Ids have been created for all files"))
I do not think that this new system will make org-brain faster (but I may be wrong). Having a cache for the headlines too might be faster though, not sure.
Hey @Kungsgeten! Thanks for sharing the details. If you have this feature being worked out on a local branch and are willing to share it on GitHub, I would love to test it out and tinker with it. Thanks again for this package. It's been so useful for my research projects!
For me, renaming, moving, or removing files is rare, so it would likely help speed things up for me. If it just adds the generated ID to a cache for a new note (which `org-brain-file-entry-id` above appears to do) and then uses the cache for adding relationships to the new file, it would be much faster.
Currently, if you add a relationship to any note, does it run `org-brain-update-id-locations` each time? It seems like it does in my case, so I wasn't quite sure when it's triggered.
To add some more context for why it's slow for me, I work with Emacs and org-brain on a Windows 10 laptop and a Linux one. I have about 215 notes currently, and it is WAY slower to update IDs on the Windows machine than the Linux one, despite the Windows one being newer and having an SSD. From what I found, this is partly because Windows doesn't handle many small files as efficiently as Linux, but I'm not really sure why it's slower.
Hi (again)!
Just wanted to say that I'm trying to introduce a document-level property drawer into core org-mode. In my mind that would help org-brain out with file-based entries. It would mean the same syntax and commands used today for outline nodes would also work for files. In the end, org-brain would not have to create custom keywords for IDs to resolve this issue, and it wouldn't need custom keywords for links either!
Link to post on the org-mode mailing-list: Proposal for new document-level syntax
@Whil- Nice idea! I haven't looked at your repository containing the changes, but there's a lot of code in `org-brain` to cover both file entries and headline entries. It would be nice if some of that could be removed :)
I'm wondering whether creating an ID for each org-brain file would be a good way to manage entries' relationships.
For example, something like:
#+TITLE: File Subject
#+ID: XXXX-XX-...
And here are a few benefits:
- ~~`org-brain-update-id-locations` command anymore~~ (my bad, I thought the locations file was a collection of autogenerated IDs for all file entries; in fact it contains the IDs found inside those org files)
- As the README says, the org-brain project provides a wiki and a mind map. Currently, if I rename my "wiki" files, I need to modify my "mind map" entry IDs accordingly to make sure all relationships still work, or use the `org-brain-rename-file` mechanism, which is not handy enough when restructuring multiple "wiki" folders and files. If we introduced an ID for the file entry, maybe the "wiki" and the "brain map" could be decoupled, producing an easier-to-use org-brain?

Also, I know `#+ID` may not be a good choice as it's not defined in org-mode, and there must be reasons why filenames are preferred. Glad to hear your opinions :)