BoostIO / BoostNote-Legacy

This repository is outdated and new Boost Note app is available! We've launched a new Boost Note app which supports real-time collaborative writing. https://github.com/BoostIO/BoostNote-App
Other
17.08k stars 1.47k forks source link

Using SQL based db as primary data storage (along regular files) #2259

Open takacsot opened 5 years ago

takacsot commented 5 years ago

Concepts

This issue is more like a discussion of an architectural change.

Summary: what is using sql based storages a primary/operational storage for the application.

Detail: Currently it is using the filesystem with CSON file format (CSON is not so widely supported). All operation on data is relying on file manipulation and every change what changes data structure and algorithm on data structure requires "handmade" code. As an example: if we would need a new relationship to implement between data it has to change storage format (what is obvious) and need to implement the corresponding algorithm (+ UI change, obviously, otherwise none of the improvement makes sense :) )

At first, I would consider using SQLite as embeddable, self-containing SQL storage engine.

The concept would be that from live, operational point of view using SQL based storage have several advantages (not necessarily in importance/relevancy):

(Remark: On the other hand, we should not ignore filesystem as secondary storage if the file system would be more human readable (neither file names nor CSON structure helps too much in it). I think a simple SQL file is much more easy to process with existing tooling.)

This concept could be found in ZimWiki. When starting up it is building an SQLite database and then it is used internally for efficiency.

Common use case examples

list of directories

select * from folder order by name

list nodes in directories

select note_id, key, title from note where folder = :folder order by upper(title) asc
-- with pagination
select note_id, key, title from note where folder = 'temp' 
order by note_id asc
limit 200, 100

list node with (any?) sort criteria

select note_id,... from note where... order by {title|is_starred|is_pinned|created_at, updated_at}

list all node with directoreis

select *
from note
join folder on (note.folder = folder.key)
order by folder.name, note.title -- etc any other sort criteria

counting nodes in folders

select folder.name, count(1)
from note
join folder on (note.folder = folder.key)
group by folder.name
order by folder.name

search by tags

select title, tag 
from note
join note_tag using (note_id) 
where tag = 'toolbox'

as you could see it has endless flexibility in search and the heavy lifting is done by the sql engine.

snippets having java fragment

select note.*
from snippet
join note using (note_id)
where snippet.mode = 'SQL'

Let me demonstrate some potential improvements

storeing pre-compiled html

Although I did not notice any performance issue when rendering html from markdown I already noticed some delay on rendering diagrams (e.g. plantuml).

If the rendered html is "cached" in the system (including diagram images, etc) there would be no delay just show the html part.

Actually, it worth testing is there any issue of rendering large documents (as for a moment there is not such a thing as note hierarchy so there is no "natural" way of splitting larger writings - e.g into sections).

note cross reference

Once the note is prepared it could include the graph of cross-reference pages for easy navigation (e.g. "Related pages") based on links, tags, parent elements (wither tag, node or folder parent)

folder/note/tag hierarchy

This is the simplest implementation (for tags) to add but probably not so efficient when querying. There are other models (like closure table) with better query characteristics (and a little more complicated insert/update logic)

alter table tag add parent varchar;
insert into tag (tag, parent) values 
('tag1',null),
('tag11','tag1'),
('tag12','tag1'),
('tag121','tag12'),
('tag122','tag12'),
('tag13','tag1'),
('tag2',null),
('tag21','tag2');

select * from tag;

with RECURSIVE
under_tag (root, tag, level) as (
    select tag, tag, 0 
    from tag
    where parent is null
 union all
    select under_tag.root, tag.tag, under_tag.level+1
    from tag
    join under_tag on tag.parent = under_tag.tag
    order by 3 desc
)
select root, tag, level
from under_tag;
root tag level
tag1 tag1 0
tag1 tag11 1
tag1 tag12 1
tag1 tag121 2
tag1 tag122 2
tag1 tag13 1
tag2 tag2 0
tag2 tag21 1

searching for a child of specific tag

with RECURSIVE
under_tag (root, tag, level) as (
    select tag, tag, 0 
    from tag
    where tag = 'tag12' 
 union all
    select under_tag.root, tag.tag, under_tag.level+1
    from tag
    join under_tag on tag.parent = under_tag.tag
    order by 3 desc
)
select root, tag, level
from under_tag
root tag level
tag12 tag12 0
tag12 tag121 1
tag12 tag122 1

ex: parent path (as nexted directory structure breadcrums)

with RECURSIVE
under_tag (root, tag, ppath) as (
    select tag, tag, tag 
    from tag
    where parent is null
 union all
    select under_tag.root, tag.tag, under_tag.ppath||'/'||tag.tag
    from tag
    join under_tag on tag.parent = under_tag.tag
)
select root, tag, ppath from under_tag
root tag ppath
tag1 tag1 tag1
tag2 tag2 tag2
tag1 tag11 tag1/tag11
tag1 tag12 tag1/tag12
tag1 tag13 tag1/tag13
tag2 tag21 tag2/tag21
tag1 tag121 tag1/tag12/tag121
tag1 tag122 tag1/tag12/tag122

ex: count by parent aggregate tag. aka sum of child counts

with RECURSIVE
under_tag (ancestor, tag, parent) as (
    select tag, tag, parent 
    from tag
 union all
    select under_tag.ancestor, tag.tag, tag.parent
    from tag
    join under_tag on tag.parent = under_tag.tag
)
select ancestor, count(1)-1 from under_tag group by ancestor
ancestor count(1)-1
tag1 5
tag11 0
tag12 2
tag121 0
tag122 0
tag13 0

In general, the same logic could be applied to folders and notes too.

According to my investigation, the following issues could gain advantages from this implementation:

Used SQL data structure:

        create table folder(
            key varchar(256),
            name varchar(256),
            color varchar(256)
        )

        create table note(
            note_id integer primary key,
            key varchar,
            type varchar(256),
            folder varchar(256),
            title varchar,
            description varchar,
            content text,
            is_starred bool,
            is_trashed bool,
            is_pinned bool,
            created_at timestamp,
            updated_at timestamp
        )

        create table tag(
            tag varchar primary key
        )

        create table note_tag(
            note_id integer, 
            tag varchar
        )

        create table snippet(
            note_id integer,
            name varchar,
            mode varchar,
            content text
        )

        create table attachment (
            key varchar,
            name varchar,
            file blob
        )

For experimental purposes, I could provide a script what is converting boosnote files into this database structure.

ghost commented 5 years ago

for many features a database is absolutely required. for example, auto completion is strongly tied to syntax. whether human readable format or database each are good solutions with advantages and disadvantages. what matter s is acknowledging their limitations and taking advantage of either one.

ghost commented 5 years ago

if you want edit markdown with external tools focus on this part, don't bloat it with features. if you want feature rich software use a database.

ehhc commented 5 years ago

To be honest, i don't think it will be implemented in boostnote. But at the moment, there is an ongoing discussion to use a database for notes in the new version of boostnote that is developed from scratch at the moment. Please feel free to a) help to implement it and b) join the discussion

a) : https://github.com/BoostIO/Boost b) slack group please join the channel "renewal"

takacsot commented 5 years ago

if you want edit markdown with external tools focus on this part, don't bloat it with features. if you want feature rich software use a database.

I think we could keep both. If any external process changes a note the app just picks it up and integrate into the database. By this file could be editable but we still gain from the use of database. (the usability of current filesystem storage by any external tooling is another topic to discuss - there open issues about it already)

gotofritz commented 5 years ago

Hi. I started using boostnote because I like the fact I have "physical files" I can move about, and store in different devices. If I wanted a database I would just create one directly...

On the other hand, early Evernote used a db for metadata IIRC while keeping the content in files, so perhaps that is a the best of both worlds?

xvusrmqj commented 5 years ago

maybe not a good idea:

  1. database is not a plaintext to read. cson file can be read without boostnote.
  2. incremental synchronization is become impossible. now we can use dropbox to incremental synchronization. but if use sqlite , how to synchronize?
takacsot commented 5 years ago

Considering the current "plaintext" cson format as something to read without Boostnote is far from reality. Neither the naming convention, nor the format is suitable for a human to work with.

With real plain text with front-matter to store metadata AND and a practical file naming convention could make it useful for non-boostnote use.

Incremental synchronization should not be an issue. As I mentioned it could be imagined as "primary" or "operational" storage. Where boostnote would be working. It would be a perfectly viable solution to "synchronize" between file system periodically and frequently. By this you could get the best of both solution.

Plus imagine if you intend to make a "corporate" level document synchronization? just replace sqlite to some remote database solution and you got shared documents automatically. (keep in mind that locking a file while modifying is an issue today because the files could be modified in parallel - so having a reliable file level or db level locking is needed to implement anyway)

Flexo013 commented 4 years ago

Here are issues regarding the readability of the cson files: #1447 and #490

rrjp commented 4 years ago

CSON isn't ideal but I can 100% open them in Atom, VS Code, SED, AWK, Python or any other text based tool and manipulate them (global search and replace across all notes for example). The format is more suitable for humans to work with than some of the other file formats I work with at work every day. Remember this is a tool for programmers.

I was using One Note on Mac at work and it got screwed up such that there are notes on my local drive in some database format that Microsoft will not disclose and can't fix that never got sync'd to the server (that is somehow broken now) so I'm having to copy out years of notes manually into Boost Note because One Note is holding them hostage in some database. I don't want to see a repeat of that dumpster fire ever again. I guess I could just use md files and a directory structure with Atom / VS Code if Boost Note is going to start requiring putting notes into a DB. More complexity, more points of failure, less reliability.