TheCedarPrince / NoteMate.jl

Tools for working with your own knowledge base
MIT License

Writing a general Obsidian Zettelkasten parser #21

Open AlphabetsAlphabets opened 1 year ago

AlphabetsAlphabets commented 1 year ago

All ZK notes are different, but for the most part they have "tags" or "category" at the very top. That is going to be very helpful because it is a constant. Converting everything else into OKM is going to be more difficult.
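Here is a rough sketch of reading that constant top-of-note metadata. It is in Python for illustration only, since NoteMate.jl itself is written in Julia. It assumes the layout of the Non-OKM example note below: `key: value` lines such as `tags:` and `links:`, ended by a line containing only `---`. The name `parse_header` is hypothetical and not part of NoteMate.jl.

```python
import re


def parse_header(note: str) -> dict:
    """Read 'key: value' lines above the first '---' line (hypothetical helper).

    Assumes the Non-OKM layout: tags/links lines, then a '---' separator.
    """
    header: dict[str, list[str]] = {}
    for line in note.splitlines():
        if line.strip() == "---":
            break  # end of the metadata block
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or non key/value line
        key = key.strip().lower()
        if key in ("tags", "category"):
            # tags are whitespace-separated '#word' tokens
            header[key] = re.findall(r"#[\w/-]+", value)
        elif key == "links":
            # keep only the note names inside [[...]]
            header[key] = re.findall(r"\[\[([^\]]+)\]\]", value)
        else:
            header[key] = [value.strip()]
    return header
```

On the Non-OKM example note this should give `{'tags': ['#dev'], 'links': ['dev wiki', '301 - Haskell']}`. Everything after the separator is left to the other parsing steps.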

OKM specifies a references section and a bibliography section. This is where it gets difficult. I use footnotes for my references, though some of my notes have a dedicated section. I don't have a "notes" section either; I go straight into the content after writing all the metadata/frontmatter. So, how should I go about handling this?

Those are the general issues. As for the Obsidian-specific issues, they have to do with the various linking methods, like [[301 - Haskell]]; there are also others, such as [[filename#heading]] and [[filename^block containing this text]], and more.
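A minimal Python sketch (illustration only; NoteMate.jl is Julia) of splitting wikilinks into their parts. The pattern covers the saved forms `[[note]]`, `[[note#heading]]`, `[[note#^blockid]]` and `[[note|alias]]`, plus a leading `!` for embeds. My understanding is that `[[filename^...]]` is the search syntax typed in the editor, which Obsidian saves as `[[filename#^blockid]]` (the example note below contains `[[301 - Anonymous Functions#^702438]]`), so the sketch only handles the saved form. `WIKILINK`, `WikiLink` and `extract_wikilinks` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for saved Obsidian wikilinks:
#   [[note]]  [[note#heading]]  [[note#^blockid]]  [[note|alias]]
# A leading "!" marks an embed (transclusion).
WIKILINK = re.compile(
    r"(?P<embed>!)?\[\["
    r"(?P<target>[^\]#|^]*)"      # note name; empty means "this note"
    r"(?:#\^(?P<block>[\w-]+)"    # #^blockid
    r"|#(?P<heading>[^\]|]+))?"   # #heading
    r"(?:\|(?P<alias>[^\]]+))?"   # |display text
    r"\]\]"
)


@dataclass
class WikiLink:
    target: str
    heading: Optional[str] = None
    block: Optional[str] = None
    alias: Optional[str] = None
    embed: bool = False


def extract_wikilinks(text: str) -> list:
    """Return every wikilink in `text`, split into its parts."""
    return [
        WikiLink(
            target=m["target"].strip(),
            heading=m["heading"],
            block=m["block"],
            alias=m["alias"],
            embed=m["embed"] is not None,
        )
        for m in WIKILINK.finditer(text)
    ]
```

Keeping heading and block references as separate fields means a later step can decide whether a link points at a whole note, a section, or a single block.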

If it helps, below are two of my notes: the first is in my own writing style, and the second complies with OKM.

Non-OKM

tags: #dev 
links: [[dev wiki]], [[301 - Haskell]]

---

# 301 - Functions
A function is defined by specifying its name, its arguments, and then an expression[^1]. The basic format looks like this:

```hs
<function name> <arg1> ... <nth arg> = <expression>
```

A simple function would look like this.

```hs
foo name = "My name is " <> name
```

Everything in Haskell is a function. putStrLn is just a function that accepts one parameter. It is a good idea to add type signatures to your functions, so the above function will have this signature.

```hs
foo :: String -> String
```

This is because foo takes a String and returns a String.

```hs
bar greeting name = greeting <> " " <> name
```

This function's signature is this.

```hs
bar :: String -> String -> String
```

This is pretty mind-bending, because all Haskell functions accept only one parameter. bar takes one string as a parameter and returns a function, and that function returns a string[^2]. ^f18191

The idea that functions accept only one parameter is explained in [[301 - Anonymous Functions#^702438]].

Partial application

Partial application is interesting. It works sort of like a default value for functions. First, define a function as normal.

```hs
el tag content = "<" <> tag <> ">" <> content <> "</" <> tag <> ">"
```

^8a5b8d

As for the "default value" part, it looks like this.

```hs
html_ :: String -> String
html_ = el "html"

body_ :: String -> String
body_ = el "body"
```

When you call body_, you only need to pass in one argument, because the tag argument is already passed in. You also don't need to name the remaining argument in the definition.

If you change body_'s definition to this

```hs
body_ content = el "body" content
```

HLS will pick it up and warn you that it can be simplified.

```
Diagnostics:
1. Eta reduce
   Found:
     body_ content = el "body" content
   Why not:
     body_ = el "body"
```

OKM

# 301 - Functions - OKM

**Date:** June 6th 2023

**Summary:** Covers function usage and creation in Haskell

**tags:** #dev


A function is defined by specifying its name, its arguments, and then an expression[^1]. The basic format looks like this:

```hs
<function name> <arg1> ... <nth arg> = <expression>
```

A simple function would look like this.

```hs
foo name = "My name is " <> name
```

Everything in Haskell is a function. putStrLn is just a function that accepts one parameter. It is a good idea to add type signatures to your functions, so the above function will have this signature.

```hs
foo :: String -> String
```

This is because foo takes a String and returns a String.

```hs
bar greeting name = greeting <> " " <> name
```

This function's signature is this.

```hs
bar :: String -> String -> String
```

This is pretty mind-bending, because all Haskell functions accept only one parameter. bar takes one string as a parameter and returns a function, and that function returns a string[^2]. ^f18191

The idea that functions accept only one parameter is explained in [[301 - Anonymous Functions#^702438]].

Partial application

Partial application is interesting. It works sort of like a default value for functions. First, define a function as normal.

```hs
el tag content = "<" <> tag <> ">" <> content <> "</" <> tag <> ">"
```

^8a5b8d

As for the "default value" part, it looks like this.

```hs
html_ :: String -> String
html_ = el "html"

body_ :: String -> String
body_ = el "body"
```

When you call body_, you only need to pass in one argument, because the tag argument is already passed in. You also don't need to name the remaining argument in the definition.

If you change body_'s definition to this

```hs
body_ content = el "body" content
```

HLS will pick it up and warn you that it can be simplified.

```
Diagnostics:
1. Eta reduce
   Found:
     body_ content = el "body" content
   Why not:
     body_ = el "body"
```

TheCedarPrince commented 1 year ago

Possible Approach

Hey @AlphabetsAlphabets, this is great! Thinking through it some, I don't think writing parser tooling for this would be too challenging. In fact, what I see is that we will need to define functions that:

  1. Extract Obsidian links
  2. Identify the Obsidian links based on how they are used:
     1. Links to other notes
     2. Links to external websites or sources
  3. Extract inline footnotes
  4. Extract footnotes

These could be developed for the Markdown parser functions that we have been building here: https://github.com/TheCedarPrince/NoteMate/blob/dev/src/markdown/parser.jl
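For items 3 and 4, a small Python sketch (illustration only; the real functions would live in NoteMate.jl's Julia parser) could look like this. It assumes the usual footnote forms: `[^1]` in running text, `[^1]: source` as a definition at the start of a line, and Obsidian's inline `^[text]`. The pattern and function names are hypothetical.

```python
import re

# Simplified patterns for the three footnote forms (assumptions, not a full grammar):
FOOTNOTE_REF = re.compile(r"\[\^([^\]\s]+)\](?!:)")                      # "expression[^1]."
FOOTNOTE_DEF = re.compile(r"^\[\^([^\]\s]+)\]:[ \t]*(.*)$", re.MULTILINE)  # "[^1]: source"
INLINE_FOOTNOTE = re.compile(r"\^\[([^\]]+)\]")                          # "^[an aside]"


def extract_footnotes(text: str) -> dict:
    """Collect footnote references, their definitions, and inline footnotes."""
    return {
        "references": FOOTNOTE_REF.findall(text),         # ids, in order of use
        "definitions": dict(FOOTNOTE_DEF.findall(text)),  # id -> footnote text
        "inline": INLINE_FOOTNOTE.findall(text),          # text of each ^[...] note
    }


def undefined_footnotes(text: str) -> list:
    """Ids that are referenced but never defined (a cheap consistency check)."""
    found = extract_footnotes(text)
    return [ref for ref in found["references"] if ref not in found["definitions"]]
```

The `definitions` mapping is the piece that could later be copied into an OKM-style references section, for people who keep their references in footnotes.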

What Should NoteMate.jl Support and What Should It Enable?

And, from a philosophical standpoint, you identify a tension that I have been thinking about: how much should the package do for a user, and how much should the user do? In my opinion, the package should give people the ability to write their own parsers for their own notes, which addresses the point you raised:

> I use footnotes for my references. Some have a dedicated section. I don't have a "notes" section either. I go straight into the content after writing all the metadata/frontmatter. So, how should I go about handling this?

Everyone takes notes differently, but the model provides some loose guidelines that let one use the functionality NoteMate.jl provides. I know that @SevorisDoe is working on a tutorial showing how one could use NoteMate.jl to build parsing for their own notes, which I think is immensely helpful, as it would be too huge a task for NoteMate.jl to support every single aspect of someone's workflow. I think this is a good tension to have: I personally cannot think of every workflow, but folks can open issues or discussions, and we can decide whether a feature should be supported.

Transclusions

> Those are the general issues as for the Obsidian specific issues it has to do with the various linking methods like [[301 - Haskell]] there are also more such as [[filename#heading]], [[filename^block containing this text]] and there are more.

This is a really fun problem! @SevorisDoe and I have been talking some about transclusions (i.e., this syntax: [[filename^block containing this text]]). I have no idea how to support it yet, but I think this should be a separate issue on how to support transclusions in general, if we want to someday.

Final Thoughts

Going back to the Possible Approach, I think it is the most straightforward way to start contributing to NoteMate.jl if you are interested. We can slowly go through those items and figure out how to address each one. In my opinion, 3 and 4 seem the most straightforward to start with. Let me know your thoughts, @AlphabetsAlphabets! Thanks for opening the issue; it is really good to be having these discussions as NoteMate potentially matures into a more featureful package!

SevorisDoe commented 1 year ago

I have not started working on a tutorial yet, but my approach follows your processing workflow, @TheCedarPrince, as you laid out in your processing script.

Using my notes takes capabilities similar to those defined here. We need to drive NoteMate forward with note ontology capabilities: being able to exploit more consistent structures, and possibly checking other notes or the note structure. The latter requires resolving links, which is... something in its own right. We will have to see.
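To make "resolving links" a bit more concrete, here is a deliberately naive Python sketch (illustration only). The assumed rule is a simplification, not Obsidian's exact behaviour: a bare name matches any note whose file name (minus `.md`) is that name, a target containing `/` must also match the trailing folders, and missing or ambiguous targets return `None` so the caller can decide. `build_index` and `resolve` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import Optional


def build_index(paths) -> dict:
    """Group vault-relative note paths by note name (file name without '.md')."""
    index: dict = {}
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix == ".md":
            index.setdefault(path.stem, []).append(path)
    return index


def resolve(target: str, index: dict) -> Optional[PurePosixPath]:
    """Resolve a wikilink target to exactly one note path, else None."""
    wanted = PurePosixPath(target)
    candidates = index.get(wanted.name, [])
    if len(wanted.parts) > 1:
        # e.g. "Projects/NoteMate" must also match the path's trailing folders
        n = len(wanted.parts)
        candidates = [c for c in candidates if c.with_suffix("").parts[-n:] == wanted.parts]
    # Missing (0) or ambiguous (>1) targets are left for the caller to handle
    return candidates[0] if len(candidates) == 1 else None
```

For a real vault, the paths could come from `Path(vault).rglob("*.md")`, made relative with `relative_to(vault).as_posix()`. Returning `None` for ambiguous names is a design choice; a note ontology could add tie-breaking rules later.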

This is definitely not impossible, and we have talked about this before, but it reinforces my impression that we need to define these capabilities and move towards implementing them.

> This is a really fun problem! SevorisDoe and I have been talking some about transclusions (i.e. this syntax: [[filename^block containing this text]]). I have no idea how to support it yet, but I think this might needs be a separate issue on how to generally support transclusions if we want to someday.

It is actually pretty simple to interpret, insofar as a transclusion can be detected by a regex-matched flag or similar; the relevant text from the other note is then extracted and turned into a note.

To build this up a bit towards something we can treat as a note ontology: the operation is an extraction, triggered by a link ontology and then executed using a note ontology. A primitive match just extracts the paragraph; a more advanced note ontology might, for example, match note keywords, file paths or links, local document structure, and so on.
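A Python sketch (illustration only) of that "primitive match" for transclusions: an embed such as `![[note#^blockid]]` or `![[note#heading]]` is detected with a regex, and the referenced paragraph or section is pulled from the target note. It assumes notes are already loaded into a dict keyed by note name, that a block ID sits either at the end of a paragraph (`... ^f18191`) or on its own line after a block (`^8a5b8d`), and it ignores code fences when looking for headings. All names are hypothetical.

```python
import re


def extract_block(note: str, block_id: str):
    """Return the block tagged '^block_id', without the tag (None if absent)."""
    tag = "^" + block_id
    # Blocks are paragraphs separated by blank lines (simplification).
    blocks = [b.strip() for b in re.split(r"\n\s*\n", note) if b.strip()]
    for i, block in enumerate(blocks):
        if block == tag and i > 0:
            return blocks[i - 1]                # ID on its own line tags the previous block
        if block.endswith(" " + tag):
            return block[: -len(tag)].rstrip()  # ID at the end of a paragraph
    return None


def extract_section(note: str, heading: str):
    """Text under `heading` up to the next heading of the same or higher level.

    Simplification: '#' lines inside code fences are treated as headings too.
    """
    lines = note.splitlines()
    for i, line in enumerate(lines):
        m = re.match(r"(#{1,6})\s+(.*?)\s*$", line)
        if m and m.group(2) == heading:
            level, body = len(m.group(1)), []
            for nxt in lines[i + 1:]:
                n = re.match(r"(#{1,6})\s", nxt)
                if n and len(n.group(1)) <= level:
                    break
                body.append(nxt)
            return "\n".join(body).strip()
    return None


# "![[note#^blockid]]" or "![[note#heading]]"; group 2 is "^" for block embeds.
EMBED = re.compile(r"!\[\[([^\]#|]+)#(\^?)([^\]|]+)\]\]")


def expand_embeds(text: str, notes: dict) -> str:
    """Replace each embed with the referenced text; unknown targets stay as written."""
    def replace(m):
        name, is_block, ref = m.group(1).strip(), m.group(2), m.group(3)
        note = notes.get(name)
        if note is None:
            return m.group(0)
        found = extract_block(note, ref) if is_block else extract_section(note, ref)
        return found if found is not None else m.group(0)

    return EMBED.sub(replace, text)
```

A richer note ontology would swap `extract_block`/`extract_section` for rules that know more about the target note's structure, while the regex trigger stays the same.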

TheCedarPrince commented 1 year ago

Hey, whenever you have the time/desire @SevorisDoe, could you explain more about what you mean by an ontology in the discussions section? I am not fully grasping it yet, but I want to, as I think it is really powerful/useful vocabulary. Thanks!

SevorisDoe commented 1 year ago

> Hey, whenever you have the time/desire @SevorisDoe , could you explain to me more about what you mean by an ontology in the discussions section? I am not fully grasping what you mean here but I want to as I think it is really powerful/useful vocabulary. Thanks!

So for me, knowledge graphs have framed the term ontology. The notion here is that an ontology is a definition of concepts, entities, and their relations (on which you can compute). You could also call it a logical specification. I think this definition describes it well:

> An ontology is a formal, explicit specification of a shared conceptualization that is characterized by high semantic expressiveness required for increased complexity. ― Feilmayr, Christina; Wöß, Wolfram (2016). "An analysis of ontologies and their success factors for application to business". Data & Knowledge Engineering. 101: 1–23. DOI: 10.1016/j.datak.2015.11.003.

In this case, an ontology would be the definition of what things you find in a note and what notes you find in a vault, what structures they have, and how those relate. For example, OKM is a notes ontology because we can define that the bullets in the references section (defined by a heading with a certain name, running until the next heading) each refer to some other thing. Or that the three bold-starting paragraphs under the title (which is the first H1 heading of the document) are the date, the summary, and the keywords, with each keyword being a hashtag and the keywords separated by whitespace...
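The two OKM rules just described can be written down as a small Python sketch (illustration only). Assumptions: fields are written as `**Date:** ...` (or `**Date**: ...`), the references heading is literally named "References", bullets use `-`, `*` or `+`, and the keyword field may be called "keywords" or "tags" (the OKM example above uses `tags`). `parse_okm` is a hypothetical name.

```python
import re

BOLD_FIELD = re.compile(r"^\*\*([^*:]+):?\*\*:?\s*(.*)$")  # "**Date:** June 6th 2023"
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
BULLET = re.compile(r"^\s*[-*+]\s+(.*)$")


def parse_okm(note: str, max_fields: int = 3) -> dict:
    """Title = first H1; up to three bold fields under it; bullets under 'References'."""
    title, section = None, None
    fields, references = {}, []
    for line in note.splitlines():
        h = HEADING.match(line)
        if h:
            if title is None and len(h.group(1)) == 1:
                title = h.group(2)
            section = h.group(2)  # the heading we are currently under
            continue
        if title is None:
            continue  # ignore anything before the title
        f = BOLD_FIELD.match(line)
        if f and section == title and len(fields) < max_fields:
            fields[f.group(1).strip().lower()] = f.group(2).strip()
        elif section and section.lower() == "references":
            item = BULLET.match(line)
            if item:
                references.append(item.group(1).strip())
    # each keyword is a hashtag; keywords are separated by whitespace
    keywords = fields.get("keywords", fields.get("tags", "")).split()
    return {"title": title, "fields": fields, "keywords": keywords, "references": references}
```

Each bullet collected under "References" could then be passed to a wikilink parser to decide whether it points at another note or an outside source.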

In my vault, for example, notes in the "50 Ressources/Zotero" folder are notes whose titles match BibTeX keys in my Zotero library. Notes in my "Projects" folder are project notes, so, for example, "Projects/NoteMate.md" is a project note about this effort, and any note that links to it refers to the project. This kind of information is encoded in the ontology.

The point for us here is basically that an ontology defines labeled entities within notes, and labels the notes themselves. It creates entities which we can define to have certain properties, and which we can then target by their relations. For example, we might use the ontology to say: "any note which resides in the Zettelkasten folder and has a literary tag is a literary note, and thus we consider it a reference. We know that literary notes have a header where the third paragraph, starting with the string "link", specifies the URL, and that below it we find a "source" text, which we can copy to create the references later."
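One way to picture this "structured labelling" is a table of rules, sketched here in Python (illustration only). The folder names follow the examples in this comment; the `#literary` tag spelling, the label strings, and the `Note`, `classify` and `projects_referenced` names are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Note:
    path: str                                     # vault-relative, e.g. "Projects/NoteMate.md"
    tags: set = field(default_factory=set)        # e.g. {"#literary"}
    links: set = field(default_factory=set)       # names of notes this note links to


def _top_folder(note: Note) -> str:
    parts = PurePosixPath(note.path).parts
    return parts[0] if len(parts) > 1 else ""


# Each rule pairs a label with a predicate (illustrative assumptions).
RULES = [
    ("zotero-note", lambda n: PurePosixPath(n.path).parent == PurePosixPath("50 Ressources/Zotero")),
    ("project", lambda n: _top_folder(n) == "Projects"),
    ("literary-note", lambda n: _top_folder(n) == "Zettelkasten" and "#literary" in n.tags),
]


def classify(note: Note) -> set:
    """All labels whose rule matches this note."""
    return {label for label, matches in RULES if matches(note)}


def projects_referenced(note: Note, vault: list) -> set:
    """Derived relation: linking to a project note means referring to that project."""
    projects = {PurePosixPath(n.path).stem for n in vault if "project" in classify(n)}
    return note.links & projects
```

Extracting the URL from the third paragraph of a literary note would be one more rule keyed on the `literary-note` label, so adding behaviour means adding entries rather than changing the parser.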

Those are specific examples. The general view is less constrained, but the consequence is a kind of "structured labelling" which you can use to compute on and manipulate notes, by understanding what entities are defined in them, what the note itself represents within the larger workspace, and how you should interpret the links between them (see also the keyword "semantic links").

TheCedarPrince commented 1 year ago

Thanks for sharing this, @SevorisDoe! I moved our discussion about this comment here: #25