cybersemics / em

A beautiful, minimalistic note-taking app for personal sensemaking.

Import Markdown #867

Closed: raineorshine closed this issue 1 month ago

raineorshine commented 3 years ago

Either convert markdown to nested JSON blocks and then use importJSON, or convert markdown to HTML and then use importText.
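
To make the first option concrete, nested JSON blocks can be modeled as a recursive `{ value, children }` shape. The TypeScript sketch below assumes that shape (a hypothetical `Block` type, not necessarily what importJSON accepts) and serializes blocks as nested `<ul>`/`<li>` markup, in which the nesting is carried by the list structure itself. It is illustrative only.

```typescript
// Hypothetical shape for one nested "JSON block": a thought and its children.
// This is an illustrative assumption, not em's actual importJSON type.
interface Block {
  value: string
  children: Block[]
}

// Escape the characters that are significant in HTML text content.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

// Serialize blocks as nested <ul>/<li> markup, so nesting is encoded by
// the list structure rather than by heading levels.
function blocksToHtml(blocks: Block[]): string {
  if (blocks.length === 0) return ''
  const items = blocks
    .map(block => `<li>${escapeHtml(block.value)}${blocksToHtml(block.children)}</li>`)
    .join('')
  return `<ul>${items}</ul>`
}

const example: Block[] = [
  { value: 'a', children: [{ value: 'x', children: [] }] },
  { value: 'd', children: [] },
]

console.log(blocksToHtml(example))
// <ul><li>a<ul><li>x</li></ul></li><li>d</li></ul>
```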

Options for the markdown conversion:

Steps to Reproduce:

Paste the following markdown:

# a
x

## b
### c
y

# d
# e
## f

- g
  - h
    - i

Expected Behavior:

- a
  - x
  - b
    - c
      - y
- d
- e
  - f
    - g
      - h
        - i
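
The mapping from the markdown above to the expected outline can be sketched as a single pass over the lines with a stack of open blocks: a heading closes every open heading of the same or deeper level, paragraph text becomes a child of the nearest open heading, and list items nest under the most recent heading according to their indentation. This is a simplified TypeScript sketch, not em's implementation; `Block`, `markdownToBlocks`, and `renderOutline` are hypothetical names, and it only handles ATX headings, single-line paragraphs, and `-`/`*`/`+` bullets.

```typescript
// Hypothetical nested block shape (an assumption, not em's actual type).
interface Block {
  value: string
  children: Block[]
}

// An open block and its depth: headings use their level (1-6) and list
// items use 7 + their indentation, so a list always sits under a heading.
interface OpenBlock {
  depth: number
  block: Block
}

function markdownToBlocks(markdown: string): Block[] {
  const root: Block[] = []
  const open: OpenBlock[] = []
  const container = () => (open.length > 0 ? open[open.length - 1].block.children : root)

  for (const line of markdown.split('\n')) {
    if (line.trim() === '') continue

    const heading = /^(#{1,6})\s+(.*)$/.exec(line)
    const item = /^(\s*)[-*+]\s+(.*)$/.exec(line)

    if (heading || item) {
      const depth = heading ? heading[1].length : 7 + item![1].length
      const value = (heading ? heading[2] : item![2]).trim()
      // Close every open block at the same or a deeper depth.
      while (open.length > 0 && open[open.length - 1].depth >= depth) open.pop()
      const block: Block = { value, children: [] }
      container().push(block)
      open.push({ depth, block })
    } else {
      // Paragraph text belongs to the nearest heading, not to a list item.
      while (open.length > 0 && open[open.length - 1].depth > 6) open.pop()
      container().push({ value: line.trim(), children: [] })
    }
  }
  return root
}

// Render blocks in the indented "- " outline format used in this issue.
function renderOutline(blocks: Block[], indent = 0): string {
  const lines: string[] = []
  for (const block of blocks) {
    lines.push(`${'  '.repeat(indent)}- ${block.value}`)
    const nested = renderOutline(block.children, indent + 1)
    if (nested) lines.push(nested)
  }
  return lines.join('\n')
}

const source = [
  '# a', 'x', '', '## b', '### c', 'y', '', '# d', '# e', '## f', '', '- g', '  - h', '    - i',
].join('\n')

console.log(renderOutline(markdownToBlocks(source)))
```

Because list items get depths greater than any heading level, a list always lands under the last heading, which is why `g` becomes a child of `f`. Multi-line paragraphs, ordered lists, and code fences would need extra rules.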

anmolarora1 commented 3 years ago

@raineorshine I think markdown-it is a better candidate because it's ~20% smaller than the markdown library and provides more configuration options in general. Besides, it's actively maintained and has more stars.

That said, can you please share a basic example of what fully parsed markdown should look like in em, and any special cases you can think of?

raineorshine commented 3 years ago

Great! That works for me. I don't consider file size relevant, but I value active maintenance.

I added an example to the OP. I'll leave the special cases up to you to think through.

raineorshine commented 3 years ago

FWIW, I attempted a markdown-to-nested-JSON converter a few months back but didn't get very far. If one of the existing converters works, then I say we go with that.

https://github.com/cybersemics/md2nestedjson

anmolarora1 commented 3 years ago

I didn't quite get that. Do you mean we should skip the markdown-to-HTML conversion and see if we can convert to JSON directly?

raineorshine commented 3 years ago

Other way around: we should try to use one of the existing markdown converters, and if that doesn't work we'll have to write our own.

anmolarora1 commented 3 years ago

Do you remember the issues you faced with a converter library back then? That might save us some time.

raineorshine commented 3 years ago

I haven't tried a converter library; I started with a custom library for fun.

anmolarora1 commented 3 years ago

@raineorshine I checked out a few libraries, including markdown-it, and none of them parses into the format we need. So we'll probably need to go with your idea of creating a custom markdown parser. However, it would be good to list our specifications/requirements, i.e. which types of markers we need to parse, because there are simply too many of them. That would also help us plan the scope of this task.

I found this spec sheet that we can use as a reference. If you know of a better alternative, feel free to suggest it.

raineorshine commented 3 years ago

Thanks Anmol. What are your thoughts about MD → JSON Blocks → JSON? We know that existing parsers do not parse into the format we need, but since em can convert the HTML output of those parsers, how difficult would it be to convert himalaya JSON blocks into the correct nested format?

I don't think we want to write a new markdown parser itself. What we're doing is more like post-processing.

anmolarora1 commented 3 years ago

That's doable, though IMO converting himalaya JSON blocks into our desired format poses the same challenges. For instance, the markdown

# a
x

## b
### c
y

# d
# e
## f

- g
  - h
    - i

is parsed into

<h1>a</h1>
<p>x</p>
<h2>b</h2>
<h3>c</h3>
<p>y</p>
<h1>d</h1>
<h1>e</h1>
<h2>f</h2>
<ul>
<li>g
<ul>
<li>h
<ul>
<li>i</li>
</ul>
</li>
</ul>
</li>
</ul>

Although em converts HTML correctly, by the time we have the HTML the nesting information is already lost, because the parser doesn't treat the content that follows a heading as nested under it.

Please let me know if I'm missing anything here.

raineorshine commented 3 years ago

Good point. So we should really extend convertHTMLtoJSON to support h1-h6 elements.
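
A rough sketch of such an extension, operating on the flat top-level siblings shown in the HTML above. It assumes himalaya-style nodes, simplified here to `type`, `tagName`, `content`, and `children` (real himalaya output also includes `attributes` and comment nodes), plus a hypothetical `Block` shape; `nestSiblings` and the other helpers are illustrative names, not em's convertHTMLtoJSON. Open headings are tracked on a stack, so a paragraph or list is attached to the innermost open heading instead of becoming its sibling.

```typescript
// Simplified, assumed versions of himalaya's node shapes.
interface TextNode {
  type: 'text'
  content: string
}
interface ElementNode {
  type: 'element'
  tagName: string
  children: HtmlNode[]
}
type HtmlNode = TextNode | ElementNode

// Hypothetical nested block shape.
interface Block {
  value: string
  children: Block[]
}

const isList = (node: HtmlNode): node is ElementNode =>
  node.type === 'element' && (node.tagName === 'ul' || node.tagName === 'ol')

// Text content of a node, excluding any nested lists.
function textOf(node: HtmlNode): string {
  if (node.type === 'text') return node.content
  return node.children.filter(child => !isList(child)).map(textOf).join('').trim()
}

// Nested <ul>/<li> markup already encodes its structure; convert it directly.
function listToBlocks(list: ElementNode): Block[] {
  const blocks: Block[] = []
  for (const li of list.children) {
    if (li.type !== 'element' || li.tagName !== 'li') continue
    const children: Block[] = []
    for (const child of li.children) {
      if (isList(child)) children.push(...listToBlocks(child))
    }
    blocks.push({ value: textOf(li), children })
  }
  return blocks
}

// 1-6 for <h1>-<h6>, 0 for any other element.
function headingLevel(node: ElementNode): number {
  const match = /^h([1-6])$/.exec(node.tagName)
  return match ? Number(match[1]) : 0
}

// Nest flat top-level siblings under the headings that precede them.
function nestSiblings(nodes: HtmlNode[]): Block[] {
  const root: Block[] = []
  const open: { level: number; block: Block }[] = []
  const container = () => (open.length > 0 ? open[open.length - 1].block.children : root)

  for (const node of nodes) {
    // Skip text between top-level tags (assumed to be whitespace only).
    if (node.type === 'text') continue
    const level = headingLevel(node)
    if (level > 0) {
      // A heading closes any open heading of the same or a deeper level.
      while (open.length > 0 && open[open.length - 1].level >= level) open.pop()
      const block: Block = { value: textOf(node), children: [] }
      container().push(block)
      open.push({ level, block })
    } else if (isList(node)) {
      container().push(...listToBlocks(node))
    } else {
      // Paragraphs and other elements become leaves under the current heading.
      container().push({ value: textOf(node), children: [] })
    }
  }
  return root
}

// The flat HTML from this thread, written out by hand as nodes.
const el = (tagName: string, ...children: HtmlNode[]): ElementNode => ({ type: 'element', tagName, children })
const text = (content: string): TextNode => ({ type: 'text', content })
const flat: HtmlNode[] = [
  el('h1', text('a')), el('p', text('x')), el('h2', text('b')), el('h3', text('c')),
  el('p', text('y')), el('h1', text('d')), el('h1', text('e')), el('h2', text('f')),
  el('ul', el('li', text('g\n'), el('ul', el('li', text('h\n'), el('ul', el('li', text('i'))))))),
]

console.log(JSON.stringify(nestSiblings(flat)))
```

The heading stack handles both halves of the problem: `<h2>` inside an `<h1>` section nests under it, and a `<p>` right after a heading becomes that heading's child.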

anmolarora1 commented 3 years ago

It's more than just supporting the h1-h6 elements. As mentioned before, the interpretation of the HTML (obtained from parsing the markdown) isn't correct, and I believe we don't want to tweak the HTML → Thoughts conversion for now.

For instance,

<h1>a</h1>
<p>x</p>

gets converted to

- a
- x

Expected:

- a
  - x

raineorshine commented 3 years ago

Yes, exactly.