cloverich / chronicles

A journaling hobby project

Support #tagging within document text #185

Open cloverich opened 3 months ago

cloverich commented 3 months ago

When I added tagging in #126, I started with an in-document solution, but ultimately decided to KISS: use a plain text box in the document header, then build out the rest of the integration. To some extent this implies a front matter approach like #127, where tags live in a metadata block at the top of the markdown file:

---
title: My document title
tags: #foo, #bar, #baz
---

...rest of content
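A minimal sketch of reading tags out of a front matter block like the one above, assuming the block is delimited by `---` lines and the tags sit on a single comma-separated `tags:` line. `parseFrontMatterTags` is a hypothetical helper, not part of chronicles. The line is parsed by hand because a strict YAML parser treats ` #` as the start of a comment, so `tags: #foo, #bar` would come back empty unless the values were quoted.

```typescript
// Hypothetical helper: extract tags from a "---"-delimited front matter block.
// Assumes tags are written on one line, e.g. "tags: #foo, #bar, #baz".
function parseFrontMatterTags(doc: string): string[] {
  const lines = doc.split(/\r?\n/);
  // Front matter must open on the very first line
  if (lines[0].trim() !== "---") return [];

  for (let i = 1; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "---") break; // end of front matter, no tags line found
    const match = /^tags:\s*(.*)$/.exec(line);
    if (!match) continue;
    return match[1]
      .split(",")
      .map((t) => t.trim().replace(/^#/, "")) // drop the leading "#"
      .filter((t) => t.length > 0);
  }
  return [];
}

const doc = [
  "---",
  "title: My document title",
  "tags: #foo, #bar, #baz",
  "---",
  "",
  "...rest of content",
].join("\n");

console.log(parseFrontMatterTags(doc)); // [ 'foo', 'bar', 'baz' ]
```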

The alternative, which apps like Bear and Apple Notes use, is to integrate tags directly into the editor. That can make sense, but when I look at how I actually use tags in practice, I always put them at the very top of the document, hence my decision to go with the front matter approach. Still, I implemented some of the parsing logic and want to track it here, since I may end up doing this in the future.

I fiddled with my unified parser setup but never fully grasped how to override it so that something like this would work:

const slateToStringProcessor = unified()
  .use(slateToRemark)
  .use(tagsTransformer)
  .use(remarkStringify);

const stringToSlateProcessor = parser
  .use(remarkUnwrapImages)
  .use(tagsTransformer)
  .use(remarkToSlate);
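A likely reason `.use(tagsTransformer)` did not work, assuming `tagsTransformer` had the same `(tree) => tree` shape as `transformTags`: unified treats whatever is passed to `.use()` as an attacher. When the processor is frozen, unified calls the attacher with the options given to `.use()` and, if it returns a function, registers that function as a transformer for the run phase. A function shaped like `transformTags(tree)` therefore gets called at setup time with the options (none here) instead of a syntax tree. Below is a dependency-free sketch of the wrapper shape; `remarkTags` is a hypothetical name and `transformTags` is a stand-in that only records the trees it receives.

```typescript
// Minimal stand-in type; real code would import Root from "mdast"
type Root = { type: "root"; children: unknown[] };
type Transformer = (tree: Root) => Root | undefined;

// Stand-in for transformTags (hypothetical): records each tree it is handed
const seen: unknown[] = [];
function transformTags(tree: Root): Root {
  seen.push(tree);
  return tree;
}

// Attacher (hypothetical name). unified calls it once with the .use() options
// and registers the returned function as the transformer.
function remarkTags(): Transformer {
  return (tree) => transformTags(tree);
}

// Simulating the two steps by hand:
const transformer = remarkTags(); // attach: no tree involved yet
const tree: Root = { type: "root", children: [] };
transformer(tree); // run: the transformer receives the parsed tree
console.log(seen.length, seen[0] === tree); // 1 true
```

With the real packages this would plug in as `unified().use(remarkParse).use(remarkTags)`, and `run()` or `process()` would then apply it; `parse()` on its own does not run transformers.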

Also saving unified's processing diagram for context:

// | ........................ process ........................... |
// | .......... parse ... | ... run ... | ... stringify ..........|
//
//           +--------+                     +----------+
// Input ->- | Parser | ->- Syntax Tree ->- | Compiler | ->- Output
//           +--------+          |          +----------+
//                               X
//                               |
//                        +--------------+
//                        | Transformers |
//                        +--------------+

I ended up implementing it as a transformer instead:

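import { visit, SKIP } from "unist-util-visit";
import type { Root } from "mdast";

export default function transformTags(tree: Root) {
  visit(tree, "text", (node: any, index, parent) => {
    if (!parent || index === undefined) return;

    const tagRegex = /#\w+/g;
    const matches = node.value.match(tagRegex);
    if (!matches) return;

    // Text between tags; always one more entry than `matches`.
    // For "#mytag" this is ["", ""], so empty parts are skipped below.
    const parts = node.value.split(tagRegex);
    const newNodes: any[] = [];

    parts.forEach((part: string, i: number) => {
      if (part) newNodes.push({ type: "text", value: part });
      if (matches[i]) {
        newNodes.push({
          type: "tag",
          // An array, to match the Tag interface (children: [Text])
          children: [{ type: "text", value: matches[i] }],
        });
      }
    });

    parent.children.splice(index, 1, ...newNodes);

    // Resume after the inserted nodes; otherwise visit() would descend into
    // the new tag nodes and wrap their "#..." text in yet another tag
    return [SKIP, index + newNodes.length];
  });

  return tree;
}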
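The split/match interleaving in the loop can be checked in isolation. For `#mytag`, `split()` returns `["", ""]` (an empty string on each side of the match), and the `if (part)` guard drops both. A minimal dependency-free sketch of the same logic; `splitTags` is a hypothetical helper, not from the codebase. Like `transformTags`, it keeps the leading `#`, whereas the tests further down expect tag text without it.

```typescript
const tagRegex = /#\w+/g;

type Piece = { type: "text" | "tag"; value: string };

// split() yields the text between tags (matches.length + 1 entries);
// match() yields the tags themselves. Interleave them, skipping empty text.
function splitTags(value: string): Piece[] {
  const parts = value.split(tagRegex);
  const matches = value.match(tagRegex) ?? [];
  const out: Piece[] = [];
  parts.forEach((part, i) => {
    if (part) out.push({ type: "text", value: part });
    if (i < matches.length) out.push({ type: "tag", value: matches[i] });
  });
  return out;
}

console.log("#mytag".split(tagRegex)); // [ '', '' ]
console.log(splitTags("#mytag1 #mytag2"));
```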

The intended usage is to call it by hand between parsing and stringifying:

let mdast = myunifiedparser.parse(document);
mdast = transformTags(mdast);
return myunifiedparser.stringify(mdast); // or compile, etc.

There is some useful documentation on this on the unified side.
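Stringifying the transformed tree will likely fail as-is: remark-stringify is built on mdast-util-to-markdown, which throws on node types it has no handler for, and `tag` is a custom type. mdast-util-to-markdown accepts extra handlers keyed by node type through its `handlers` option, which can be passed through remark-stringify's options, e.g. `.use(remarkStringify, { handlers: { tag: tagHandler } })`. A sketch of such a handler with local stand-in types; `tagHandler` is hypothetical, and real handlers also receive `parent`, `state`, and `info` arguments that this one ignores.

```typescript
// Local stand-ins for mdast's Text and the custom Tag node
type Text = { type: "text"; value: string };
type Tag = { type: "tag"; children: [Text] };

// Hypothetical handler for the custom "tag" node: returns the markdown for it
function tagHandler(node: Tag): string {
  const text = node.children.map((child) => child.value).join("");
  // Re-add the "#" in case the parser stored the tag text without it
  return text.startsWith("#") ? text : `#${text}`;
}

console.log(tagHandler({ type: "tag", children: [{ type: "text", value: "mytag" }] })); // #mytag
```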

I then added basic supporting implementations to the mdast-to-slate and slate-to-mdast transformers:

// mdast types

/**
 * (Custom Type) Tag - Represents a tag in markdown
 *
 * ex: #mytag
 */
export interface Tag {
  type: "tag";
  children: [Text]; // todo: probably this should actually just be a single text node?
}

// mdast-to-slate

//...

    case "tag":
      return [createTag(node)];
//...

export interface Tag {
  text: string;
  tag: true;
}

function createTag(node: mdast.Tag): Tag {
  const { children } = node;
  return {
    text: children[0].value,
    tag: true,
  };
}

// slate-to-mdast
type Decoration = {
  italic: true | undefined;
  bold: true | undefined;
  strikethrough: true | undefined;
  code: true | undefined;
  tag: true | undefined;
};

const DecorationMapping = {
  italic: "emphasis",
  bold: "strong",
  strikethrough: "delete",
  code: "inlineCode",
  tag: "tag",
};

convertNodes( // ...

              case "bold":
              case "italic":
              case "strikethrough":
              case "tag":
                res = {
                  type: DecorationMapping[k] as any,
                  children: [res],
                };
                break;

// ....
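For a single tag, the two directions should be inverses. A self-contained sketch of the round trip, using local stand-in types and a hypothetical `tagFromLeaf` for the slate-to-mdast direction (in the real code that work happens inside `convertNodes` via `DecorationMapping`):

```typescript
// Local stand-ins for the mdast Tag and Slate leaf shapes
type MdastText = { type: "text"; value: string };
type MdastTag = { type: "tag"; children: [MdastText] };
type SlateTag = { text: string; tag: true };

// mdast -> Slate: flatten the tag's single text child into a marked leaf
function createTag(node: MdastTag): SlateTag {
  return { text: node.children[0].value, tag: true };
}

// Slate -> mdast (hypothetical): wrap the leaf text back into a tag node,
// mirroring what the "tag" case in convertNodes produces
function tagFromLeaf(leaf: SlateTag): MdastTag {
  return { type: "tag", children: [{ type: "text", value: leaf.text }] };
}

const original: MdastTag = { type: "tag", children: [{ type: "text", value: "mytag" }] };
const roundTripped = tagFromLeaf(createTag(original));
console.log(JSON.stringify(roundTripped) === JSON.stringify(original)); // true
```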
cloverich commented 3 months ago

I also had some WIP tests:


describe("Tags", function () {
  it.skip("text -> mdast base case", function () {
    const input = "This **text** has a #tag1 and another #tag2";
    const output = stringToMdast(input);

    expect(output).to.deep.equal({
      type: "root",
      children: [
        {
          type: "paragraph",
          children: [
            // {
            //   type: "text",
            //   value: "tag1 #tag2 #tag3",
            // },
          ],
        },
      ],
    });
  });

  // todo(chris): re-disable truncateThreshold
  chai.config.truncateThreshold = 0; // Disable truncation

  it("A basic example tag #mytag", function () {
    const input1 = "A basic example tag #mytag";
    const output = stringToMdast(input1);

    // todo: expect one paragraph child to simplify test, or maybe even
    // GET children, and throw a useful error when they aren't present...
    expect(output.children.length).to.equal(1);
    expect(output.children[0].type).to.equal("paragraph");
    expect(output.children[0].children).to.deep.equal([
      {
        type: "text",
        value: "A basic example tag ",
      },
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag",
          },
        ],
      },
    ]);
  });

  it("#mytag", function () {
    const output = stringToMdast("#mytag");

    expect(output.children.length).to.equal(1);
    expect(output.children[0].type).to.equal("paragraph");
    expect(output.children[0].children).to.deep.equal([
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag",
          },
        ],
      },
    ]);
  });

  it("#mytag1 #mytag2", function () {
    const output = stringToMdast("#mytag1 #mytag2");

    expect(output.children.length).to.equal(1);
    expect(output.children[0].type).to.equal("paragraph");
    expect(output.children[0].children).to.deep.equal([
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag1",
          },
        ],
      },
      {
        type: "text",
        value: " ",
      },
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag2",
          },
        ],
      },
    ]);
  });

  it("# My heading has a tag #mytag", function () {
    const output = stringToMdast("# My heading has a tag #mytag");
    expect(output.children.length).to.equal(1);
    expect(output.children[0].type).to.equal("heading");
    expect(output.children[0].children).to.deep.equal([
      {
        type: "text",
        value: "My heading has a tag ",
      },
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag",
          },
        ],
      },
    ]);
  });

  it.skip("`#mytag` is not a tag");
  it.skip("Tag inside a code block \n ```\n #mytag\n ```");
  it.skip("Tag is child of bold inside a code block \n ```\n **#mytag**\n ```");

  describe("slate -> mdast", function () {
    it("A basic example tag #mytag", function () {
      const input: SlateNode = {
        type: "root",
        children: [
          {
            type: "p",
            children: [
              {
                text: "A basic example tag ",
              },
              {
                text: "mytag",
                tag: true,
              },
            ],
          },
        ],
      };

      const output = slateToMdast(input);

      expect(output).to.exist;
      expect(output.type).to.equal("root");
      expect(output.children).to.exist;
      expect(output.children).to.have.length(1);
      expect(output.children).to.deep.equal([
        {
          type: "paragraph",
          children: [
            {
              type: "text",
              value: "A basic example tag ",
            },
            {
              type: "tag",
              children: [
                {
                  type: "text",
                  value: "mytag",
                },
              ],
            },
          ],
        },
      ]);
    });
  });
});

I broke them when I slightly changed the structure of the output, but it's straightforward to update either the tests or the code to match, and I think they point in the right direction.
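The skipped inline-code and code-block cases should pass without extra work, because `visit(tree, "text", ...)` never sees the content of `inlineCode` or `code` nodes: remark-parse stores that content in `value` rather than in text children. A dependency-free sketch that checks this, using a hypothetical `tagTextNodes` walker that builds new child arrays instead of splicing in place (which also avoids revisiting freshly inserted tag nodes):

```typescript
type Node = { type: string; value?: string; children?: Node[] };

const tagRegex = /#\w+/g;

// Hypothetical walker: like visit(tree, "text", ...), only "text" nodes are
// split into tags; other nodes are copied (and recursed into if they have children)
function tagTextNodes(node: Node): Node {
  if (!node.children) return node;
  const children: Node[] = [];
  for (const child of node.children) {
    if (child.type !== "text" || !child.value) {
      children.push(tagTextNodes(child));
      continue;
    }
    const parts = child.value.split(tagRegex);
    const matches = child.value.match(tagRegex) ?? [];
    parts.forEach((part, i) => {
      if (part) children.push({ type: "text", value: part });
      if (i < matches.length) {
        children.push({ type: "tag", children: [{ type: "text", value: matches[i] }] });
      }
    });
  }
  return { ...node, children };
}

// `#mytag` as inline code and as a fenced code block (position, lang, meta omitted)
const tree: Node = {
  type: "root",
  children: [
    { type: "paragraph", children: [{ type: "inlineCode", value: "#mytag" }] },
    { type: "code", value: "#mytag" },
  ],
};
console.log(JSON.stringify(tagTextNodes(tree)) === JSON.stringify(tree)); // true
```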