foambubble / foam

A personal knowledge management and sharing system for VSCode
https://foambubble.github.io/
Other
15.05k stars 647 forks source link

Some characters in tags are unsupported and are stripped in Foam UI elements #1240

Open redactedscribe opened 1 year ago

redactedscribe commented 1 year ago

Describe the bug

The Tag Explorer panel, Foam graph, and tag auto-completions fail to display the full tag if a defined tag contains an unsupported character.

Small Reproducible Example

No response

Steps to Reproduce the Bug or Issue

---
tags: [AI-(computing), "foo-(bar)-baz"]
---

Both of these tags are stripped of all of their remaining characters post their first hyphen (-).

Expected behavior

Support tags with characters other than [0-9A-Za-z_-] (others?) or programmatically disallow / warn about unsupported tags. If there is documentation about tags, mention the valid characters.

The example tags are assumed to be valid due to there being no Foam warning squiggles, and the YAML front matter syntax highlighting remains valid too. Only noticing the stripped characters in other areas of Foam brings awareness to a present problem.

Screenshots or Videos

No response

Operating System Version

Visual Studio Code Version

VSCodium 1.78.2

Additional context

No response

riccardoferretti commented 1 year ago

Interesting point, also because we could in theory be very lenient with tags in YAML (basically any word is a tag, no matter the content). @jimgraham as you have done some works on the tags, I would also love your opinion on the matter - thoughts?

jimgraham commented 1 year ago

I don't have a strong opinion. I'm in favour of being as permissive as possible for users.

Some naïve thoughts.

riccardoferretti commented 1 year ago

Thanks for the comment @jimgraham - I tend to agree. I was thinking that in YAML spaces or line items would be the separator, everything else is considered part of the tag.

e.g.

tags: hello[world]"with-Extra" another("tag"),with-commas final tag

or

tags: 
  - hello[world]"with-Extra" 
  - another("tag"),with-commas
  - final 
  - tag

This would result into hello[world]"with-Extra", another("tag"),with-commas, final and tag

With the list we can also support spaces:

tags:
  - final tag

This would create a final tag tag. Basically when using a list we simply consider the whole item to be the tag.

This removes any limitation from tag content (again, I don't see an issue with that, if anyone sees some please let me know)


Back to the original example

tags: [AI-(computing), "foo-(bar)-baz"]

would return [AI-(computing), and "foo-(bar)-baz"]. Clearly not the original goal I am guessing. That would be achievable with:

tags: AI-(computing) foo-(bar)-baz

or

tags: 
  - AI-(computing)
  - foo-(bar)-baz
redactedscribe commented 1 year ago

Space being the delimiter make sense on first impression since tags typically don't contain spaces in tagging systems. I only know of Logseq which supports this, but like it, I assume Foam would need some specific syntax to be able to write such tags within the Markdown itself (Logseq uses the syntax #[[foo bar]])?

The issues I see regarding the spaces-as-delimiters syntax are:

An easily typed and flexible tags key would make adding complex tags easier to less technical users, and generally require less effort for everybody, but there doesn't seem to be any additional variety in possible tag names gained compared to when using quotes and escape sequences to achieve the same (and also ensuring YAML parsers read each tag discretely).

If the goal is instead ease-of-use, then spaces-as-delimiters wins in complex tag name cases, but I think it would be overall more intuitive to just fully support YAML flow style sequences, which in their simplest form are easy enough for everybody: [tag1, "tag 2", tag3].

Flow style sequence instead of spaces-as-delimiters string:

tags: ['hello[world]"with-Extra"', 'another("tag"),with-commas', final tag, "'first char is a quote", "\"don't fix\""]

Some edge cases where quoting / escaping would still be required using a block style sequence in a spaces-as-delimiters future:

tags:
- ...
- "'first char is a quote"
- "\"don't fix\""

It would require more work on Foam's end to parse, and to auto-complete, these complex tags in flow style sequences compared to the suggested spaces-as-delimiters approach.

I guess my main question is: How important is it whether or not other YAML parsers read our tags as a single string or separate values? And is spaces-as-delimiters that much convenient / easier than using flow style sequences for simple tags.

Something to think about...

riccardoferretti commented 1 year ago

Thanks for the comment, it's actually pretty helpful to wrap my head around the issue.

Today tags in YAML are defined either as a list or as a string. When it's a list, we still do some processing around those.

To simplify the way it works I am more and more leaning on the following simple rule:

  1. If it's a string, we take the YAML string and split it with space and use those strings without further processing (no regex, no quotes, ...)
  2. If it's a list, each list item is a tag (based on the value according to the YAML syntax)

Whether it's a list or a string depends on the YAML syntax, not on Foam's parsing. If it's an invalid YAML value, it's an invalid YAML value and we don't parse it.

We could also simplify the tags in the text and just say that any word that starts with# is treated as a tag, until the next space - no further processing.

Yes, that means you can have arbitrarily complex tags - that's the user's choice. Yes, that means some tags can only be defined in YAML - that's the user's choice. Yes, that means you can't use space-separated tags in the text - I am ok with such limitation.

I don't think we should get into anything more (e.g. quotes, parenthesis, ...), things get pretty complex pretty quickly - and trying to be too smart can backfire. We give those options, users are free to use them or not.

To your comments:

Reading the YAML front matter via some other tool...

Other tools can either use the Foam metadata (if they have access to it), or implement the simple rules above. Or the user can use a list.

Possible to wrongly assume being able to use quotes or commas to create separate values

Not a terrible outcome, the user will just have to fix the tag

Going off the knowledge that space is the delimiter...

I don't believe it's Foam's job to make YAML more tolerant - the user can see the issue and act accordingly

...support YAML flow style sequences...

I agree we should follow YAML as base

Some edge cases...

If a user really really REALLY needs one of those use cases, she can figure it out, or create an issue ;)

How important is it whether or not other YAML parsers read our tags as a single string or separate values?

I am ok with saying we fully support simple YAML syntax, with the caveat that strings are divided by space

redactedscribe commented 1 year ago

To simplify the way it works I am more and more leaning on the following simple rule:

  1. If it's a string, we take the YAML string and split it with space and use those strings without further processing (no regex, no quotes, ...)

  2. If it's a list, each list item is a tag (based on the value according to the YAML syntax)

After some thought, I think what most bothers me about this approach is the lack of splitting on commas too, which may seem minor, but I think it's a valid UX argument.

Front matter tags::

Yes, that means you can't use space-separated tags in the text - I am ok with such limitation.

This is the other thing I'm not so content with as it may mean a manual refactor of tag names once this is discovered by a user through their daily use, or when a new user transition to Foam. I'd prefer some slight limitation (i.e. no spaces, only split on commas) over potentially encountering future organisational hurdles because a software's design. I'd go with a two-stage approach: Don't support spaces now, but if support for space-separated tags within Markdown text was to be implement, introduce being able to use spaces in tags everywhere at that point. However, the second stage may never come.

Because of the chance for all tags to be ignored when only one tag causes bad YAML when tags are supplied as a string, maybe it should be recommended to define tags using a list? This makes it harder to spot an error though, but I'd say keeping all other tags functional takes precedence. Edit: In fact, I think this may be a problem in both cases: Bad YAML anywhere would affect parsing the tags, right?

Other than that, I agree with your decisions.