blacksmithgu / obsidian-dataview

A data index and query language over Markdown files, for https://obsidian.md/.
https://blacksmithgu.github.io/obsidian-dataview/
MIT License
6.94k stars 411 forks source link

Bug report: task implicit `tags` field detects erroneous items that follow `#` symbol but are not tags, `file.tags` correctly follows Obsidian behavior on same items #1286

Open AnnaKornfeldSimpson opened 2 years ago

AnnaKornfeldSimpson commented 2 years ago

What happened?

The implicit tags field on Tasks incorrectly parses some items that follow a # symbol but are not tags, such as the color in <mark style="background: #FFF3A3A6;">, as tags, in operations such as GROUP BY tags. Obsidian does not recognize these items as tags, so dataview should not either. In fact, if using GROUP BY file.tags instead, the behavior is correct and no erroneous tags are invented, so the parsing error is specific to the implicit tags field on tasks. Does file.tags use some Obsidian API for getting tags that a task's tags field does not?

Found thanks to @claremacrae investigation of a similar issue in the Tasks plugin: https://github.com/obsidian-tasks-group/obsidian-tasks/issues/929 Related: #1268 seems to find the same issue with dataviewjs.

DQL

Test case (found by @claremacrae on obsidian-tasks, wrapped in codeblock to display correctly on Github, remove the codeblock when copying)

- [ ] <mark style="background: #FFF3A3A6;">Make phone call 10:50 today</mark>
- [ ] [Enable preview feature #678](https://github.com/obsidian-tasks-group/obsidian-tasks)
- [ ] [Internal: Better abstraction for grouping of tasks · Discussion #920 · obsidian-tasks-group/obsidian-tasks · GitHub](https://github.com/obsidian-tasks-group/obsidian-tasks/discussions/920)

Incorrect results (test case above is grouped by erroneous items following # that are not tags)

TASK
FROM "path/To/Test/File"
GROUP BY tags

Correct results (all in same group, no tags)

TASK
FROM "path/To/Test/File"
GROUP BY file.tags

JS

/* https://github.com/blacksmithgu/obsidian-dataview/issues/1268 Issue # 1268 seems to have repro this in JS also. I did not test JS. */

Dataview Version

0.5.41

Obsidian Version

0.15.8

OS

Windows

AnnaKornfeldSimpson commented 2 years ago

Possible fix: change data-import/common.ts#L5 regex to require either start of line or at least one white-space in front of the tag? EDIT: I do not have time to exhaustively test this and submit a PR, but hopefully this will help someone who does! EDIT 2: Is it not possible to just get the tags from Obsidian?

AB1908 commented 2 years ago

It is possible to get tags from Obsidian but not the way it can be extracted for each task.

AnnaKornfeldSimpson commented 2 years ago

@AB1908 said:

It is possible to get tags from Obsidian but not the way it can be extracted for each task.

Could you say a little more please about why extracting for each task does not work? I thought there was a way to convert from an Obsidian "Pos" to a line number? (though I cannot find in Obsidian API right now) CacheItems have a field with "Pos".

AB1908 commented 2 years ago

My assumption here is that blacksmithgu is using some regex to parse the task text and also extract tags. The Obsidian metadataCache does give us pos attributes (didn't check for tags though) but then it would require correlating with the task metadata. Now that I think through it, it does seem like it should work but I imagine it was somewhat easier to throw a tag matcher in an existing task text parser.

AnnaKornfeldSimpson commented 2 years ago

My assumption here is that blacksmithgu is using some regex to parse the task text and also extract tags. The Obsidian metadataCache does give us pos attributes (didn't check for tags though) but then it would require correlating with the task metadata. Now that I think through it, it does seem like it should work but I imagine it was somewhat easier to throw a tag matcher in an existing task text parser.

Yes it is a regex currently at data-import/common.ts#L5 to extract tags. Your logic makes sense, or at least it is easier until you hit test cases like the HTML highlight one above.

blacksmithgu commented 2 years ago

It was indeed easier to do per-task matching then to traverse the tags array for positions. Will see if I can swap to using the Obsidian cache instead.

claremacrae commented 2 years ago

It was indeed easier to do per-task matching then to traverse the tags array for positions. Will see if I can swap to using the Obsidian cache instead.

Hi @blacksmithgu - just to suggest a possibly easier alternative that I think would give a pretty good result.

If you can pass file.tags in to the code that parses the task line, then that code could intersect the tags it found in the line with the tags in the file, and only keep in the Task the tags that are in both lists.

There could be some very rare cases where a tag elsewhere in the file matches a not-tag in the line, but speaking personally, in the Tasks plugin, I would take a simple fix like this, that's easy to write and review and release, over a rewrite at this point. So I wanted to suggest it to you too.

alicangok commented 2 years ago

Was anyone able to write a quick fix that we can deploy manually?

P.S. I can confirm that the bug is also seen when using "Link to headings", such as:

- [ ] Rewrite this section: [[Article#Section]]

claremacrae commented 2 years ago

Funnily enough I am working on exactly this right now in Tasks.

When I'm happy with it, I will share the information back here.