[Feature Request] Regex to select text

rapatel0 commented 2 years ago

The range and line specifications have limited expressiveness when selecting text from an embed.

It would be ideal to specify a regex, or a set of regexes, to apply to the text and select the wanted information out of the embed.

Specific Example

FILENAME

Nothing cool up here
something cool below
- Things are awesome
something unrelated
more unrelated

The following query

file: [[FILENAME]]
ranges: "something cool below" to "something unrelated"
display: embedded
join: "- "

Will output to:

something cool below

Things are awesome

something unrelated

instead of:

Things are awesome

Ideally, we could subselect just the bullets we want. (Bullet-parsing functionality would also be generally awesome.)

rapatel0 commented 2 years ago

Alternatively, you can create a regex post processor that allows some degree of filtering.

erykwalder commented 2 years ago

So, if I'm understanding, you basically want to be able to delimit what gets included in the reference, so that it will capture future updates in a flexible way? Because if it's static, you could just use ranges: "Things are awesome"

I think regex would fall into a similar issue to what's currently there in that you'd have to distinguish between what's included in the quote and what's simply used to delimit it.

Anyways, my thoughts are that maybe another range type could be added. Maybe something like between "string" and "string".
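To make the idea concrete, here is a minimal Python sketch of what an exclusive `between` range might do. The `between` helper and its behavior are assumptions for illustration only, not quoth's implementation: the two strings only delimit the selection and are not part of the result.

```python
from typing import Optional


def between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text strictly between the first `start` and the next `end`.

    Hypothetical helper: the delimiters themselves are excluded.
    Returns None if either delimiter is missing.
    """
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)  # skip past the start delimiter
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]


# The note from the original example.
note = (
    "Nothing cool up here\n"
    "something cool below\n"
    "- Things are awesome\n"
    "something unrelated\n"
    "more unrelated\n"
)

selected = between(note, "something cool below", "something unrelated")
print(selected.strip())  # prints "- Things are awesome"
```

Because the bounds are located at lookup time, bullets added between the two delimiter lines later would be picked up without editing the query.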

> join: "- "

P.S. - the join text is only added if there are multiple ranges, and only between the ranges, not as a prefix.
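A small Python sketch of the join behavior described above. The `assemble` name is made up for illustration and this is not the plugin's code; it only shows that the join text lands between ranges and never in front of the first one.

```python
from typing import Sequence


def assemble(ranges: Sequence[str], join: str) -> str:
    """Concatenate selected ranges, placing `join` only between them."""
    return join.join(ranges)


# A single range: the join text is not used at all.
single = assemble(["Things are awesome"], "- ")
print(single)  # prints "Things are awesome"

# Two ranges: the join text appears once, between them, not as a prefix.
double = assemble(["first range", "second range"], "- ")
print(double)  # prints "first range- second range"
```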

rapatel0 commented 2 years ago

> I think regex would fall into a similar issue to what's currently there in that you'd have to distinguish between what's included in the quote and what's simply used to delimit it.

I was thinking about a regex operator with capture parentheses (see Capturing). Continuing with the example, the following regex:

regex: "something cool below"(.*)"something unrelated"

should capture

   - Things are awesome
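As a rough illustration of the capture-group idea, here it is in Python's `re` module. This is only a sketch: the `regex:` option is a proposal, not an existing quoth feature, and the quotes from the proposed syntax are dropped because in a real regex they would match literal quote characters. Note that `.` does not cross newlines by default, so a multi-line selection needs the DOTALL flag.

```python
import re

# The note from the original example.
note = (
    "Nothing cool up here\n"
    "something cool below\n"
    "- Things are awesome\n"
    "something unrelated\n"
    "more unrelated\n"
)

# The delimiters sit outside the parentheses, so only the group is kept.
# The lazy ".*?" stops at the first occurrence of the end delimiter.
pattern = r"something cool below(.*?)something unrelated"

# Without DOTALL, "." cannot match the newlines between the delimiters.
no_match = re.search(pattern, note)
print(no_match)  # prints "None"

# With DOTALL, the group spans the lines between the delimiters.
match = re.search(pattern, note, re.DOTALL)
captured = match.group(1).strip()
print(captured)  # prints "- Things are awesome"
```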

> Because if it's static, you could just use

It's dynamic. It might be easier if I explain the full use case. I have a set of daily notes that I capture. Then, on a weekly and monthly basis, I embed the section with all the bullets in a big end-of-week list, but it's polluted. Ideally, I would like to slice by parent bullet and list its child bullets. The number of child bullets would be dynamic as well.

Now that we are talking about it, there is another "problem." Even with a regex, the leading whitespace would be annoying to remove. If the capture spans multiple lines, a single capture group can't strip the indentation from each of them.

Example:

- something cool below
   - Things are awesome
   - Things are awesome 2
- Something unrelated

with:

regex: "something cool below"\s*(.*)"something unrelated"

would yield

- Things are awesome
   - Things are awesome 2

This is why I think a regex post-processor might make sense. A regex to remove text from the initial embed.
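Sketched in Python, the two-step idea could look like this. It is an illustration under assumptions, not a proposed quoth syntax: the first regex reproduces the capture above (with DOTALL so it can span lines, and the end delimiter written as it appears in the note), and the second is the "post-processor", a multi-line substitution that strips leading whitespace from every captured line.

```python
import re

note = (
    "- something cool below\n"
    "   - Things are awesome\n"
    "   - Things are awesome 2\n"
    "- Something unrelated\n"
)

# Step 1: capture. "\s*" swallows the whitespace before the first child
# bullet only, so the second child keeps its indentation.
raw = re.search(
    r"something cool below\s*(.*?)- Something unrelated", note, re.DOTALL
).group(1)
print(repr(raw))
# prints '- Things are awesome\n   - Things are awesome 2\n'

# Step 2: post-process. With MULTILINE, "^" matches at the start of every
# line, so the indentation is removed from each line of the capture.
cleaned = re.sub(r"^[ \t]+", "", raw, flags=re.MULTILINE)
print(cleaned, end="")
# prints:
# - Things are awesome
# - Things are awesome 2
```

One trade-off of this particular substitution is that it flattens everything: if the child bullets had nested children of their own, that nesting would be lost too.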

erykwalder commented 2 years ago

Let me stew on this for a bit and come up with a proposal, now that I'm following the use case.

erykwalder commented 2 years ago

So right now I'm leaning towards being able to select a bullet in the path line with something like [[File#-Parent Bullet]]. Then from there, you could set the range of what you want to include, since by default it would include the bullet itself. I'm still thinking about what syntax to use to indicate a bullet that couldn't (or likely wouldn't) get mixed up with a heading involving those characters.

In addition, I'm thinking of allowing `end` as the endpoint of a line:col range, so for selecting all bullets after the parent, you could do: ranges: 1:0 to end.

For now, I'm trying to hold off on post processing due to the complexity. Maybe at some point there could be something simple like being able to put in a js function that gets eval'd and called for each range or the assembled quote.
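For what it's worth, the shape of such a hook might look roughly like the Python sketch below. Everything here is hypothetical (`build_quote`, `per_range`, and `whole` are made-up names, and the real thing would be a JS function): one optional callback runs on each range, and another runs on the assembled quote.

```python
from typing import Callable, Optional, Sequence

# A processor takes a piece of text and returns the transformed text.
Processor = Callable[[str], str]


def build_quote(
    ranges: Sequence[str],
    join: str,
    per_range: Optional[Processor] = None,
    whole: Optional[Processor] = None,
) -> str:
    """Assemble a quote, applying the optional hooks along the way."""
    if per_range is not None:
        # Called once for each selected range.
        ranges = [per_range(r) for r in ranges]
    quote = join.join(ranges)
    if whole is not None:
        # Called once on the assembled quote.
        quote = whole(quote)
    return quote


# Example: strip stray indentation from each range before joining.
result = build_quote(
    ["   - Things are awesome", "   - Things are awesome 2"],
    join="\n",
    per_range=str.strip,
)
print(result)
# prints:
# - Things are awesome
# - Things are awesome 2
```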

rapatel0 commented 2 years ago

Cool, this seems functional. I share your concerns with the hash. Things would get really complicated if you have multiple headings. Technically you would lose a bit of expressiveness if you just use a hash.

Imagine this note:

# Heading 1
-Parent Bullet
  - Thing that i may want at some other time
# Heading 2
-Parent Bullet
  - Thing I want

Note: my exact use case would be satisfied by the hash approach; I'm just proposing this from an "API" design POV.

Also it might be something to consider at the blockid level like in the readme:

path: [[Filename#Heading#^blockid]]
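To show why scoping by heading removes the ambiguity, here is a rough Python sketch run against the note above. The `children_of_bullet` function is purely illustrative and makes simplifying assumptions: any line starting with `#` counts as a heading, bullets start with `-`, and children are the more-indented lines that follow the parent.

```python
from typing import List


def children_of_bullet(text: str, heading: str, bullet: str) -> List[str]:
    """Collect the child lines of `bullet`, looking only under `heading`."""
    in_heading = False    # are we inside the requested heading's section?
    collecting = False    # have we found the parent bullet yet?
    parent_indent = 0
    children: List[str] = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            if collecting:
                break  # a new heading ends the parent's children
            in_heading = stripped.lstrip("#").strip() == heading
            continue
        if not in_heading:
            continue
        indent = len(line) - len(stripped)
        if collecting:
            if stripped and indent <= parent_indent:
                break  # a sibling or outer line ends the children
            if stripped:
                children.append(stripped)
            continue
        if stripped.startswith("-") and stripped[1:].strip() == bullet:
            collecting = True
            parent_indent = indent
    return children


note = (
    "# Heading 1\n"
    "-Parent Bullet\n"
    "  - Thing that i may want at some other time\n"
    "# Heading 2\n"
    "-Parent Bullet\n"
    "  - Thing I want\n"
)

wanted = children_of_bullet(note, "Heading 2", "Parent Bullet")
print(wanted)  # prints "['- Thing I want']"
```

With only the bullet text to go on, both `-Parent Bullet` lines are candidates; adding the heading to the path picks one of them.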

erykwalder commented 2 years ago

You should now be able to accomplish what you were trying with:

```quoth
path: [[file#-something cool below]]
ranges: after 1:0
```

erykwalder / quoth

[Feature Request] Regex to select text #6
