danny0838 / webscrapbook

A browser extension that captures web pages to a local device or backend server for future retrieval, organization, annotation, and editing. This project inherits from the legacy Firefox add-on ScrapBook X.
Mozilla Public License 2.0

A regex that used to work with scrapbook-x doesn't work with webscrapbook #366

Closed (grotesque closed this issue 10 months ago)

grotesque commented 1 year ago

Sample text ~!~happy~!~

I want to write a regex that does a substring search between two ~!~ markers.

With scrapbook-x I used to do (~!~).*(.*app.*).*(~!~) and it used to work.

The equivalent doesn't work in webscrapbook: re:(~!~).*(.*app.*).*(~!~)

What am I doing wrong?

danny0838 commented 1 year ago

re: is a switching command, and any keyword attached directly to it is ignored. Separate the command and the keyword with one or more spaces, i.e. re: (~!~).*(.*app.*).*(~!~).

Legacy ScrapBook X still accepts a keyword attached to a command that doesn't take one, or to an unsupported command. This can cause confusion, so WebScrapBook no longer does that.

P.S. Beware that (~!~).*(.*app.*).*(~!~) is a bad regex pattern that can cause catastrophic backtracking. Avoid using it as written.
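As an illustration of the backtracking concern, here is a minimal Python sketch of a tighter pattern. It is not taken from WebScrapBook (which evaluates patterns with the browser's JavaScript regex engine, though this particular syntax is common to both), and it assumes that the text between the two ~!~ markers never contains a ~ character. The names SAMPLE, ORIGINAL, SAFER, and find_marked are made up for this example.

```python
import re

# Sample text from the original question.
SAMPLE = "Sample text ~!~happy~!~"

# The pattern from the question: several adjacent ".*" can divide the same
# stretch of text between them in many ways, which is what makes a failed
# match expensive on long input.
ORIGINAL = re.compile(r"(~!~).*(.*app.*).*(~!~)")

# Assumed alternative: "[^~]*" cannot run past a marker, so there is far
# less for the engine to retry. Only valid if the marked text has no "~".
SAFER = re.compile(r"~!~[^~]*app[^~]*~!~")


def find_marked(text):
    """Return the first ~!~...app...~!~ span in text, or None."""
    match = SAFER.search(text)
    return match.group(0) if match else None


print(find_marked(SAMPLE))
```

With a space after the command, the same pattern would be entered in the search box as re: ~!~[^~]*app[^~]*~!~.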

grotesque commented 1 year ago

Thanks. It works with the added space.

I didn't understand what "switching command" or "keyword" means. Could you elaborate? I think the documentation should explicitly mention that the space is needed to run regular expressions.

And maybe we should make it work without the space, as scrapbook-x does. A small thing like this will very likely lead new people to conclude that regex doesn't work here.

danny0838 commented 1 year ago

The documentation should be clear about this. Commands like mc: and re: affect all normal keywords that follow them, and thus need not and should not be attached to a keyword. Other commands are used together with a keyword instead. Things like re:keyword have never been documented (even in legacy ScrapBook X), and their behavior has never been guaranteed.
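The distinction can be sketched with a toy query parser in Python. This is only an illustration of the behavior described in this thread, not WebScrapBook's actual implementation: Term, SWITCHES, and parse_query are hypothetical names, the query is simply split on whitespace, and the meaning of mc: (case-sensitive matching) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Term:
    text: str         # the keyword as typed
    is_regex: bool    # True if an earlier "re:" switched regex mode on
    match_case: bool  # True if an earlier "mc:" was seen (assumed meaning)


# Hypothetical table of switching commands: prefix -> flag it turns on.
SWITCHES = {"re:": "is_regex", "mc:": "match_case"}


def parse_query(query: str) -> List[Term]:
    """Split a query on whitespace and apply switching commands in order."""
    flags = {"is_regex": False, "match_case": False}
    terms: List[Term] = []
    for token in query.split():
        switch = next((p for p in SWITCHES if token.startswith(p)), None)
        if switch is not None:
            # A switching command changes the mode for later keywords.
            # Anything glued onto it (as in "re:keyword") is dropped.
            flags[SWITCHES[switch]] = True
            continue
        terms.append(Term(token, flags["is_regex"], flags["match_case"]))
    return terms
```

Under these assumptions, parse_query("re:(~!~).*app.*(~!~)") yields no search terms at all, while parse_query("re: ~!~[^~]*app[^~]*~!~") yields one term with is_regex set, which mirrors why the added space makes the search work.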