danny0838 / webscrapbook

A browser extension that captures web pages to local device or backend server for future retrieval, organization, annotation, and editing. This project inherits from legacy Firefox add-on ScrapBook X.
Mozilla Public License 2.0
894 stars 119 forks

Using regex doesn't capture linked webpages #339

Closed: Notnilc2107 closed this issue 1 year ago

Notnilc2107 commented 1 year ago

An example webpage that I want to capture:

https://blackboard.qut.edu.au/webapps/blackboard/content/listContent.jsp?course_id=_164929_1&content_id=_9748967_1

I want to capture every linked page that has a different course_id and content_id. The regex input I put in the included webpages is:

/^https://blackboard\.qut\.edu\.au/webapps/blackboard/content/listContent\.jsp\?course_id\=

It works fine when I manually enter every linked page, but not when I use that regex input. I think it might be because of the ? after listContent.jsp, but I'm not sure. I've tried using regex with https://demo.cyotek.com/ and it works there. The regex input I used for that was /^https://demo.cyotek.com/

danny0838 commented 1 year ago

The regex input I put in the included webpages is:

/^https://blackboard\.qut\.edu\.au/webapps/blackboard/content/listContent\.jsp\?course_id\=

As the tooltip says, a regex string must follow the format /<regex>/<flags>. Because your input has no closing /, it is treated as an exact URL to match, namely /^https://blackboard\.qut\.edu\.au/webapps/blackboard/content/listContent\.jsp\?course_id\=, which won't match any real-world URL.
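To illustrate the rule above, here is a minimal sketch of how a filter string could be classified: it is used as a regex only if it has the shape /<regex>/<flags>, and otherwise compared as a literal URL. This is a hypothetical parser written in Python, not WebScrapBook's actual code; the names `make_matcher`, `RULE_AS_REGEX`, and `FLAG_MAP` are made up for this example, and only a subset of JavaScript flags is mapped. It also shows why the demo.cyotek.com rule worked: that input happens to end with a /.

```python
import re

# Hypothetical rule parser (NOT WebScrapBook's actual code).
# A rule is treated as a regex only if it looks like /<regex>/<flags>,
# where <flags> uses JavaScript flag letters; otherwise it is a literal URL.
RULE_AS_REGEX = re.compile(r"^/(.*)/([dgimsuvy]*)$")

# Assumed mapping of the JS flags that have a Python counterpart.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def make_matcher(rule):
    """Return a predicate url -> bool for a single filter rule."""
    m = RULE_AS_REGEX.match(rule)
    if m is None:
        # No closing "/": the whole string must equal the URL exactly.
        return lambda url: url == rule
    body, flags = m.group(1), m.group(2)
    flag_value = 0
    for ch in flags:
        flag_value |= FLAG_MAP.get(ch, 0)  # other JS flags are ignored here
    compiled = re.compile(body, flag_value)
    return lambda url: compiled.search(url) is not None


url = "https://blackboard.qut.edu.au/webapps/blackboard/content/listContent.jsp?course_id=_164929_1&content_id=_9748967_1"
missing_slash = r"/^https://blackboard\.qut\.edu\.au/webapps/blackboard/content/listContent\.jsp\?course_id\="

print(make_matcher(missing_slash)(url))        # False: compared as a literal URL
print(make_matcher(missing_slash + "/")(url))  # True: parsed as a regex
```

Requiring the flags to be JavaScript flag letters keeps a URL such as `/^https://example/path` from being misread as a regex with flags `path`.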

Fixing the format to something like /^https://blackboard\.qut\.edu\.au/webapps/blackboard/content/listContent\.jsp\?course_id\=/ should make it work. If it still doesn't, please provide the exact capture options (you can get them from Capture as by copying the JSON in advanced mode) for further investigation.
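As for the suspected ?: in the submitted regex it is already escaped as \?, so it matches a literal question mark and was not the problem. The sketch below checks the corrected regex body against a few URLs, using Python's `re` as a stand-in for JavaScript regex (for this pattern, `^`, `\.`, `\?`, and `\=` behave the same in both). The first URL is from the issue; the other two are hypothetical, and `included` is a made-up helper name.

```python
import re

# Body of the corrected filter, i.e. the text between the surrounding slashes.
PATTERN = re.compile(
    r"^https://blackboard\.qut\.edu\.au/webapps/blackboard/content/listContent\.jsp\?course_id\="
)

# The same body with "\?" turned into a bare "?", for comparison. A bare "?"
# is a quantifier: it makes the preceding "p" optional instead of matching "?".
UNESCAPED = re.compile(PATTERN.pattern.replace(r"\?", "?"))


def included(urls, pattern=PATTERN):
    """Keep only the URLs that the filter regex would include."""
    return [u for u in urls if pattern.search(u)]


# The first URL is from the issue; the other two are hypothetical examples.
urls = [
    "https://blackboard.qut.edu.au/webapps/blackboard/content/listContent.jsp?course_id=_164929_1&content_id=_9748967_1",
    "https://blackboard.qut.edu.au/webapps/blackboard/content/listContent.jsp?course_id=_000001_1&content_id=_0000002_1",
    "https://blackboard.qut.edu.au/webapps/blackboard/execute/announcement?course_id=_164929_1",
]

print(len(included(urls)))             # 2: both listContent.jsp pages
print(len(included(urls, UNESCAPED)))  # 0: "jsp?" cannot match the literal "?"
```

Because the regex is anchored with ^ and stops after course_id=, any course_id and content_id values are accepted, which is what the original request asked for.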

Notnilc2107 commented 1 year ago

It worked. Thanks for answering my rookie question. This is an amazing browser extension by the way; I'd send a tip on ko-fi or buymeacoffee if I could.