Closed tathastu871 closed 1 year ago
Also add support for capturing only links rather than the whole page.
E.g. I open a site and capture its links -> execute link capture again on the first group of links.
Add another button, "Capture links", that just scrapes the href links from the listed pages; the resulting href links can then be captured again for another round of href links.
It is needed if the user doesn't want to capture entire pages but rather just recursively scrape links.
- Capture links from clipboard/text file/list
I don't see what "list" means. If it means links in a web page, you can select them and invoke Capture selected links for that.
Other cases can be handled by invoking Capture selected links and pasting the URLs into the dialog. I don't see a benefit in implementing an extra command for that.
- Capture regex-based links, including permutation, e.g. https://site.com/page?=[1-9] --> will generate a link for each page from 1 to 9 and scrape them. This is essential if I have to scrape 100 pages, instead of writing an individual URL for each page.
The permuted URL list can be easily generated using Excel, OpenOffice Calc, Google Sheets, etc., and applied by pasting it into the dialog of Capture selected links, as previously mentioned. I don't see a big benefit in implementing an extra command for that. Also, it's not easy to define a good placeholder set without conflicting with a real URL.
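For illustration, the nine URLs from the example above can also be generated with a few lines of JavaScript in the browser console; this is only a minimal sketch of the "generate the list yourself and paste it" approach, not a WebScrapBook feature:

```js
// Expand the numeric range into one URL per page, then print the list
// so it can be pasted into the Capture selected links dialog.
const base = 'https://site.com/page?=';  // example base URL from the request above
const urls = [];
for (let i = 1; i <= 9; i++) {
  urls.push(base + i);
}
console.log(urls.join('\n'));  // one URL per line
```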
- Capture resources --> asynchronously + parallel threading support
I don't get this. Please provide a more detailed description of the related real-world use cases.
- Add support for injecting JavaScript snippets or bookmarklets before scraping each page.
E.g. I had a site where I had to execute a small piece of JavaScript to remove certain elements and then scrape it.
The scrape could take as input sequential JavaScript functions or bookmarklets to be executed on each page,
or on specific pages based on a regex (the way Tampermonkey userscripts execute only on sites defined by a regex).
Unfortunately this is NOT POSSIBLE as the browser extension framework does not allow arbitrary JavaScript code execution due to a security concern, and any similar approach (such as embedding a JavaScript interpreter using JavaScript) is also explicitly forbidden by the policy of the extension store.
Some possible alternative approaches:
1) Use the capture helper. This is limited in functionality but should be able to work for many useful cases. You can request an adequate extension of it for a good real-world use case.
2) Configure Tampermonkey/a userscript to do the automated programmatic web page modification when you visit a web page; a minimal sketch follows this list. (Addendum: script injection is only allowed for a content script (i.e. run within the visited web page, NOT the captured web page content), and this feature will likely be removed by Manifest V3.)
3) Write your own extension (or temporary extension) to do the automated programmatic web page modification and invoke a WebScrapBook capture through the external message API (an incomplete doc can be found here).
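As a rough illustration of option 2, such a userscript could remove unwanted elements when the page is visited, so that a subsequent capture picks up the modified DOM. The @match pattern and CSS selectors below are placeholders for whatever site and elements are actually involved, so treat this as a sketch rather than a recommended configuration:

```js
// ==UserScript==
// @name     Strip elements before capture (sketch)
// @match    https://example.com/*
// @grant    none
// @run-at   document-idle
// ==/UserScript==
(function () {
  'use strict';
  // https://example.com/* and the selectors below are placeholders;
  // adjust them to the real site and the elements you need removed.
  for (const el of document.querySelectorAll('.ad, .popup-overlay')) {
    el.remove();
  }
})();
```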
Also add support for capturing only links rather than the whole page.
E.g. I open a site and capture its links -> execute link capture again on the first group of links.
Add another button, "Capture links", that just scrapes the href links from the listed pages; the resulting href links can then be captured again for another round of href links.
It is needed if the user doesn't want to capture entire pages but rather just recursively scrape links.
I don't get this. What do you mean by "capture only links"? If you mean capture bookmark, it can be easily achieved through the advanced mode of the Capture As dialog.
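As a side note, if the goal is really just to collect the href links of a page and feed them back into Capture selected links (repeating the step for each level of recursion), a plain browser-console snippet can do that; this is only a sketch and not a WebScrapBook feature:

```js
// Collect the absolute href of every link on the current page, one per line,
// so the list can be pasted into the Capture selected links dialog.
// Repeat the same step on each resulting page to go one level deeper.
const links = new Set();
for (const a of document.querySelectorAll('a[href]')) {
  links.add(a.href);  // .href is already resolved to an absolute URL
}
console.log([...links].join('\n'));
```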
There is always an error on Kiwi on Android: Fatal error: Failed to download "WebScrapBook/data/20230517102349918/index.html": Unable to download to the folder.
This is an issue of Kiwi browser and we cannot really fix it, but you can bypass it through tweaking the capture options. See #295 for more details.
In the future please raise an unrelated issue in a new thread so that it can be properly traced independently.