Open LawnchairLarry opened 2 years ago
Can you provide the source URL of the web page and your capture options (from the advanced dialog of "save as", or by exporting the options)?
What is the source URL of the web page?
Well, isn't that what's written in my first post? "https://moodle.htwsaar.de/" It's also in the last picture, in the "Included URLs for capturing linked pages" rule, except that there is a backslash before the dot (it was put there by itself when I clicked on the rule).
Thanks for your fast answer. Can I provide any more information to help?
No, your first post starts with ... and I cannot see the source URL of the web page you have captured.
I have also tested https://moodle.htwsaar.de/ and the result is normal and contains only a few pages, which is very different from yours:
Capturing (document) [32] https://moodle.htwsaar.de/ ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=5036 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=5050 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=5051 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=100 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=347 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=2153 ...
Capturing linked page (1) https://moodle.htwsaar.de/course/view.php?id=4615 ...
Capturing linked page (1) https://moodle.htwsaar.de/?lang=de ...
Capturing linked page (1) https://moodle.htwsaar.de/?lang=en ...
Capturing linked page (1) https://moodle.htwsaar.de/?lang=fr ...
Capturing linked page (1) https://moodle.htwsaar.de/login/index.php ...
Capturing linked page (1) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F ...
Capturing linked page (1) https://moodle.htwsaar.de/admin/tool/policy/view.php?versionid=3&returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F ...
Capturing linked page (1) https://moodle.htwsaar.de/?cookie-policy ...
Capturing linked page (2) https://moodle.htwsaar.de/auth/shibboleth/index.php ...
Capturing linked page (2) https://moodle.htwsaar.de/login/forgot_password.php ...
Capturing linked page (2) https://moodle.htwsaar.de/ ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Flogin%2Findex.php ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/view.php?policyid=3&versionid=3&returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F&behalfid&manage&numpolicy&totalpolicies&lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/view.php?policyid=3&versionid=3&returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F&behalfid&manage&numpolicy&totalpolicies&lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/view.php?policyid=3&versionid=3&returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2F&behalfid&manage&numpolicy&totalpolicies&lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/index.php ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fadmin%2Ftool%2Fpolicy%2Fview.php%3Fpolicyid%3D3%26amp%3Bversionid%3D3%26amp%3Breturnurl%3Dhttps%253A%252F%252Fmoodle.htwsaar.de%252F%26amp%3Bbehalfid%26amp%3Bmanage%26amp%3Bnumpolicy%26amp%3Btotalpolicies ...
Rebuilding links...
Saving data...
Saved to "**********\WebScrapBook\data\20220119122127837.htz"
Done.
Did you really attempt to capture https://moodle.htwsaar.de/? If not, please provide the original URL of the web page you attempted to capture.
Please also provide the name and version of your OS, browser, and WebScrapBook.
If you have other extensions installed, please also try disabling all other extensions (better restarting the browser afterwards) and performing a capture with the same options.
Please also try a capture with the same options except depth = 1.
Please also try a capture with the same options except Save captured data as: Folder.
Ah, I think I know what you mean. The site I wanted to capture is https://moodle.htwsaar.de/course/view.php?id=719 (it's a C++ tutorial on our school's Moodle platform. Of course it is behind a login, but that does not seem to be a problem as long as I am logged in). I wanted to download the complete course to have it offline, including all kinds of .pdf, .txt, .cpp files and so on, so I guess I have to go to depth 3 or more to get it all. Because that takes a lot of time and I don't need to capture links that lead out of the course, I included the /^https://moodle\.htwsaar\.de// rule, as you can see in the last picture. Is that all right?
My OS is Windows 10, with the current Firefox browser and the current WebScrapBook.
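For readers following along, here is a minimal sketch of what the include rule above does, written as a Python pattern rather than as WebScrapBook's own filter code. It assumes the rule is tested against the full URL of each linked page; the helper name `is_included` and the last two sample URLs are hypothetical.

```python
import re

# The include rule from the post, expressed as a Python pattern.
# Assumption: the rule is matched against the full URL of each linked page.
INCLUDE_RULE = re.compile(r"^https://moodle\.htwsaar\.de/")


def is_included(url: str) -> bool:
    """Return True if the linked URL passes the include rule (hypothetical helper)."""
    return INCLUDE_RULE.search(url) is not None


urls = [
    "https://moodle.htwsaar.de/course/view.php?id=719",  # the course page
    "https://moodle.htwsaar.de/login/index.php",         # same host, outside the course
    "https://moodlexhtwsaar.de/",   # hypothetical host: rejected because the dot is escaped
    "https://example.org/",         # hypothetical external link
]
for url in urls:
    print(is_included(url), url)
```

Note that a rule anchored only on the host accepts every page on the Moodle site, not just pages belonging to one course, which is consistent with login and policy pages showing up in the capture log.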
Currently it doesn't seem like your configuration is wrong. Unfortunately we cannot reproduce the problem, and thus we need you to provide more information for further investigation.
Please provide the version of your Firefox and WebScrapBook.
Please complete the tests mentioned above and report whether the same issue persists in each case: with all other extensions disabled, with depth set to 1, and with save to folder.
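To illustrate why the depth = 1 test narrows things down, the sketch below walks a small made-up link graph breadth-first and counts how many pages fall within each depth. This is not WebScrapBook's implementation; the function `crawl`, the graph `LINKS`, and all page names are hypothetical, and it assumes the number in parentheses in the log is the link depth.

```python
from collections import deque


def crawl(start, links, max_depth):
    """Breadth-first walk; returns {page: depth} for pages within max_depth."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen


# Hypothetical link graph: a course page, its sections and files,
# plus a link to the site home that leads on to unrelated content.
LINKS = {
    "course": ["section-a", "section-b", "home"],
    "section-a": ["file-1", "file-2"],
    "section-b": ["file-3"],
    "home": ["other-course"],
    "other-course": ["other-file"],
}

for depth in (1, 2, 3):
    print(depth, len(crawl("course", LINKS, depth)))
```

Each extra level of depth pulls in everything reachable through pages already captured, so a shallow capture is a much smaller and quicker test case than a deep one.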
Firefox 96.0.1 (64-Bit) and WebScrapBook 1.1.0
The test is running with Save captured data as: Folder.
I hope I can still browse the downloaded site as usual if I save it this way?
Well, that looks much, much better. Except for a few .pdf files it could not store because there is a 'Ü' in "Übungsstunden", it probably works very well. I will post the full log at the end. It would be great if you could add support for special characters like Ää, Üü and Öö. In fact it loaded not only that course but the whole Moodle platform, because I used the "same site" setting and not the "same directory" setting, which was necessary because some of the needed files are outside that directory.
I can work with this so far, thank you so much! You get full stars for your work. Is there a way to buy you a coffee? ;)
Thank you for the report. Unfortunately we still can't locate the cause of the zip failure. It may be due to some content in the website that is login-protected, and we need you to perform other tests (the "all other extensions disabled" and the "depth set to 1") to confirm it. You can also check whether this issue can be reproduced on another public website.
As for the saving error, it seems that the source web site has provided a header with incorrectly encoded chars in the filename, which are treated as control chars. There is also a bug in WSB that causes some control chars not to be stripped out correctly, resulting in a saving error; this will be fixed in the next release.
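As a rough illustration of how an umlaut can turn into a control char, the sketch below simulates one plausible failure mode: a UTF-8 filename being read as Latin-1. That this is what happened on the site in question is an assumption, and the code is not WSB's; `mis_decode` and `strip_control_chars` are hypothetical helpers.

```python
import re

# C0 control chars, DEL, and C1 control chars.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def mis_decode(name: str) -> str:
    """Simulate a UTF-8 filename being interpreted as Latin-1 (assumed failure mode)."""
    return name.encode("utf-8").decode("latin-1")


def strip_control_chars(name: str) -> str:
    """Remove control chars so the name is safe to use as a filename."""
    return CONTROL_CHARS.sub("", name)


original = "Übungsstunden.pdf"
garbled = mis_decode(original)        # 'Ü' becomes 'Ã' followed by U+009C, a C1 control char
cleaned = strip_control_chars(garbled)

print(repr(garbled))
print(repr(cleaned))
# If the mis-decoding is known, the original name can be recovered instead of stripped.
print(garbled.encode("latin-1").decode("utf-8"))
```

In UTF-8, 'Ü' is the byte pair C3 9C; read as Latin-1, the second byte lands in the C1 control range, which would explain why only the filenames containing an umlaut failed to save.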
...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=80115&lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=80115&lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=80115&lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=135815&forceview=1 ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fmod%2Furl%2Fview.php%3Fid%3D80115 ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=135815&lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=135815&lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/url/view.php?id=135815&lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/folder/view.php?id=127753&forceview=1 ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fmod%2Furl%2Fview.php%3Fid%3D135815 ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/folder/view.php?id=127753&lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/folder/view.php?id=127753&lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/mod/folder/view.php?id=127753&lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fmod%2Ffolder%2Fview.php%3Fid%3D127753 ...
Capturing linked page (2) https://moodle.htwsaar.de/message/index.php?lang=de ...
Capturing linked page (2) https://moodle.htwsaar.de/message/index.php?lang=en ...
Capturing linked page (2) https://moodle.htwsaar.de/message/index.php?lang=fr ...
Capturing linked page (2) https://moodle.htwsaar.de/admin/tool/policy/viewall.php?returnurl=https%3A%2F%2Fmoodle.htwsaar.de%2Fmessage%2Findex.php ...
Rebuilding links...
Saving data...
Fatal error: Bug : can't construct the Blob.