danny0838 / webscrapbook

A browser extension that captures web pages to local device or backend server for future retrieval, organization, annotation, and edit. This project inherits from legacy Firefox add-on ScrapBook X.
Mozilla Public License 2.0

Experience of using ScrapBook #208

Closed Everest8 closed 2 years ago

Everest8 commented 3 years ago

I would like to share my experience of using ScrapBook. I have been using it on a daily basis since 2008, and, as of today, have about 12,000 pages saved. I am still using ScrapBook X because I needed this critically important tool to keep working smoothly when Firefox changed dramatically and a correspondingly dramatic overhaul of the program became necessary. I am planning to switch to the new version of the program in the near future. Here are my observations of the old program that may be useful in polishing the new one.

First off, I must say that ScrapBook X has been remarkably stable the whole time. I never had to use the option to completely rebuild the index. Occasionally, the diagnostic tool built into the "Calculation of Size" function found a few invalid pages, but they were very easy to fix.

The search engine is marvelous. Extremely accurate and fast. I use the "Comments" field to put in my keywords (a separate field for that would be helpful), but, once you develop a strict policy on your keywords and follow it consistently, you have the contents of all these thousands of pages at your fingertips. Instantly. Almost too good to be true.

It may seem that a strict keyword policy is not necessary since there is a full-text search option. Wrong. Full-text search works satisfactorily while you have dozens of saved pages, but when the count runs into thousands, the heap of pages it returns will be very hard to use most of the time. Practically useless.

I also strongly recommend being careful and accurate in organizing your captured pages into the hierarchical tree. Search as such has limitations, even with a strict policy and controlled vocabulary. Sometimes you will need to see all captured pages in a particular subject area, which may be hard to collect completely with a search. The multilevel system of folders that the program provides makes it possible to create a very highly organized body of information. My hierarchy tree is five or more levels deep in some places. The Notes feature, which you can add to any folder, lets you leave reminders about your policy to keep the folder in order. Something like "Only pages related to A should be stored here. Similar pages also related to B and C should be stored in Folders D and E." A crosslink feature within the tree would also be helpful, to make it easy to jump to the right place.

The text highlighting feature is also very helpful, and it is good that, at some point, the number of available colors was increased from 4 to 8. Again, with a strict policy on how the colors are used, you can easily navigate within a page no matter when it was saved, yesterday or a decade ago.

I realize that, with my twelve thousand pages, I am pushing the envelope. The program probably was not designed for such an amount of information. But, on the other hand, the Internet will be the main source of information for the foreseeable future, and what we get from there needs to be reliably stored so it can be easily used. This need will persist, and the volume of information will grow. So, it is very desirable for the new version of the program to be truly scalable. With ScrapBook X, I have started to feel some limitations. While it works fine on my main workstation with 128 GB of RAM, it stopped working reliably on my laptop with just 16 GB. There are occasional problems with the tree, and Calculation of Size always stops at the 9,669th record.

Even the file system of the main workstation is reeling under the load of the stored files, despite its 16-core processor, very sizable SSDs for the system and the data, and a lot of spinning drive space. It works, but there are significant delays in system response. The total number of files stored for these 12,000 pages is close to 900,000. If you keep a mirror on your system, it is close to two million. Also, transferring this amount of information over a network takes a lot of time. Therefore, the new storage options in the new version of the program, which compress the data into far fewer files, are very welcome. What needs to be kept in mind, though, is that they must, first, be equally reliable and, second, be usable, at least in principle, with programs other than ScrapBook. The current system with numerous files is quite open; the pages can be opened with any browser. Something like that is needed for the newer formats. Although I sincerely wish ScrapBook a very long and happy life, nothing is eternal in this world, and chances are that some of the stored archives will outlive the period when the program is supported.

So, in summary, I believe that ScrapBook in all its versions is a truly marvelous program, one of the very best in my whole 35 years of experience with computers. The need for such programs will grow as the amount of information pumped out of the Internet grows so rapidly. In fact, I think we are very lucky that, at this point, we already have such a highly developed, sophisticated, and well-tested tool to handle this situation. Many thanks to Danny! He has done a really great job.

danny0838 commented 3 years ago

Thank you for the feedback. It would be even better if you would try WebScrapBook and provide further feedback. We have implemented many good new features in WebScrapBook, as listed on the Diffs page, though some good old features may be missing or have been discarded, and we need feedback to find out about that.

kantrol commented 3 years ago

@Everest8 I am in nearly the same boat, a long-time user of Scrapbook and ScrapbookX. I converted to WebScrapBook more than a year ago and have never had any complaints; it is running rock solid (@danny0838 where can I donate?). I am running it with about 10,000 pages (19 GB of data) on a 10-year-old system with 16 GB of RAM, off a spinning hard disk (no SSD). The big advantage is the PyWebScrapBook backend: I have full access from anywhere on my network. Using the Firefox extension on Android allows saving and reading even on mobile. I still keep a system running ScrapbookX for capturing the hierarchy of multi-page documents, and then import the result into WebScrapBook. While writing this comment, I noticed that a recent update brought in the "in-depth capture" option, which makes ScrapbookX obsolete for that purpose. (I need to learn regular expressions.)

Everest8 commented 3 years ago

Kantrol,

Thank you. I tried WebScrapBook about 1.5 years ago. At that time, my biggest problem with it was that it did not seem to allow hierarchical organization, which, in my opinion, is critically important. With a large volume of content, you cannot live with keyword search alone. Now, given your claim that ScrapBookX is obsolete and your mention of the hierarchy, does that mean there is a capability for hierarchical organization? I am not talking about multi-page documents; I am talking about organizing my 12,000+ independently captured pages in a hierarchical tree.


danny0838 commented 3 years ago

@Everest8 Sidebar organization was implemented in March 2019 and was largely improved around May 2020. The only drawback is that it requires a backend server and sometimes running the CLI. This should have been stated clearly throughout our documentation.

You should be able to find instructions about migrating legacy scrapbooks and setting up the backend server in the documentation wiki.

kantrol commented 3 years ago

@Everest8

The complete hierarchy that had been created with ScrapbookX was imported. Otherwise I would not be able to find my way through the documents, even though the search capabilities are very powerful. It is more than just keyword search.

danny0838 commented 2 years ago

Closing the issue as there is no active discussion. For more information about migrating from ScrapBook X, consult #153 and the issues in milestone v1.0.

Everest8 commented 4 months ago

Danny,

I have finally retired and will now upgrade from my ScrapBook X Version 1.14.7. I have used the program since 2008; currently, there are 15,000+ pages saved in that old format.

Are the instructions at https://github.com/danny0838/webscrapbook/wiki/Intro and links from there the most recent and complete available?

Also, I installed Python when I tried to install WebScrapBook in 2019. Do I need to reinstall it?

Thank you. Vitaly


danny0838 commented 4 months ago

> Danny, I have finally retired and will now upgrade from my ScrapBook X Version 1.14.7. I have used the program since 2008; currently, there are 15,000+ pages saved in that old format. Are the instructions at https://github.com/danny0838/webscrapbook/wiki/Intro and links from there the most recent and complete available?

We update the documentation whenever there is a related change or known issue in our extension or in related platforms/applications.

> Also, I have Python installed at the time when I tried to install WebScrapBook in 2019. Do I need to reinstall it? Thank you. Vitaly

It depends on the actual version you have installed. There is a basic version compatibility check, and you will get an error message if your version is too old to be compatible. In any case, it's generally recommended to update Python and the related packages to the latest available versions.

Everest8 commented 4 months ago

Thank you.

Vitaly
