danny0838 / webscrapbook

A browser extension that captures web pages to local device or backend server for future retrieval, organization, annotation, and edit. This project inherits from legacy Firefox add-on ScrapBook X.
Mozilla Public License 2.0
908 stars 121 forks

Importing data from ScrapBook X #14

Closed Atanacius closed 7 years ago

Atanacius commented 7 years ago

This is more a question than an issue. I'm wondering (and I think I'm not the only one) what the best process would be to recover, at least partially (fully would be awesome!), all our Scrapbook X/Scrapbook Plus/Scrapbook base data and make it work in WebScrapbook.

By recovering, I mean recover:

Since it is actually not a database (not only I mean):

The important files of all Scrapbook X versions (including the older Scrapbook Plus/Scrapbook base) are stored in physical folders here:

Web pages saved and their assets: C:\Users\{windowsUsername}\AppData\Roaming\Mozilla\Firefox\Profiles\{firefoxSessionName}\ScrapBook\data

Scrapbook X (.rdf) structure/organization/treeview Auto Backup: C:\Users\{windowsUsername}\AppData\Roaming\Mozilla\Firefox\Profiles\{firefoxSessionName}\ScrapBook\backup

Scrapbook X (.rdf) used structure/organization/treeview (when viewing in Firefox): C:\Users\{windowsUsername}\AppData\Roaming\Mozilla\Firefox\Profiles\{firefoxSessionName}\ScrapBook\scrapbook.rdf

Scrapbook X (.rdf) cache: C:\Users\{windowsUsername}\AppData\Roaming\Mozilla\Firefox\Profiles\{firefoxSessionName}\ScrapBook\cache.rdf

Scrapbook X (.txt) Folders main id: C:\Users\{windowsUsername}\AppData\Roaming\Mozilla\Firefox\Profiles\{firefoxSessionName}\ScrapBook\folders.txt

So all of those files are important to Scrapbook X, and they would be needed for restoring, importing, or otherwise making use of our existing Scrapbook X content in WebScrapbook.
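The layout listed above can be checked with a short script before attempting any migration. This is only a minimal sketch: the entry names come from the post, but the function names `check_scrapbook_dir` and `count_items` are hypothetical, and the idea that each saved page is one subfolder of `data` is an assumption.

```python
from pathlib import Path

# Entries named in the post above. Whether every one of them is needed
# by a given import tool is an assumption made for illustration.
EXPECTED_ENTRIES = {
    "data": "dir",
    "backup": "dir",
    "scrapbook.rdf": "file",
    "cache.rdf": "file",
    "folders.txt": "file",
}


def check_scrapbook_dir(root):
    """Return a dict mapping each expected entry name to True or False."""
    root = Path(root)
    result = {}
    for name, kind in EXPECTED_ENTRIES.items():
        path = root / name
        result[name] = path.is_dir() if kind == "dir" else path.is_file()
    return result


def count_items(root):
    """Count subfolders of data/ (assumed: one subfolder per saved page)."""
    data = Path(root) / "data"
    if not data.is_dir():
        return 0
    return sum(1 for entry in data.iterdir() if entry.is_dir())
```

Calling `check_scrapbook_dir` on the profile's `ScrapBook` folder shows which of the listed entries are present, which is a quick sanity check before making a backup copy.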

I have used Scrapbook for years now, and (I think I'm not the only one) I love organizing my Scrapbook data the way I want! My treeview is perfectly sorted! I would be sad if all my hard work on sorting and organizing the treeview were gone...

Thanks in advance for your answer @danny0838. I know it's not an easy task to make Scrapbook X work in Firefox 57 under the (absurd) limitations of WebExtension...

danny0838 commented 7 years ago

Currently you can use ScrapBook X Converter to convert the data from ScrapBook X to archive formats.

As for the sidebar tree, it is quite a challenge since WebExtension does not allow file system access. Writing an additional app which runs a local server may be a workaround, but it is still very complicated and may be unable to work on Firefox Android.

A more viable option is to add an indexing functionality to Web ScrapBook. That is, the user drops the Web ScrapBook directory into the viewer page, the contained folders and files are analyzed, and a zip file containing the index page, metadata, and fulltext cache can then be downloaded (and extracted to the Web ScrapBook directory). It requires some manual operation but can be achieved by WebExtension alone. We may implement this feature in the near future if other options are not viable enough.

We are still evaluating what the data scheme of Web ScrapBook should be. Once determined, we will write a tool to convert a ScrapBook X data folder to Web ScrapBook, probably by adding a feature to ScrapBook X Converter.

Atanacius commented 7 years ago

@danny0838 It is not possible to use ScrapBook X Converter in Firefox 57 :/ So a more viable solution for importing data would be very useful! Anyway, if there turns out to be no viable solution other than ScrapBook X Converter, will the archive formats be import-ready for WebScrapbook? Or is it lost forever :( ?

Thanks for your answer on Sidebar, have a nice day and ... GOOD LUCK! Do not give up, we trust in you!

danny0838 commented 7 years ago

The indexing feature mentioned above is now available in version 0.17.0. You are welcome to test it.

In summary, open the site indexer page from the dropdown list of the Web ScrapBook icon, and then select a Web ScrapBook directory to generate the site index. Once completed, a zip package containing the generated files will be ready for download. You can then extract the files to the Web ScrapBook directory and open map.html or frame.html, and the tree is there!

To simplify the procedure, you can activate the auto download option, which allows direct downloading when you submit the configured Web ScrapBook directory for indexing.

Importing legacy ScrapBook data is supported. Just pick the ScrapBook directory for indexing.

Currently we haven't implemented a good GUI for modifying metadata and the tree (table of contents), but you can directly modify the JSON data in generated meta.js and toc.js.
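For hand edits like this, a script can be less error-prone than a text editor. The following is a minimal sketch under a stated assumption: that the generated file consists of a single JSON object, possibly wrapped in some JavaScript text before and after it. The exact wrapper is not described in this thread, so the code preserves whatever it finds rather than assuming a particular one. The helper names and the sample content are hypothetical.

```python
import json


def split_wrapped_json(text):
    """Split text into (prefix, parsed JSON object, suffix).

    Assumes the outermost {...} in the file is one valid JSON object;
    raises ValueError if no braces are found or the JSON is invalid.
    """
    start = text.index("{")
    end = text.rindex("}") + 1
    return text[:start], json.loads(text[start:end]), text[end:]


def join_wrapped_json(prefix, data, suffix):
    """Rebuild the file text, keeping the original prefix and suffix."""
    return prefix + json.dumps(data, ensure_ascii=False, indent=2) + suffix


# Hypothetical sample content, for illustration only.
sample = 'wrapper({"20170101000000": {"title": "Old title"}})'
prefix, data, suffix = split_wrapped_json(sample)
data["20170101000000"]["title"] = "New title"
updated = join_wrapped_json(prefix, data, suffix)
```

Parsing the object with `json.loads` before writing it back means a syntax slip is caught immediately instead of surfacing later when the tree page fails to load.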

This feature is currently experimental. Although there is a built-in backup mechanism for any changed file, it is still wise to make an adequate backup before trying.

JeremiahBullfrog commented 7 years ago

I attempted to generate a site index with the data from Scrapbook X in Firefox Quantum. It identified 7474 items in 161 folders and then sits on the 'Generate site index'. I left it alone for 3 hours and it had the same message without any indication of progress. I'm hoping for a solution because at least with this option I can still easily identify and access my captures from the previous Scrapbook add-on.

I removed all but one folder and attempted to generate the site index again and had the same result.

To test it out with the present format, I did capture a file with the new Web Scrapbook and then indexed it and it worked flawlessly. Any help would be appreciated.

danny0838 commented 7 years ago

@JeremiahBullfrog Could you provide the exact log shown on the screen or a screenshot of it? (You can mask private information if there's a concern)

JeremiahBullfrog commented 7 years ago

[Screenshot: screen shot 2017-11-15 at 7 15 13 pm]

This is what I got when I tried it with only one folder. After that, the screen went blank.

danny0838 commented 7 years ago

@JeremiahBullfrog In the above test, did you get the zip file downloaded? Did it show a Done. in the last line?

Please also provide the screenshot for the case in issue, that is, the case of importing your 7xxx files.

JeremiahBullfrog commented 7 years ago

It never finished. There was no zip file, it never showed done. I'm not sure what you mean by the "case in issue". I didn't successfully import anything. I attempted to, but there was no batch import that I could find and I didn't want to go in to every folder as that would take a very long time.

I moved the previous scrapbook folder to the new one that was created named WebScrapbook, attempted to index it and had the result mentioned above.

danny0838 commented 7 years ago

Could you try using a new profile and with no other addon?

mikeT12 commented 7 years ago

Am I the only one who can't figure out what this new tool is good for or how to use it?

I used scrapbook X. I saw a web page, I saved it, I could go back to my saved list and see the page again the way I saw it. Kinda like a digital photocopier. Neat.

I tried this new tool after installing it. Saved a few pages. Then selected "view archived page". No list of files I had saved. Tried generating a site index as discussed above; just an extra step needed, right? I get Webscrapbook.zip which just has a bunch of folders in it, no clue of what they are.

Then I tried indexing my old Scrapbook. Generated another zip file. I have no idea what I'm supposed to do with that. I guess I can see why there are only 458 users. When I get my degree in CS from Stanford maybe I can become user 459, otherwise I have no idea what this tool does or how I could use it.

danny0838 commented 7 years ago

@mikeT12 The site indexer builds the list of captured pages. If your ScrapBook folder is set to WebScrapBook and you get WebScrapBook.zip after running the indexer, you can unzip the content of WebScrapBook.zip to the WebScrapBook folder, and then open WebScrapBook/tree/map.html or WebScrapBook/tree/frame.html, where the list of pages should appear. You can further edit WebScrapBook/tree/toc.js and WebScrapBook/tree/meta.js manually to modify the page list (and optionally run the indexer again to check for potential errors).
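The unzip step can also be scripted. This is a minimal sketch, assuming the paths inside the zip are relative to the root of the WebScrapBook folder (for example `tree/map.html`), which matches the paths mentioned above. The function name `apply_index_zip` is hypothetical.

```python
import zipfile
from pathlib import Path


def apply_index_zip(zip_path, scrapbook_dir):
    """Extract the indexer's zip into scrapbook_dir.

    Returns the tree pages (map.html, frame.html) that exist afterwards.
    Existing files with the same names are overwritten, so back up first.
    """
    scrapbook_dir = Path(scrapbook_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(scrapbook_dir)
    tree = scrapbook_dir / "tree"
    candidates = (tree / "map.html", tree / "frame.html")
    return [page for page in candidates if page.is_file()]
```

If the returned list is empty, the zip was probably extracted one folder level too high or too low, and the `tree` folder should be moved so that it sits directly inside the WebScrapBook folder.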

This addon is very young and is still under development. We currently have to focus on getting a precise web page capture and a good archive file, and then a good conversion between those formats, and then we'd be able to deal with the Web Extension restriction and handle the sidebar and filesystem integration things. This may take some time, unfortunately.

danny0838 commented 7 years ago

@JeremiahBullfrog I recently identified the issue of getting a blank screen when downloading the final zip on Firefox 55. It seems to be an issue in Firefox core, but the precise condition under which it happens is not quite clear. Anyway, we made some changes to prevent it from happening for now (with a little side effect, though). This should not happen on 0.18.3. You can give it a try.

JeremiahBullfrog commented 7 years ago

Although it took a bit to understand the new format, I felt your initial instructions were quite clear. I did create a new profile and also updated to the most recent version of WebScrapbook. In regards to creating an index, everything worked flawlessly.

At some point I do hope the files/directories could more easily be organized but for now at least I have access to everything again. Thank you very much for your efforts. This is an add-on I would prefer not to live without.

danny0838 commented 7 years ago

We close this issue since we can basically import the metadata and tree structure from a ScrapBook X folder via the site indexer.

As for other data @Atanacius mentioned in 1F:

nandudb commented 8 months ago

Hi Danny,

First things first, thanks so much for ScrapBook X. I had been using it for a long time, but unfortunately I am unable to use it in Firefox or Waterfox. I just tried it on Pale Moon, but I am unable to link to my old ScrapBook directory (on an external drive). I am getting the following error:

[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIRDFService.GetDataSourceBlocking]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://scrapbook-modules/common.jsm :: getRDFDataSource :: line 415" data: no]

I would really appreciate it if you could help me use my old ScrapBook data and also continue capturing new pages with the sidebar tree structure. The sidebar tree structure is a major feature that I liked for keeping track of all captured pages to revisit offline. Unfortunately, this feature is not working in WebScrapbook :(

danny0838 commented 8 months ago

@nandudb To use the database created with legacy ScrapBook X, you don't actually need to run ScrapBook X, but you first need to convert the database to a compatible format. See the doc wiki for details.