BookStackApp / BookStack

A platform to create documentation/wiki content built with PHP & Laravel
https://www.bookstackapp.com/
MIT License

Backup and Restore functionality #2405

Open partoneoftwo opened 3 years ago

partoneoftwo commented 3 years ago

User story: I am the owner of a BookStack wiki that contains a lot of information which I have spent huge amounts of time curating, and this information is very important to me. BookStack is great and I use it for both business and private purposes. Because of this heavy reliance, I want to know that I have a way to back up the entire BookStack instance. I also need to be able to restore this backup regardless of which BookStack version is running.

Security description: This is important for strengthening the Recover function of the NIST framework. From an operational security perspective, this feature also strengthens the CIA aspect: whenever confidentiality, integrity and/or availability has been impaired or breached, this recovery functionality is critical to have.

Describe the feature you'd like: A backup feature where I can back up all the information, images and structure that I have entered as an end user of BookStack.

A restore feature where I can restore all the information, images and structure that I have entered as an end user of BookStack and that my backup contains. It is of course critical that a restore works without failing, regardless of whether the backup was made on an older version than the one currently installed. But this is of course hard to achieve.

Describe the benefits this feature would bring to BookStack users: Benefits for a user: The backup and restore feature will make it very easy to secure information which I as a user really care about.

As a user I am able to quickly restore BookStack instances, and I can do it without touching the infrastructure / container layer.

Benefits for the product/project: BookStack will be perceived as a more reliable, secure and viable solution for scenarios where information criticality is high.

Additional context: This is needed to make the product more mature. It is a highly usable feature which will make the product more attractive.

I am aware of a closed feature request, #43, regarding backup of BookStack data. However, it focused on backing up single pages. This feature request concerns the entire BookStack instance, covering the following data objects:

Data

Configuration

Did I mention I'm a massive fan of this software?

ssddanbrown commented 3 years ago

Did I mention I'm a massive fan of this software?

Hi @partoneoftwo, Thanks!

This is needed to make the product more mature. It is a highly usable feature which will make the product more attractive.

You could make the maturity statement about pretty much any addition to be honest, and I'm not looking to chase maturity itself. Same with "attractiveness". These are not primary goals of the project; I'd rather focus on improving the experience for existing users which, as you've explained, this would also benefit.

To be honest, I'm aware this is an area that we're lacking in, at least in a manner that's intuitive.

My main concern has always been something you requested in this line:

As a user I am able to quickly restore Bookstack instances, and I can do it without touching the infrastructure / container layer

Bringing backup mechanisms into the application layer brings a lot of risk and instability, since the application layer relies on the web-server and infrastructure layers it sits upon. We could quickly get into trouble with things like timeouts, file permissions and request/response size limits. We'd then likely end up needing a lot of configuration options to suit the different requirements and environments that BookStack may run in. I'm not saying it's not possible at all, just that it would require some ongoing effort, and that it would increase the accessibility of backups while decreasing their reliability. This is why I've guided people so far in the direction of doing backups at the infrastructure layer.

How about we instead spend some time increasing accessibility of backups at an infrastructure layer? We could start adding some example scripts to the devops bookstack repo then link to these in the docs with some guidance, to the point where someone with a common setup could just download the script, add it to cron for scheduling, then be done with it. We could then include these to be used in the install scripts and container publishes could build these in.
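A minimal sketch of what such an infrastructure-level script might look like, assuming the approach described in the BookStack backup docs (a database dump plus an archive of the config and upload files). The function names, the output naming and the assumption that MySQL credentials come from a client config file are illustrative only and are not taken from any official script.

```shell
#!/bin/bash
# Sketch only: function names and output layout are assumptions.

# Archive the config and upload files of a BookStack install.
# $1 = BookStack install directory, $2 = output directory.
backup_files() {
  install_dir="$1"
  out_dir="$2"
  archive="$out_dir/bookstack-files-$(date +%Y-%m-%d).tar.gz"
  # Collect only the paths that exist, so tar does not fail on a partial install.
  set --
  for p in .env public/uploads storage/uploads themes; do
    if [ -e "$install_dir/$p" ]; then
      set -- "$@" "$p"
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "Nothing to archive in $install_dir" >&2
    return 1
  fi
  tar -czf "$archive" -C "$install_dir" "$@" && echo "$archive"
}

# Dump the database to a dated SQL file.
# $1 = database name, $2 = database user, $3 = output directory.
# Assumption: the password is supplied via ~/.my.cnf or similar.
backup_database() {
  db_name="$1"
  db_user="$2"
  dump="$3/bookstack-db-$(date +%Y-%m-%d).sql"
  mysqldump --single-transaction -u "$db_user" "$db_name" > "$dump" && echo "$dump"
}
```

A small wrapper that calls `backup_files /var/www/bookstack /backups` and `backup_database bookstack bookstack_user /backups` could then be scheduled with cron. The install path, database name and user shown here are placeholders.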

ssddanbrown commented 3 years ago

Related to #723

modem7 commented 3 years ago

Just to add another point of view: for those on Docker, the infrastructure layer is somewhat different, and far more controllable, so this functionality, especially if it's able to be done on a cron job would definitely be useful.

numen31337 commented 3 years ago

Hey guys, while this feature is still being developed, I can share a script I use for continuous backup performed by a server once a week/month. It uses the API and a separate read-only user to perform a full backup. The parsing part is so weird because I need it to work on macOS without non-greedy Perl-style grep.

#!/bin/bash

DATE_MONTH=$(date +"%Y.%m") # Naming for monthly backups
TOKEN='QcUVf4yXMKh9hh81vOzGxMRxINnBsheM:dEUhdx4o6w4359yoBXrKIPsN8yrAWgW1'
BASE_URL='http://192.168.10.19:6885'

# Fetch the book listing as JSON
BOOKS=$(curl --request GET --url "$BASE_URL/api/books" --header "Authorization: Token $TOKEN" --silent --stderr -)
# Split the JSON on commas, keep the fragments holding an "id" key, then strip everything up to the last colon
BOOK_IDS=$(echo "$BOOKS" | awk -v RS="," '/"id"/ {print}' | sed 's/.*://')

# Export every book as a contained HTML file
for i in $BOOK_IDS; do
  curl --request GET --url "$BASE_URL/api/books/$i/export/html" --header "Authorization: Token $TOKEN" -o "$DATE_MONTH-$i.html"
done

Perhaps it will come in handy for someone looking for an automated solution.

Here's the full script with a bit more bells and whistles, which I personally use for basic backup, which is enough for my needs. Tested on macOS and Synology NAS Linux.

#!/bin/bash
# Does basic BookStack backup by fetching contained html for every book.
# Usage: backup.sh ~/Desktop/ "QcUVf4yXMKh9hh81vOzGxMRxINnBsheM:dEUhdx4o6w4359yoBXrKIPsN8yrAWgW1" "http://192.168.10.19:6885"

if [ "$#" -ne 3 ]; then
  echo "Illegal number of parameters."
  exit 1
fi
if [[ "$1" != */ ]]; then
  echo "Enter the path to the output folder with a trailing slash."
  exit 1
fi
if ! command -v 7z &>/dev/null; then
  echo "7z is not installed. Try brew install p7zip or sudo apt install p7zip-full."
  exit 1
fi

DATE_MONTH=$(date +"%Y.%m") # Naming for monthly backups
TOKEN=$2
BASE_URL=$3

# Fetch the book listing as JSON, then extract the numeric book IDs
BOOKS=$(curl --request GET --url "$BASE_URL/api/books" --header "Authorization: Token $TOKEN" --silent --stderr -)
BOOK_IDS=$(echo "$BOOKS" | awk -v RS="," '/"id"/ {print}' | sed 's/.*://')

if [ -z "$BOOK_IDS" ]; then
  echo "No book IDs received. Check access rights."
  exit 1
fi

OUTPUT_TMP_DIR="$1$DATE_MONTH-backup/" # Temp dir inside the given output dir
mkdir -p "$OUTPUT_TMP_DIR"
OUTPUT_FILE="$1$DATE_MONTH.zip"

for i in $BOOK_IDS; do
  FILENAME="$OUTPUT_TMP_DIR$DATE_MONTH-$i.html"
  curl --request GET --url "$BASE_URL/api/books/$i/export/html" --header "Authorization: Token $TOKEN" --silent -o "$FILENAME"
done

rm -f "$OUTPUT_FILE" # Delete the archive if already exists
7z a "$OUTPUT_FILE" "$OUTPUT_TMP_DIR*" -bsp0 -bso0 # Archive silently (7z expands the wildcard itself)
rm -fr "$OUTPUT_TMP_DIR" # Delete temp dir

modem7 commented 3 years ago

Good shout!

I'd recommend changing your token (assuming it's your real one) though just in case.

aslmx commented 2 years ago

@numen31337 thanks a lot for your comments here. Not sure if this already existed somewhere else, but I added a little jq trickery around the JSON and now have the book name in the filename.

https://gist.github.com/aslmx/a0fded5c4b180b45a6bb54963a3643bf
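As a rough illustration of the jq idea (this is not the content of the gist above), the book listing can be turned into id/slug pairs so that each export file gets a readable name. The function names are made up for this sketch, and it assumes `jq` is installed and that each book in the listing carries `id` and `slug` fields.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical.

# Read the JSON from /api/books on stdin and print one "id<TAB>slug" line per book.
book_ids_and_slugs() {
  jq -r '.data[] | [.id, .slug] | @tsv'
}

# Build an output filename from a date prefix, a book ID and a book slug.
export_filename() {
  printf '%s-%s-%s.html\n' "$1" "$2" "$3"
}

# Export every book as contained HTML into the current directory.
# $1 = base URL, $2 = API token ("id:secret"), $3 = filename prefix.
export_all_books() {
  base_url="$1"
  token="$2"
  prefix="$3"
  curl --silent --fail --header "Authorization: Token $token" "$base_url/api/books" |
    book_ids_and_slugs |
    while IFS="$(printf '\t')" read -r id slug; do
      curl --silent --fail --header "Authorization: Token $token" \
        -o "$(export_filename "$prefix" "$id" "$slug")" \
        "$base_url/api/books/$id/export/html"
    done
}
```

Using the slug rather than the display name avoids most awkward filename characters. Note that the listing endpoint is paginated, so an instance with many books would likely need more than one request.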

mhjor70 commented 2 years ago

So once you have the backup, how do you restore it? Here is why I ask. I am setting up BookStack at several sites to store configuration docs for clients. Some of the "framework" of the shelves->books->pages will be the same. So rather than repeat the creation on multiple sites, I would like to import the base "framework" from a master site when I start a new instance.

patbcc commented 1 year ago

I'd like to add my few cents to this issue.

We have a production system running v22.07.3 in a VM, and we generally rely on VM snapshots as backups. However, as with the OP, we have a lot of content that would be lost should the snapshots not work when a restore is needed. So I was testing the backup and restore method referenced in the docs (https://www.bookstackapp.com/docs/admin/backup-restore/). In doing so I ran into an issue caused by version differences. The test environment has the latest release, 23.01.1. Although the import was successful, the site fails to load due to two missing columns in the entity_permissions table (entity_type and view). I'm guessing these were added at some point after 22.07.3.

So it would appear that this type of issue is another hurdle for adding backup and restore functionality. Either the functionality would need to detect the differences between versions and correct them, or older releases would need to be made available, so that the BookStack server could be rebuilt at the same version as the backup before restoring and then updated to the latest version (if so desired).

As a side note, the documentation should be updated to point out the issues caused by differences between the version where the backup was made and the version where the restore occurs.
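One possible way to handle the older-backup-into-newer-install case, sketched under assumptions: after importing the dump and the files, run the Laravel database migrations so the schema catches up with the installed code. The function names and the `DRY_RUN` switch are hypothetical, and this is not a tested procedure for the exact versions mentioned above. Migrations only move forward, so restoring a newer backup into an older install is not covered.

```shell
#!/bin/bash
# Sketch only: function names and DRY_RUN are assumptions.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = install dir, $2 = SQL dump, $3 = files archive, $4 = db name, $5 = db user
restore_bookstack() {
  install_dir="$1"
  sql_dump="$2"
  files_archive="$3"
  db_name="$4"
  db_user="$5"

  # Unpack .env, uploads and themes over the install directory.
  run tar -xzf "$files_archive" -C "$install_dir" || return 1

  # Import the database dump (handled separately because of the redirect).
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ mysql -u $db_user $db_name < $sql_dump"
  else
    mysql -u "$db_user" "$db_name" < "$sql_dump" || return 1
  fi

  # Apply any migrations added since the backup's version.
  (cd "$install_dir" && run php artisan migrate --force) || return 1
}
```

Running it first with `DRY_RUN=1` prints the commands without touching anything, which is a cheap way to check paths and names before a real restore.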

AuthorShin commented 1 year ago

@ssddanbrown Each page in BookStack has HTML source that you can copy and paste somewhere else to get the exact same page/document (except the images), so why not use that?

A backup function could turn shelves and books into folders and sub-folders (since there is only one level of each, this is easy and straightforward). Each page would then be a .txt file containing the page's HTML code, which could later be used for a restore, either via the GUI or manually. Automating this would be fairly easy, I guess.

So let's say we have a book called "Black and White" with 12 chapters, which is on the "Dark" shelf. The folder structure would be:

Dark (S) > Black and White (B) > chapter1 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter2 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter3 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter4 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter5 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter6 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter7 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter8 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter9 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter10 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter11 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter12 (C) > pageswiththeirtitle.txt

All of these folders and files would then go into a zip file, which we could also encrypt with a password.
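The folder layout described above could be produced along these lines. This is only a sketch of the naming scheme: the function names are invented, the page HTML is assumed to have been fetched already, and only "/" is replaced in names.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical.

# Replace "/" so a title cannot create extra directory levels.
safe_name() {
  printf '%s' "$1" | tr '/' '_'
}

# Write one page's HTML to <root>/<shelf>/<book>/<chapter>/<page title>.txt
# $1 = root dir, $2 = shelf, $3 = book, $4 = chapter, $5 = page title.
# The page HTML is read from stdin.
write_page() {
  dir="$1/$(safe_name "$2")/$(safe_name "$3")/$(safe_name "$4")"
  mkdir -p "$dir" || return 1
  cat > "$dir/$(safe_name "$5").txt"
}
```

For example, `write_page ./backup "Dark" "Black and White" "chapter1" "Intro" < intro.html` would create `./backup/Dark/Black and White/chapter1/Intro.txt`. The resulting tree could then be archived with a password, for instance with the `-p` switch of `7z`, which the script earlier in this thread already uses for archiving.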

AuthorShin commented 1 year ago

Any thoughts on this one? https://github.com/BookStackApp/BookStack/issues/2405#issuecomment-1661419213 @ssddanbrown

ssddanbrown commented 1 year ago

@AuthorShin That would be more of an export/import format than a backup/restore format, since it's quite minimal in terms of the overall related content unless you go to a lot of extra effort. If I was going to do an import/export format of some kind, it'd more likely follow our API content structure, but that's out of scope for this issue. If you want something like that defined, a couple of our API scripts come close; you may be able to get what you want with a little extra tweaking.

ssddanbrown commented 1 year ago

As a general related update to this, earlier this year BookStack started including a System CLI, which can help automate tasks like backup and restore. It is in an alpha state.
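For reference, a small wrapper around the System CLI's backup command might look like the sketch below. It assumes the CLI is shipped as `bookstack-system-cli` in the root of the install and that `backup` is the relevant subcommand. The wrapper function name is invented, and since the CLI is described as alpha, its behaviour may change.

```shell
#!/bin/bash
# Sketch only: the wrapper name is hypothetical.

# Run the System CLI backup from inside the install directory.
# $1 = BookStack install directory (assumed to contain bookstack-system-cli).
system_cli_backup() {
  install_dir="$1"
  if [ ! -x "$install_dir/bookstack-system-cli" ]; then
    echo "bookstack-system-cli not found in $install_dir" >&2
    return 1
  fi
  (cd "$install_dir" && ./bookstack-system-cli backup)
}
```

A function like this could be called from a cron-scheduled script, with the install path adjusted to the local setup.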

dw5 commented 2 months ago

IMO a good example is what Snipe-IT does. It backs up the content and database into a zip file, from which everything can be imported back 1:1.