christianspecht / scm-backup

Makes offline backups of your cloud hosted source code repositories
https://scm-backup.org/
GNU General Public License v3.0

Detect repos which were deleted from the hoster? #39

Closed christianspecht closed 2 years ago

christianspecht commented 4 years ago

Repos that already exist in the backup folder, but not in the data from the current API call

--> those must have existed once, but were deleted from the hoster

Do we need this?
Everybody has to decide for themselves what to do with them (delete them from the backup dir? restore them from the backup dir to the hoster?), but at least getting a list of them would be nice.
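
To make the check described above concrete, here is a minimal shell sketch of the set difference "in the backup folder, but not in the current API data". It is not how SCM Backup implements it. It assumes one subfolder per repository in the backup folder and a text file with one repository name per line taken from the hoster's API; the function name `list_deleted_repos` and both arguments are hypothetical.

```shell
# Sketch only: print repos that exist in the backup folder but are missing
# from the hoster's current repo list.
# Assumptions (not taken from SCM Backup): $1 is a folder containing one
# subfolder per repo, $2 is a file with one current repo name per line.
list_deleted_repos() (
    backup_dir=$1
    api_list=$2
    tmp=$(mktemp -d) || exit 2

    # One line per subfolder name, sorted so that comm can compare the lists.
    for d in "$backup_dir"/*/; do
        [ -d "$d" ] && basename "$d"
    done | LC_ALL=C sort > "$tmp/local"
    LC_ALL=C sort "$api_list" > "$tmp/remote"

    # comm -23 keeps the lines that appear only in the first file.
    LC_ALL=C comm -23 "$tmp/local" "$tmp/remote"
    rm -rf "$tmp"
)
```

Called as `list_deleted_repos /path/to/backup current_repos.txt`, it prints one candidate per line and leaves the decision about what to do with them to the user.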

ghost commented 3 years ago

I have a problem related to this.

I wanted to add a notification for such "events", to have a little safety check for deletion.

I realised that the ongoing backup seems to keep a copy of the previously retrieved repository (or any manually created files), but as the modified date of the existing / unchanged repos does not change, it's not possible to discriminate between the two cases (except by doing a backup into an empty folder, which is obviously more costly).

A simple example:

The duplicated repo is still there and the unchanged repos have the same dates, so there is no trivial way to see that this repo was actually deleted from the source SCM (Bitbucket, in my case).

Another problem I see is that creating another repo with the same name would then delete this repo. My initial intention was actually to keep a copy of deleted repositories in an extra archive folder (with a unique file name / timestamp) so as not to be too dependent on old tape backups in case we don't see the issue in time.
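
The archive-folder idea above could look roughly like the following shell sketch. It is only an illustration under assumptions: the function name `archive_deleted_repo`, the archive location, and the `name-timestamp.tar.gz` naming scheme are all hypothetical and not part of scm-backup. The timestamp in the file name is what keeps a later repo with the same name from overwriting the archived copy.

```shell
# Sketch only: move a repo folder that vanished from the hoster into a
# long-term archive, under a unique, timestamped file name.
# $1 = repo folder inside the backup dir, $2 = archive folder (both hypothetical).
archive_deleted_repo() (
    repo_dir=$1
    archive_dir=$2
    name=$(basename "$repo_dir")
    stamp=$(date +%Y%m%d-%H%M%S)
    target="$archive_dir/$name-$stamp.tar.gz"

    mkdir -p "$archive_dir" || exit 1
    # Pack first; the live copy is removed only if the archive was written.
    tar -czf "$target" -C "$(dirname "$repo_dir")" "$name" || exit 1
    rm -rf "$repo_dir"

    # Print the archive path so the caller can mention it in a notification.
    printf '%s\n' "$target"
)
```

Archived copies can then follow their own, longer retention period, independent of how the live backup folder is rotated.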

I actually hacked something together in bash, using ls output and wdiff to compare the listings before and after the backup and send an extra notification if changes are detected. It would probably make more sense to add that to scm-backup (which could then use the API to validate the currently existing repos).

Maybe there is a way to hack together a curl call to the API, considering I already use environment variables for the password. I'll check that.

christianspecht commented 3 years ago

I implemented this in a branch, it's not yet merged into master though.

You can try this build: https://ci.appveyor.com/project/ChristianSpecht/scm-backup/builds/40978722/artifacts
It will be auto-deleted in a month, but by then I will probably have made a proper release.

By default SCM Backup only detects those repos, but there's an option in the config file (settings.yml) to change this behavior, so it actually deletes them.

ghost commented 3 years ago

I was supposed to come back to this problem for our integration needs so that's good news!

I will install it locally and try to give you feedback as soon as possible.

Thanks!

Eric

ghost commented 3 years ago

Works fine for me!

I will be able to fix the script I had written without having to rewrite it in Python using the Bitbucket API (which would largely duplicate what scm-backup already does).

First pass:

1. Collect list of files in folder (before_listing).
2. Run scm-backup:
   ...
   Info Git: https://bitbucket.org/some_user/repo1.git
   Info Git: https://bitbucket.org/some_user/repo2.git
   Info Git: https://bitbucket.org/some_user/repo3.git
   ...
   (email notification of scm-backup)
3. Collect list of files in folder (after_listing).
4. Calculate the delta and send an extra email notification to the team for add / remove (or rename). (I could ignore the first pass, as I would have no easy way, except using the API, to know when repositories were added.)
5. "rotation"

Later pass (repo2 has been deleted from the hoster):

1. Collect list of files in folder (before_listing).
2. Run scm-backup:
   Info Git: https://bitbucket.org/some_user/repo1.git
   Info Git: https://bitbucket.org/some_user/repo3.git
   -- .. (warn repo 2 deleted)
   -- Delete folder
   ...
   (email notification of scm-backup)
3. Collect list of files in folder (after_listing).
4. Calculate the delta and send an extra email notification to the team for add / remove (or rename).
5. "rotation"
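
The "calculate delta" step described above could be sketched in shell as follows. This is an illustration only, assuming before_listing and after_listing are plain text files with one repository folder name per line; the function name `report_repo_changes` and the `Added:` / `Removed:` report format are hypothetical. A rename would show up as one removed and one added entry.

```shell
# Sketch only: compare two repo listings (one name per line) and print what
# was added or removed. Exit status 1 and no output mean "nothing changed",
# so the caller can skip the extra notification.
report_repo_changes() (
    tmp=$(mktemp -d) || exit 2
    # comm needs both inputs sorted the same way.
    LC_ALL=C sort "$1" > "$tmp/before"
    LC_ALL=C sort "$2" > "$tmp/after"

    removed=$(LC_ALL=C comm -23 "$tmp/before" "$tmp/after")  # only in before
    added=$(LC_ALL=C comm -13 "$tmp/before" "$tmp/after")    # only in after
    rm -rf "$tmp"

    [ -n "$added$removed" ] || exit 1
    if [ -n "$added" ]; then printf 'Added:\n%s\n' "$added"; fi
    if [ -n "$removed" ]; then printf 'Removed:\n%s\n' "$removed"; fi
)
```

A caller could capture the output with `report=$(report_repo_changes before_listing after_listing)` and pipe it to `mail -s` only when the function succeeds, which keeps the team email an exception rather than a daily message.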

The scm-backup notification is perfect for reporting on the (daily) backup run itself, but I would like to use the available information to inform the team at large of new / deleted repositories (more for team knowledge sharing than for the backup itself), because Bitbucket Cloud does not offer this feature (unless I just missed it). That email would therefore be an exception rather than a daily one.

Also, because I keep a "rotation" (5 days), that will give us the possibility to react to an erroneous deletion / vandalism without having to go to the network backup to analyse / fix the incident if we need to recover the repository.

In our last incident, we actually detected the problem quite late, so I was also considering moving deleted repositories to a separate archive folder ("live" repositories would be rotated on a five-day basis, for example, but deleted repositories would be kept for a longer time frame).

Having the whole team notified of deletions is a good enough solution; we should then be able to check whether we need to do something special (extra archival), and with the listings I should be able to implement such behavior now.

In my first attempt, I was using wdiff to compare before_listing and after_listing. I had something like the script below (that could obviously be done in scm-backup).

I'm not sure my scenario is common enough to add it to scm-backup, but I could maybe attempt a pull request if you think it makes sense.

#################################################
# Difference analysis: compare the listings and notify if differences are detected.
#################################################

wdiff --no-common --statistics "$temp_file_ongoing_repository_list_after_synchro" "$temp_file_known_repository_list_before_synchro" > "$diff_result"

# Relies on a property of wdiff's statistics format: if "100% common" appears twice (once per file), there are no differences.
count100percentcommon=$(grep -o '100% common' "$diff_result" | wc -l)

# Decision: send a notification for exceptional cases only.
if [ "$count100percentcommon" -ne 2 ] ; then
    echo "The repository list has changed. Sending an additional notification."
    # Feed the diff result to mail as the message body.
    mail -s "Bitbucket backup notification: changes detected in the list of repositories on Bitbucket during the daily backup." "someone@somewhere" < "$diff_result"
else
    echo "No differences detected in the repository list. No notification sent."
fi
########

christianspecht commented 2 years ago

Sounds useful! Feel free to contribute a pull request for this.

Probably not everybody will want this, but you can make the output optional via a config value
(an example is in the commits linked above: config file / code)

ghost commented 2 years ago

I will give it a try! It won't be quick, as I will likely do it in my personal time. I do think it would make more sense than the extra scripts I have now.