alarmz / boar

Automatically exported from code.google.com/p/boar

Boar should have an option for purging old revisions #13

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Unlike normal revision control software, which deals mostly with text files, 
boar is designed for large multimedia files and other binary files, such as 
MS Office documents and other proprietary formats. While we do want to keep 
some revisions (e.g. for important documents), we normally don't want to keep 
ALL revisions, especially when a major reorganization of a repository occurs. 
For instance, we may want to move a large directory, say 100 GB, out of the 
repository. If we're almost certain that we will never want it back, 
there's no point in keeping that directory in the history of the repository.

So with this reasoning, I propose a purge command that removes all history 
before a certain revision. Boar should be able to make it appear as if a 
revision in the middle of the history were the initial import.

Original issue reported on code.google.com by uts...@gmail.com on 13 Mar 2011 at 9:22

GoogleCodeExporter commented 9 years ago
Agreed. This is a desirable feature.

There are some things that complicate the implementation, though. By default, 
a new snapshot uses a previous snapshot as a template and only specifies which 
files have changed. This makes it almost free to make small changes to a 
session. But it also means you cannot remove an earlier snapshot, since later 
snapshots depend on it. Rewriting the snapshot definition files would violate 
the "write-once" paradigm that limits the chance of bugs causing repository 
corruption. Alternatively, one could configure the repository so that every 
snapshot is independent. This causes a big overhead if you only modify a few 
files in every commit, but removing snapshots is then trivial. One could also 
write a new snapshot definition alongside the old one and keep both, assuming 
that it is only the bulk file data that we want to purge. Just some thoughts. 
In any case, this needs to be implemented.
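
To make the dependency problem in the comment above concrete, here is a minimal sketch. It assumes a toy in-memory model in which each snapshot records a `base` revision and a `changes` mapping; the names `resolve` and `make_independent` and the data layout are hypothetical and are not Boar's actual on-disk format or API.

```python
# Toy model only: the snapshot layout below is an assumption made for
# illustration, not Boar's real snapshot definition format.

def resolve(snapshots, rev):
    """Return the full {path: blob_id} listing for revision `rev`.

    Walks the chain of base snapshots back to the initial one, then
    replays the recorded changes from oldest to newest.
    """
    chain = []
    current = rev
    while current is not None:
        snap = snapshots[current]  # KeyError if a base snapshot was removed
        chain.append(snap)
        current = snap["base"]

    files = {}
    for snap in reversed(chain):
        for path, blob_id in snap["changes"].items():
            if blob_id is None:
                files.pop(path, None)  # None marks a deleted file
            else:
                files[path] = blob_id
    return files


def make_independent(snapshots, rev):
    """Build a snapshot for `rev` that no longer needs any base."""
    return {"base": None, "changes": resolve(snapshots, rev)}


snapshots = {
    1: {"base": None, "changes": {"a.jpg": "blob1", "big/video.avi": "blob2"}},
    2: {"base": 1, "changes": {"big/video.avi": None, "b.doc": "blob3"}},
}

# Revision 2 cannot be resolved without revision 1, but an independent
# copy of it can stand alone once the older snapshot is dropped.
flattened = {2: make_independent(snapshots, 2)}
```

In this model, purging revision 1 is only safe after revision 2 has been written out in full, which is the trade-off between cheap dependent snapshots and easily removable independent ones.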

Original comment by ekb...@gmail.com on 15 Mar 2011 at 11:00

GoogleCodeExporter commented 9 years ago
I'm inclined toward the last solution, provided we can somehow superimpose a 
patch definition that works like a mask over a certain snapshot, so that all 
following snapshots only apply to the area defined by the mask.

Another solution I thought of was to allow independent snapshots and dependent 
snapshots together, so that we retain the storage optimization advantage of 
linked snapshots, but also achieve the flexibility of independent snapshots.

Original comment by uts...@gmail.com on 16 Mar 2011 at 2:24

GoogleCodeExporter commented 9 years ago
I have started implementing this feature. The suggested solution is that only 
the latest revision of a session is preserved. Essentially, it will be possible 
to "truncate" a session so that it only contains the data present in the most 
recent snapshot. 

Original comment by ekb...@gmail.com on 24 Feb 2012 at 1:26

GoogleCodeExporter commented 9 years ago
Feature added as of changeset 2616bf61610d.

There is now a "truncate" command that removes old snapshots from a session. 

As a safety device, an empty file named "ENABLE_PERMANENT_ERASE" must be 
present in the top directory in the repository to enable the truncate command. 
This file must be created manually. Without this file, "truncate" will not 
function (nor will any of the lower level functions supporting the operation).
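
For illustration, the marker file could be created with a few lines of Python. This is only a sketch: the helper name `enable_permanent_erase` is hypothetical, and it assumes that the "top directory in the repository" is the directory path you pass in.

```python
from pathlib import Path

MARKER_NAME = "ENABLE_PERMANENT_ERASE"


def enable_permanent_erase(repo_root):
    """Create the empty marker file in the top directory of a repository.

    `repo_root` is assumed to be the path of the repository's top
    directory. Returns the path of the marker file.
    """
    repo = Path(repo_root)
    if not repo.is_dir():
        raise NotADirectoryError(f"not a directory: {repo}")
    marker = repo / MARKER_NAME
    marker.touch(exist_ok=True)  # creates an empty file if it is missing
    return marker
```

Creating the file by hand works just as well. The point of the safety device is that it is a deliberate, per-repository step.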

It is not yet possible to remove only selected snapshots; it's all or nothing. 
Afterwards, the session will contain only whatever was in the last snapshot. 
The "truncate" operation will replicate to clones if using the "clone" command. 
The "ENABLE_PERMANENT_ERASE" file must be created manually in each clone before 
cloning.

All the deleted session data, and all the deleted blobs, are moved to the tmp/ 
directory in the repository. They are placed in directories with the prefix 
"TRASH_" and a random suffix. You need to delete these directories manually if 
you want to free up space and delete the data permanently.
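
The manual cleanup described above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names `find_trash_dirs` and `empty_trash` are hypothetical, and the only facts taken from the comment are the tmp/ location and the "TRASH_" prefix. It defaults to a dry run because the deletion is permanent.

```python
import shutil
from pathlib import Path


def find_trash_dirs(repo_root):
    """List the TRASH_* directories directly under <repo_root>/tmp."""
    tmp = Path(repo_root) / "tmp"
    if not tmp.is_dir():
        return []
    return sorted(
        p for p in tmp.iterdir()
        if p.is_dir() and not p.is_symlink() and p.name.startswith("TRASH_")
    )


def empty_trash(repo_root, dry_run=True):
    """Return the trash directories; delete them only if dry_run is False."""
    trash = find_trash_dirs(repo_root)
    if not dry_run:
        for directory in trash:
            shutil.rmtree(directory)  # permanent: the data cannot be recovered
    return trash
```

Calling `empty_trash(path)` first and reviewing the returned list before calling it again with `dry_run=False` keeps a last checkpoint before the data is gone for good.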

This feature is reasonably well tested. Like other Boar operations, it can be 
safely aborted and resumed. Still, if you choose to activate and use this 
feature in your Boar repository, you are making a compromise with data safety. 
Even if Boar were blessed with divine perfection, "truncate" would still make 
it possible to lose data if misused. Be careful.

Original comment by ekb...@gmail.com on 22 Apr 2012 at 8:10