EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0
2.14k stars 193 forks source link

Backup tree walking using yield #941

Closed gcalacoci closed 4 months ago

gcalacoci commented 5 months ago

This is much a draft.

I've added a method walk_backups_tree that given any backup info allows to walk the tree of children and act on them starting from leaves.

let's assume this backups structure where names of the nodes are backup_id values:

            root_backup
                 /   \
            child1     child2
              /           \
         child1.1       child2.1
           /     \
   child1.1.1  child1.1.2

the method starting from the root_backup, should walk down until the leaves and then return up. To put it simple, assuming to execute something like this:

for backup in backup_info.walk_backups_tree():
    print(backup.backup_id)

we should get as result:

child1.1.1
child1.1.2
child1.1
child1
child2.1
child2
root_backup

This should simplify the execution of actions like backup removal, allowing to walk bottom up the tree.

andremagui commented 5 months ago

I have tried something like

def walk_backups_tree(self, visited: List[str], target: str):
    """
    Walk through all the children backups of the current incremental backup.
    :return: a generator of LocalBackupInfo objects for each child backup.
    """
    if self.children_backup_ids:
        for child_backup_id in self.children_backup_ids:
            if child_backup_id not in visited:
                backup_info = LocalBackupInfo(
                    self.server,
                    backup_id=child_backup_id,
                )
                yield from backup_info.walk_backups_tree(visited, target)
    visited.append(self.backup_id)
    if not self.parent_backup_id or self.backup_id == target:
        return visited
    parent_backup_info = self.get_parent_backup_info()
    if parent_backup_info:
        yield parent_backup_info.walk_backups_tree(visited, target)

We need to keep track of visited nodes, so i added a visited parameter for it that should be passed along the recursion ( the visited is the result of the walk through based on Depth-first search Post Order algo). I added a target parameter just to stop the recursion at a node in the middle if this was used as the root to initiate the walk through although it is not necessary. This seems to work but would like some opinions, suggestions, critics if you have so! Also the way this would be used with other code might impact how this should be implemented.

gcalacoci commented 5 months ago

this is an interesting take on the tree-walking method. I have a question:

We need to keep track of visited nodes, so i added a visited parameter for it that should be passed along the recursion ( the visited is the result of the walk through based on Depth-first search Post Order algo). I added a target parameter just to stop the recursion at a node in the middle if this was used as the root to initiate the walk through although it is not necessary. This seems to work but would like some opinions, suggestions, critics if you have so! Also the way this would be used with other code might impact how this should be implemented.

Why do you want to keep track of the nodes? The only scenario that I can think of, where the usage of a list of visited nodes makes sense is if we have a graph like this one:

           A
         /   \
        B     C
          \  /
           D
         /   \
        E     F

But that is not possible as an incremental backup can have a single parent only. When talking of incremental backups we have a typical non-binary tree scenario, where each node has a single parent, and 0-N children like this one:

           A
         /   \
        B     C
          \ 
           D
         /   \
        E     F

(please forgive my ascii-art skills... but I think you got the idea)

Using the walk_backups_tree method on A is going to have the following outcome: walk B > walk D > wak E > YIELD E > walk F > YELD F > YELD D > YELD B > walk C > YELD C > YIELD A

Where walk means the usage of yeld from that is not visible outside of the walk_backups_tree method chain of calls.. while YIELD is what actually returns something and triggers the next step of the generator

andremagui commented 5 months ago

Actually the visited name for the variable was unfortunate! It should be backups = [] like yours there because it is the actual result of the walk through! But i got your point from the comments! I forgot about iteration of the method to get the result. This looks good