c0fec0de / anytree

Python tree data library
Apache License 2.0

Feature Request - Node Tree from list of delimited strings #153

Open als0052 opened 3 years ago

als0052 commented 3 years ago

I've got some files I'm trying to parse into a tree (a FEM assembly tree, actually) and need to take a list of lists and create a tree from it. Because of the way I get the raw input that I'm parsing (the output of a TCL script), every list contains the full path, much of which is repeated from list to list. I'm sure I could get this working eventually without a 'batch node creator', but I wanted to throw this out there anyway for consideration: could such a batch node creator be added in the future?

Below is a (hopefully understandable) minimum example copied and pasted from a markdown export of a Jupyter notebook. I think this request might be similar to others made previously and if so feel free to close this one and/or link it to other issues.

Sorry for the book-length post!!


from anytree import Node, RenderTree
from pathlib import Path

Make the Tree Manually w/ anytree

a1 = Node('Assembly 1', parent=None)
a1_sa1 = Node('Sub Assembly 1', parent=a1)
a1_sa1_ssa1 = Node('Sub Sub Assembly 1', parent=a1_sa1)
a1_sa1_ssa1_sssa1 = Node('Sub Sub Sub Assembly 1', parent=a1_sa1_ssa1)
a1_sa1_ssa1_sssa1.children = [Node('Component 1'), Node('Component 2')]

a1_sa2 = Node('Sub Assembly 2', parent=a1)
a1_sa2.children = [Node('Component 1'), Node('Component 2'), Node('Component 3'),
                   Node('Component 4'), Node('Component 5'), Node('Component 6'),
                   Node('Component 7'), Node('Component 8'), Node('Component 9'),
                   Node('Component 10'), Node('Component 11'), Node('Component 12'),
                   Node('Component 13'), Node('Component 14'), Node('Component 15'),
                   Node('Component 16')]

a1_sa3 = Node('Sub Assembly 3', parent=a1)
a1_sa3_ssa1 = Node('Sub Sub Assembly 1', parent=a1_sa3)
a1_sa3_ssa1_c1 = Node('Component 1', parent=a1_sa3_ssa1)

a1_sa3_ssa2 = Node('Sub Sub Assembly 2', parent=a1_sa3)
a1_sa3_ssa2_c1 = Node('Component 1', parent=a1_sa3_ssa2)

a1_sa3_ssa3 = Node('Sub Sub Assembly 3', parent=a1_sa3)
a1_sa3_ssa3_c1 = Node('Component 1', parent=a1_sa3_ssa3)
a1_sa3_c1 = Node('Component 1', parent=a1_sa3)
a1_sa3_c2 = Node('Component 2', parent=a1_sa3)
a1_sa3_c3 = Node('Component 3', parent=a1_sa3)
a1_sa3_c4 = Node('Component 4', parent=a1_sa3)

a1_sa3_ssa4 = Node('Sub Sub Assembly 4', parent=a1_sa3)
a1_sa3_ssa4_c1 = Node('Component 1', parent=a1_sa3_ssa4)

a1_sa3_ssa5 = Node('Sub Sub Assembly 5', parent=a1_sa3)
a1_sa3_ssa5_c1 = Node('Component 1', parent=a1_sa3_ssa5)

a1_sa3_ssa6 = Node('Sub Sub Assembly 6', parent=a1_sa3)
a1_sa3_ssa6_c1 = Node('Component 1', parent=a1_sa3_ssa6)

a1_sa3_ssa7 = Node('Sub Sub Assembly 7', parent=a1_sa3)
a1_sa3_ssa7_c1 = Node('Component 1', parent=a1_sa3_ssa7)

a1_sa3_ssa8 = Node('Sub Sub Assembly 8', parent=a1_sa3)
a1_sa3_ssa8_c1 = Node('Component 1', parent=a1_sa3_ssa8)

a1_sa3_ssa9 = Node('Sub Sub Assembly 9', parent=a1_sa3)
a1_sa3_ssa9_c1 = Node('Component 1', parent=a1_sa3_ssa9)

a1_sa3_ssa10 = Node('Sub Sub Assembly 10', parent=a1_sa3)
a1_sa3_ssa10_c1 = Node('Component 1', parent=a1_sa3_ssa10)

for pre, fill, node in RenderTree(a1):
    print(f'{pre}{node.name}')

Assembly 1
├── Sub Assembly 1
│   └── Sub Sub Assembly 1
│       └── Sub Sub Sub Assembly 1
│           ├── Component 1
│           └── Component 2
├── Sub Assembly 2
│   ├── Component 1
│   ├── Component 2
│   ├── Component 3
│   ├── Component 4
│   ├── Component 5
│   ├── Component 6
│   ├── Component 7
│   ├── Component 8
│   ├── Component 9
│   ├── Component 10
│   ├── Component 11
│   ├── Component 12
│   ├── Component 13
│   ├── Component 14
│   ├── Component 15
│   └── Component 16
└── Sub Assembly 3
    ├── Sub Sub Assembly 1
    │   └── Component 1
    ├── Sub Sub Assembly 2
    │   └── Component 1
    ├── Sub Sub Assembly 3
    │   └── Component 1
    ├── Component 1
    ├── Component 2
    ├── Component 3
    ├── Component 4
    ├── Sub Sub Assembly 4
    │   └── Component 1
    ├── Sub Sub Assembly 5
    │   └── Component 1
    ├── Sub Sub Assembly 6
    │   └── Component 1
    ├── Sub Sub Assembly 7
    │   └── Component 1
    ├── Sub Sub Assembly 8
    │   └── Component 1
    ├── Sub Sub Assembly 9
    │   └── Component 1
    └── Sub Sub Assembly 10
        └── Component 1

Parse the TCL Script Output File

Looks something like this (~ delimited; contents pasted here so as to not include ExampleTree_featureRequest_raw.txt):

~Assembly1~SubAssembly1~SubSubAssembly1~SubSubSubAssembly1~Component1
~Assembly1~SubAssembly1~SubSubAssembly1~SubSubSubAssembly1~Component2
~Assembly1~SubAssembly2~Component1
~Assembly1~SubAssembly2~Component2
~Assembly1~SubAssembly2~Component3
~Assembly1~SubAssembly2~Component4
~Assembly1~SubAssembly2~Component5
~Assembly1~SubAssembly2~Component6
~Assembly1~SubAssembly2~Component7
~Assembly1~SubAssembly2~Component8
~Assembly1~SubAssembly2~Component9
~Assembly1~SubAssembly2~Component10
~Assembly1~SubAssembly2~Component11
~Assembly1~SubAssembly2~Component12
~Assembly1~SubAssembly2~Component13
~Assembly1~SubAssembly2~Component14
~Assembly1~SubAssembly2~Component15
~Assembly1~SubAssembly2~Component16
~Assembly1~SubAssembly3~SubSubAssembly1~SubSubSubAssembly1~Component1
~Assembly1~SubAssembly3~SubSubAssembly2~SubSubSubAssembly1~Component1
~Assembly1~SubAssembly3~SubSubAssembly3~SubSubSubAssembly1~Component1
~Assembly1~SubAssembly3~SubSubAssembly3~SubSubSubAssembly2~Component1
~Assembly1~SubAssembly3~SubSubAssembly4~Component1
~Assembly1~SubAssembly3~SubSubAssembly4~Component2
~Assembly1~SubAssembly3~SubSubAssembly4~Component3
~Assembly1~SubAssembly3~SubSubAssembly4~Component4
~Assembly1~SubAssembly3~SubSubAssembly4~Component1
~Assembly1~SubAssembly3~SubSubAssembly5~Component1
~Assembly1~SubAssembly3~SubSubAssembly6~Component1
~Assembly1~SubAssembly3~SubSubAssembly7~Component1
~Assembly1~SubAssembly3~SubSubAssembly8~Component1
~Assembly1~SubAssembly3~SubSubAssembly9~Component1
~Assembly1~SubAssembly3~SubSubAssembly10~Component1

# Output file from the TCL script
raw_tcl_output = Path().cwd().joinpath('ExampleTree_featureRequest_raw.txt')

Read in the raw TCL Output File

with open(raw_tcl_output, 'r') as fin:
    content = fin.readlines()

Parse the content into list of lists

# Strip newlines, split into lists
content = [c.strip('\n') for c in content]
content = [c.split('~') for c in content]
display(content[0])

# Take all but the first after splitting to remove blank at beginning
# I guess that blank (i.e. ``content[0][0]``) is like the root?
content = [c[1:] for c in content]

['',
 'Assembly1',
 'SubAssembly1',
 'SubSubAssembly1',
 'SubSubSubAssembly1',
 'Component1']

content[0]

['Assembly1',
 'SubAssembly1',
 'SubSubAssembly1',
 'SubSubSubAssembly1',
 'Component1']

content

[['Assembly1',
  'SubAssembly1',
  'SubSubAssembly1',
  'SubSubSubAssembly1',
  'Component1'],
 ['Assembly1',
  'SubAssembly1',
  'SubSubAssembly1',
  'SubSubSubAssembly1',
  'Component2'],
 ['Assembly1', 'SubAssembly2', 'Component1'],
 ['Assembly1', 'SubAssembly2', 'Component2'],
 ['Assembly1', 'SubAssembly2', 'Component3'],
 ['Assembly1', 'SubAssembly2', 'Component4'],
 ['Assembly1', 'SubAssembly2', 'Component5'],
 ['Assembly1', 'SubAssembly2', 'Component6'],
 ['Assembly1', 'SubAssembly2', 'Component7'],
 ['Assembly1', 'SubAssembly2', 'Component8'],
 ['Assembly1', 'SubAssembly2', 'Component9'],
 ['Assembly1', 'SubAssembly2', 'Component10'],
 ['Assembly1', 'SubAssembly2', 'Component11'],
 ['Assembly1', 'SubAssembly2', 'Component12'],
 ['Assembly1', 'SubAssembly2', 'Component13'],
 ['Assembly1', 'SubAssembly2', 'Component14'],
 ['Assembly1', 'SubAssembly2', 'Component15'],
 ['Assembly1', 'SubAssembly2', 'Component16'],
 ['Assembly1',
  'SubAssembly3',
  'SubSubAssembly1',
  'SubSubSubAssembly1',
  'Component1'],
 ['Assembly1',
  'SubAssembly3',
  'SubSubAssembly2',
  'SubSubSubAssembly1',
  'Component1'],
 ['Assembly1',
  'SubAssembly3',
  'SubSubAssembly3',
  'SubSubSubAssembly1',
  'Component1'],
 ['Assembly1',
  'SubAssembly3',
  'SubSubAssembly3',
  'SubSubSubAssembly2',
  'Component1'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component1'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component2'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component3'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component4'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component1'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly5', 'Component1'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly6', 'Component1'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly7', 'Component1'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly8', 'Component1'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly9', 'Component1'],
 ['Assembly1', 'SubAssembly3', 'SubSubAssembly10', 'Component1']]

Desired Feature

A way to 'batch create' a node tree. Some command that will take in a list of delimited node-childNodes-etc. and create a valid Node object from it.

Example:

>>> list1 = ['Assembly1', 'SubAssembly1', 'SubSubAssembly1', 'SubSubSubAssembly1', 'Component1']
>>> node_from_list1 = SomeNewBatchCreateNodeFunction(input_list=list1)

The above should produce the same result as doing it by hand:

a1 = Node('Assembly 1', parent=None)
a1_sa1 = Node('Sub Assembly 1', parent=a1)
a1_sa1_ssa1 = Node('Sub Sub Assembly 1', parent=a1_sa1)
a1_sa1_ssa1_sssa1 = Node('Sub Sub Sub Assembly 1', parent=a1_sa1_ssa1)
a1_sa1_ssa1_sssa1.children = [Node('Component 1')]
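
One library-independent way to picture the requested batch creator is a prefix merge: walk each path from the root and create a level only when it is missing. The sketch below uses plain nested dicts rather than anytree nodes, and `build_nested` is a hypothetical name, not an existing anytree function.

```python
def build_nested(paths):
    """Merge lists of names into one nested dict, sharing common prefixes."""
    tree = {}
    for path in paths:
        level = tree
        for name in path:
            # setdefault creates the child only if this branch lacks it
            level = level.setdefault(name, {})
    return tree


paths = [
    ['Assembly1', 'SubAssembly1', 'Component1'],
    ['Assembly1', 'SubAssembly1', 'Component2'],
    ['Assembly1', 'SubAssembly2', 'Component1'],
]
tree = build_nested(paths)
```

Because lookups happen one level at a time, `Component1` under `SubAssembly1` and `Component1` under `SubAssembly2` stay separate entries. The same walk should carry over to anytree by searching a parent's `children` for a matching `name` instead of using dict keys.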


jkbgbr commented 2 years ago

You mean like this?

from anytree import Node, findall_by_attr, RenderTree

lines = ['~Assembly1~SubAssembly1~SubSubAssembly1~SubSubSubAssembly1~Component1',
         '~Assembly1~SubAssembly1~SubSubAssembly1~SubSubSubAssembly1~Component2',
         '~Assembly1~SubAssembly2~Component1',
         '~Assembly1~SubAssembly2~Component2',
         '~Assembly1~SubAssembly2~Component3',
         '~Assembly1~SubAssembly2~Component4',
         '~Assembly1~SubAssembly2~Component5',
         '~Assembly1~SubAssembly2~Component6',
         '~Assembly1~SubAssembly2~Component7',
         '~Assembly1~SubAssembly2~Component8',
         '~Assembly1~SubAssembly2~Component9',
         '~Assembly1~SubAssembly2~Component10',
         '~Assembly1~SubAssembly2~Component11',
         '~Assembly1~SubAssembly2~Component12',
         '~Assembly1~SubAssembly2~Component13',
         '~Assembly1~SubAssembly2~Component14',
         '~Assembly1~SubAssembly2~Component15',
         '~Assembly1~SubAssembly2~Component16',
         '~Assembly1~SubAssembly3~SubSubAssembly1~SubSubSubAssembly1~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly2~SubSubSubAssembly1~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly3~SubSubSubAssembly1~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly3~SubSubSubAssembly2~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly4~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly4~Component2',
         '~Assembly1~SubAssembly3~SubSubAssembly4~Component3',
         '~Assembly1~SubAssembly3~SubSubAssembly4~Component4',
         '~Assembly1~SubAssembly3~SubSubAssembly4~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly5~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly6~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly7~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly8~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly9~Component1',
         '~Assembly1~SubAssembly3~SubSubAssembly10~Component1', ]

def from_assembly_line(root: Node = None, line: str = ''):

    nodenames = [x for x in line.split('~') if x]  # removing empty items

    # root node
    if root is None:
        root = Node(nodenames[0], parent=None)

    # walk down from the root, creating any node that is missing on the path;
    # matching only among the current parent's children keeps equal names
    # in different branches apart
    parent = root
    for nodename in nodenames[1:]:
        child = next((c for c in parent.children if c.name == nodename), None)
        if child is None:
            child = Node(nodename, parent=parent)
        parent = child

    return root

if __name__ == '__main__':
    _root = None
    for _line in lines:
        _root = from_assembly_line(root=_root, line=_line)

    print(RenderTree(_root))

als0052 commented 2 years ago

That looks like it'll work. I found a workaround for my issue long ago, but I was hoping this could become a built-in feature in a future release. That way you wouldn't have to write your own function for it, even if it is (in hindsight) a pretty simple one.

lverweijen commented 1 year ago

Something more general that I would like would be a function to turn a list like this:

l = [("Europe", "Italy", "Rome"),
     ("Europe", "Italy", "Milan"),
     ("Europe", "France", "Paris")]

into a tree:

Europe
 Italy
  Rome
  Milan
 France
  Paris

The reason is that I have many hierarchies stored in CSV files; with pandas and such a function I could easily convert them to trees. It would also solve your problem, because you could just split each of your strings.
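
For the `~`-delimited lines from the original post, the splitting step could look like the sketch below. `split_paths` is a hypothetical helper; it assumes node names never contain the delimiter and it discards empty fields, such as the one produced by the leading `~` in the sample data.

```python
def split_paths(lines, sep='~'):
    """Turn delimited path strings into tuples of node names."""
    rows = []
    for line in lines:
        # Drop surrounding whitespace (including the newline), then
        # discard empty fields such as the one before the first delimiter
        names = tuple(part for part in line.strip().split(sep) if part)
        if names:
            rows.append(names)
    return rows


rows = split_paths(['~Europe~Italy~Rome\n',
                    '~Europe~Italy~Milan\n',
                    '~Europe~France~Paris\n'])
```

Each resulting tuple has the same shape as the rows in the list above, so it could be fed to any row-based tree builder.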

lverweijen commented 1 year ago

Here are sample implementations for my ideas above:

import anytree

def from_rows(rows, node_factory=anytree.Node, root_name="root"):
    created_nodes = {}

    root = node_factory(root_name)
    for row in rows:
        parent_node = root
        for depth, col in enumerate(row):
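            # key on the full path prefix so that equal names at the same
            # depth but under different parents stay separate nodes
            key = tuple(row[:depth + 1])
            if key in created_nodes:
                node = created_nodes[key]
            else:
                node = node_factory(col)
                node.parent = parent_node
                created_nodes[key] = node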

            parent_node = node

    return root

def to_rows(root, str_factory=str, skip_root=True):
    index = 1 if skip_root else 0
    for leaf in root.leaves:
        yield [str_factory(node) for node in leaf.path[index:]]
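
The row export amounts to listing every root-to-leaf path. As a library-independent illustration, the same idea can be sketched on a plain nested dict; `leaf_paths` is a hypothetical name, and treating an empty dict as a leaf is an assumption of this sketch.

```python
def leaf_paths(tree, prefix=()):
    """Yield one tuple of names per leaf of a nested-dict tree."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            # Inner level: keep descending
            yield from leaf_paths(children, path)
        else:
            # An empty dict marks a leaf, so the path is complete
            yield path


tree = {'Europe': {'Italy': {'Rome': {}, 'Milan': {}},
                   'France': {'Paris': {}}}}
rows = list(leaf_paths(tree))
```

Rows come out in insertion order, one per leaf, which mirrors iterating over `root.leaves` and reading each leaf's path.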

Update 2024-01-06: I implemented this in littletree using functions Node.from_rows, Node.to_rows.