c0fec0de / anytree

Python tree data library
Apache License 2.0

Why is IndentedStringImporter gone? #105

Open regexgit opened 4 years ago

regexgit commented 4 years ago

About a year ago I had IndentedStringImporter installed with anytree. Now, after reinstalling the OS and all my tools, I realize that it is no longer present.

I use it a lot: for example, biological taxonomies and the hierarchical keywords ("controlled vocabulary") imported and exported by photo tools are stored as indented text files.

Fortunately, I kept a backup of the code, but I would prefer an official installation.

c0fec0de commented 4 years ago

It was not an official implementation, just a branch. I will double-check.

als0052 commented 3 years ago

Just to add to the conversation: I'm also interested in seeing IndentedStringImporter (and perhaps an IndentedStringExporter?) added. I read the initial feature request and tried looking for the source code, but my GitHub-fu is not very good.

LionKimbro commented 3 years ago

I was just thinking about implementing an indented string importer, something that would read:

Foo
  Bar
  Baz
    Boz
    Bitz
  Blah

...and construct a tree with just that.

If I implement such a thing in a branch, is there any chance that it would be accepted? Is the project taking contributions?

LionKimbro commented 3 years ago

I've created a pull request with an implementation of the functionality, docstring documentation, and nose tests.

angely-dev commented 1 year ago

Any updates on this?

regexgit commented 1 year ago

It seems not. In case it helps while you wait for an official version: I still use the original version (the file indentedstringimporter.py from 2019), which I carefully kept. No warranty, of course, but it is enough for my needs.

# -*- coding: utf-8 -*-
from anytree import AnyNode

#---------------------------------------
def _get_indentation(line):
    # Indentation length = number of leading spaces removed by lstrip(' ').
    # (Splitting the line on its content fails for blank lines, because
    # str.split('') raises ValueError.)
    content = line.lstrip(' ')
    indentation_length = len(line) - len(content)
    return indentation_length, content

#*******************************************************************************
class IndentedStringImporter(object):

    def __init__(self, nodecls=AnyNode):
        u"""
        Import Tree from a single string (with all the lines) or list of strings
        (lines) with indentation.

        Every indented line is converted to an instance of `nodecls`. The strings
        (without indentation) found on the lines are set as the respective node names.

        This importer does not constrain indented data to a definite number of
        whitespaces (a multiple of any number). A node is considered a child of a
        parent simply if its indentation is bigger than the parent's.

        This means that the tree can have siblings with different indentations,
        as long as the siblings' indentations are bigger than their parent's
        (but not necessarily equal to each other).

        Keyword Args:
            nodecls: class used for nodes.

        Example using a string list:
        >>> from anytree.importer import IndentedStringImporter
        >>> from anytree import RenderTree
        >>> importer = IndentedStringImporter()
        >>> lines = [
        ...    'Node1',
        ...    'Node2',
        ...    '    Node3',
        ...    'Node5',
        ...    '    Node6',
        ...    '        Node7',
        ...    '    Node8',
        ...    '        Node9',
        ...    '      Node10',
        ...    '    Node11',
        ...    '  Node12',
        ...    'Node13',
        ...]
        >>> root = importer.import_(lines)
        >>> print(RenderTree(root))
        AnyNode(name='root')
        ├── AnyNode(name='Node1')
        ├── AnyNode(name='Node2')
        │   └── AnyNode(name='Node3')
        ├── AnyNode(name='Node5')
        │   ├── AnyNode(name='Node6')
        │   │   └── AnyNode(name='Node7')
        │   ├── AnyNode(name='Node8')
        │   │   ├── AnyNode(name='Node9')
        │   │   └── AnyNode(name='Node10')
        │   ├── AnyNode(name='Node11')
        │   └── AnyNode(name='Node12')
        └── AnyNode(name='Node13')

        Example using a string:
        >>> string = "Node1\n  Node2\n  Node3\n    Node4"
        >>> root = importer.import_(string)
        >>> print(RenderTree(root))
        AnyNode(name='root')
        └── AnyNode(name='Node1')
            ├── AnyNode(name='Node2')
            └── AnyNode(name='Node3')
                └── AnyNode(name='Node4')
        """

        self.nodecls = nodecls

    #------------------------------------
    def _tree_from_indented_str(self, data):
        if isinstance(data, str):
            lines = data.splitlines()
        else:
            lines = data
        root = self.nodecls(name="root")
        indentations = {}
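        for line in lines:
            line = line.rstrip('\r\n')  # tolerate lines read from a file
            if not line.strip():
                continue  # skip blank lines (they carry no node name)
            cur_indent, name = _get_indentation(line)

            if len(indentations) == 0:
                parent = root
            elif cur_indent not in indentations:
                # parent is the next lower indentation, or the root when the
                # line is shallower than every indentation seen so far
                keys = [key for key in indentations.keys()
                        if key < cur_indent]
                parent = indentations[max(keys)] if keys else root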
            else:
                # current line uses the parent of the last line
                # with same indentation
                # and replaces it as the last line with this given indentation
                parent = indentations[cur_indent].parent

            indentations[cur_indent] = self.nodecls(name=name, parent=parent)

            # delete all higher indentations
            keys = [key for key in indentations.keys() if key > cur_indent]
            for key in keys:
                indentations.pop(key)
        return root

    #------------------------------------
    def import_(self, data):
        # data: single string or a list of lines
        return self._tree_from_indented_str(data)
angely-dev commented 1 year ago

Thanks @regexgit for pointing out the original version; however, I ended up writing my own lightweight implementation. It converts an indented config (not arbitrary text, strictly speaking, since I assume each line to be unique within its indented block) to an n-ary tree using raw nested dicts.
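angely-dev's code is linked rather than reproduced in this thread, so the following is only a hypothetical sketch of the nested-dict idea described above: each line becomes a key whose value is the dict of its child lines. The name `parse_indented` and the router-style sample config are illustrative assumptions.

```python
def parse_indented(text):
    """Hypothetical sketch: indented text -> {line: {child_line: {...}}}.

    Relies on lines being unique within their block: a repeated line in the
    same block collapses into a single key (its children are merged).
    """
    tree = {}
    # (indentation, children_dict) pairs; the top-level dict gets -1 so that
    # every real line is deeper than it and is never popped.
    stack = [(-1, tree)]
    for raw in text.splitlines():
        content = raw.strip()
        if not content:
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip())
        while stack[-1][0] >= indent:
            stack.pop()
        children = stack[-1][1].setdefault(content, {})
        stack.append((indent, children))
    return tree


config = """\
interface Gi0/1
 description uplink
 ip address 10.0.0.1 255.255.255.0
router ospf 1
 network 10.0.0.0 0.0.0.255 area 0
"""
print(parse_indented(config))
```

Using plain dicts keeps lookups by line text cheap, which suits block-aware comparison; the trade-off is exactly the uniqueness assumption mentioned in the comment above.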

The goal was to compare (and merge) two config files while being aware of the scope of the indented blocks. Unlike anytree, it won't meet everyone's requirements, but if anyone is interested: text-to-tree conversion in 10 lines of code, plus an example. I also published a simple gist.
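With both configs held as nested dicts, block-aware comparison and merging reduce to a recursive walk in which a line is matched only against lines under the same parent path. A sketch under the same assumption of unique lines per block; the helpers `merge_trees` and `missing_lines` are hypothetical and not taken from the linked gist.

```python
import copy


def merge_trees(base, other):
    """Return a new nested dict holding every line of `base` and `other`.

    Lines are merged per block: children are combined only when the parent
    line text matches. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for line, children in other.items():
        merged[line] = merge_trees(merged.get(line, {}), children)
    return merged


def missing_lines(a, b, path=()):
    """Yield the path (tuple of lines) of each line in `a` absent from `b`.

    When a whole block is missing, only its top line is reported.
    """
    for line, children in a.items():
        line_path = path + (line,)
        if line not in b:
            yield line_path
        else:
            yield from missing_lines(children, b[line], line_path)


running = {"interface Gi0/1": {"description uplink": {}}, "hostname r1": {}}
candidate = {
    "interface Gi0/1": {"shutdown": {}},
    "router ospf 1": {"network 10.0.0.0 0.0.0.255 area 0": {}},
}
print(merge_trees(running, candidate))
print(list(missing_lines(running, candidate)))
```

Because matching happens per block, a `shutdown` line under one interface is never confused with the same text under another interface.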

lverweijen commented 1 year ago

I would also be interested in this.

I actually created my own version. It wasn't written for anytree (but could probably be adapted easily), and it may not be very flexible or fault-tolerant, but it should be reasonably fast for correct input:

    import re

    from anytree import Node

    def from_indented_file(file, indent='@'):  # Change to "    " if 4 spaces are desired
        # Each line consists of zero or more repetitions of `indent`, then the code
        pattern = re.compile(rf"^(?P<prefix>(?:{re.escape(indent)})*)(?P<code>.*)")

        root = Node("root")
        stack = [root]  # stack[d] is the most recent node at depth d

        for line in file:
            match = pattern.match(line)  # always matches; '.' stops before the newline
            prefix, code = match['prefix'], match['code']
            depth = len(prefix) // len(indent)
            parent_node = stack[depth]  # IndexError if a line skips a level
            node = Node(code, parent=parent_node)  # was parent_node.add(code) with a non-anytree Node

            # Place node as last item on index depth + 1
            del stack[depth + 1:]
            stack.append(node)

        return root

If a pull request is accepted, maybe the best parts of all three implementations can be combined. I would also like to have an export to an indented file with the same options.