DavidArno / SuccincT

Discriminated unions, pattern matching and partial applications for C#
MIT License
267 stars 15 forks source link

Feature request: pattern mathcing and transformation on object trees #40

Open Opiumtm opened 7 years ago

Opiumtm commented 7 years ago

I am using functional approach already in the code which analyze and transform tree-like hierarchical collections (it's transformation of source HTML DOM into Word document paragraphs or internal markup-like representation to display it later on screen).

Naturally, pattern matching is the solution for this class of problems. There are two basic functional algorithms I use:

  1. "Subtree template" pattern matching to detect particular patterns on subtree and then do some action on it if "shape requirements" are satisfied.
  2. Recursive "tree scan" with transformation of input read-only object tree into output read-only object tree. Scanned tree is "pattern matched" (often with "subtree template" check) and appropriate action is chosen on subtree to output result transformed subtree to insert into result tree (and then passed children of root transformed subtree node again recursively pattern matched) or ignore this subtree completely if needed so.

When I have a look on Succinc<T> code it was interesting how my own implementation of this functional algorithms is very similar to Succinc<T> approach to pattern matching. So it would be good to extend Succinc<T> pattern matching for recursive object tree scan, matching and transformation.

Basic idea is the "tree scan context" which contain original subtree nodes collection from same tree level, result collection of nodes (to insert transformed tree result) and function to get children from subtree node from context.

Example code looks like

            result = html.DocumentNode.ChildNodes.TreeWalk(new ParseContext() { Nodes = result, BaseLink = baseLink })
                .GetChildren(node => node.ChildNodes) // assign default "get children func" to parse context
                .If(node => node.NodeType == typeof(IHtmlTextNode), (node, res) => AddToResult(res, new TextPostNode() { Text = WebUtility.HtmlDecode(node.InnerText) }), node => null)
                .If(IsPostLink, AddPostLink, node => null)
                .If(node => node.Name.EqualsNc("br"), (node, res) => AddToResult(res, new LineBreakPostNode()))
                .If(node => node.Name.EqualsNc("p"), (node, res) => CreateAttribute(res, PostBasicAttributes.Paragraph))
                .If(node => node.Name.EqualsNc("em"), (node, res) => CreateAttribute(res, PostBasicAttributes.Italic))
                .If(node => node.Name.EqualsNc("strong"), (node, res) => CreateAttribute(res, PostBasicAttributes.Bold))
                .If(node => GetPreformatNode(node) != null, (node, res) => CreateAttribute(res, PostBasicAttributes.Monospace), node => GetPreformatNode(node).ChildNodes)
                .If(node => CheckSpan(node, "u"), (node, res) => CreateAttribute(res, PostBasicAttributes.Underscore))
                .If(node => CheckSpan(node, "o"), (node, res) => CreateAttribute(res, PostBasicAttributes.Overscore))
                .If(node => CheckSpan(node, "spoiler"), (node, res) => CreateAttribute(res, PostBasicAttributes.Spoiler))
                .If(node => CheckSpan(node, "s"), (node, res) => CreateAttribute(res, PostBasicAttributes.Strikeout))
                .If(node => node.Name.EqualsNc("sub"), (node, res) => CreateAttribute(res, PostBasicAttributes.Sub))
                .If(node => node.Name.EqualsNc("sup"), (node, res) => CreateAttribute(res, PostBasicAttributes.Sup))
                .If(node => CheckSpan(node, "unkfunc"), (node, res) => CreateAttribute(res, PostBasicAttributes.Quote))
                .If(node => node.Name.EqualsNc("a") && !string.IsNullOrWhiteSpace(node.GetAttributeValue("href", null)), CreateLinkAttrNode)
                .Else((node, res) => res)
                .Run()
                .Nodes;

Particularly compact representation of tree matching and transformation, I think.

Tree scan context looks like this:

    /// <summary>
    /// Tree scan context.
    /// </summary>
    /// <typeparam name="T">Type of source tree node.</typeparam>
    /// <typeparam name="TApp">Type of result.</typeparam>
    public class TreeScanContext<T, TApp> : ITreeWalkContextBreak
    {
        /// <summary>
        /// Constructor.
        /// </summary>
        public TreeScanContext()
        {
            Functions = new List<TreeApplyFunc<T, TApp>>();
        }

        /// <summary>
        /// Source nodes collection.
        /// </summary>
        internal IEnumerable<T> Source { get; set; }

        /// <summary>
        /// Aggregated result for this source nodes collection.
        /// </summary>
        internal TApp Result { get; set; }

        /// <summary>
        /// Apply function.
        /// </summary>
        internal List<TreeApplyFunc<T, TApp>> Functions { get; private set; }

        /// <summary>
        /// Default apply function.
        /// </summary>
        internal Func<T, TApp, TApp> DefaultApply { get; set; }

        /// <summary>
        /// Default function to get children.
        /// </summary>
        internal Func<T, IEnumerable<T>> DefaultGetChildren { get; set; }

        /// <summary>
        /// True if tree scan should globally stop now.
        /// </summary>
        public bool IsBreak { get; set; }
    }

    /// <summary>
    /// Tree apply function description.
    /// </summary>
    /// <typeparam name="T">Type of source tree node.</typeparam>
    /// <typeparam name="TApp">Type of result.</typeparam>
    public class TreeApplyFunc<T, TApp>
    {
        /// <summary>
        /// Pattern match function.
        /// </summary>
        internal Func<T, bool> If { get; set; }

        /// <summary>
        /// Apply func. Aggregate result with passed information from original subtree node.
        /// </summary>
        internal Func<T, TApp, TApp> Apply { get; set; }

        /// <summary>
        /// Get children func. If it returns null or empty enumerable, subtree scan is stopped.
        /// </summary>
        internal Func<T, IEnumerable<T>> GetChildren { get; set; }

        /// <summary>
        /// True if "If" pattern matching function should be execute after
        /// all other matches and only if no other pattern apply.
        /// </summary>
        internal bool IsElse { get; set; }
    }

Code above is provided as the idea. I don't feel it's perfect. I'm just reinvented it to solve particular and quite common problem of tree scan and transformation using functional approach and recursive pattern matching. It would be good if Succinc<T> do support similar feature, but more systematically and consistently implemented.

DavidArno commented 7 years ago

I haven't documented it very well yet, but I think #36 might go at least some way to addressing this. Do you concur, or am I misreading this request?

Opiumtm commented 7 years ago

@DavidArno it isn't recursive pattern matching. Am I wrong?

Opiumtm commented 7 years ago

@DavidArno I may fork Succinc<T> and then move some of my "recursive pattern matching" logic to library via pull request. You can modify it to make it more pretty later. It's ok? That code already is successfully used in production in some of my projects. Just to mention, it's used in Html-to-Microsoft Word component in custom-made human resources management software used at my office.

DavidArno commented 7 years ago

@Opiumtm,

That sounds great. I can't promise which bits I'll include, but real code will help me understand how your proposal might fit in with what I'm already doing, whether yours is a better, or just a different suggestion etc. So "pull request" away 👍