JackWFinlay / jsonize

Convert HTML to JSON.
MIT License

Node property is null if "EmptyTextNodeHandling" is equal to "Ignore" #29

Closed feeeper closed 4 years ago

feeeper commented 8 years ago

Why is the "node" property missing from the resulting JSON when I set EmptyTextNodeHandling to EmptyTextNodeHandling.Ignore? Is this expected?

EmptyTextNodeHandling.Include example:

JsonizeConfiguration jsonizeConfiguration = new JsonizeConfiguration
{
    EmptyTextNodeHandling = EmptyTextNodeHandling.Include
};
string html = "<html><head></head><body><form></form><p></p></body></html>";
Jsonize jsonize = new Jsonize(html);
string result = jsonize.ParseHtmlAsJsonString(jsonizeConfiguration);

/*
result:
{
    "node":"Document",
    "child":[
        {
            "node":"Element",
            "tag":"html",
            "child":[
                {
                    "node":"Element",
                    "tag":"head",
                    "text":""
                },
                {
                    "node":"Element",
                    "tag":"body",
                    "child":[
                        {
                            "node":"Element",
                            "tag":"form",
                            "text":""
                        },
                        {
                            "node":"Element",
                            "tag":"p",
                            "text":""
                        }
                    ]
                }
            ]
        }
    ]
}
*/

EmptyTextNodeHandling.Ignore example:

JsonizeConfiguration jsonizeConfiguration = new JsonizeConfiguration
{
    EmptyTextNodeHandling = EmptyTextNodeHandling.Ignore
};
string html = "<html><head></head><body><form></form><p></p></body></html>";
Jsonize jsonize = new Jsonize(html);
string result = jsonize.ParseHtmlAsJsonString(jsonizeConfiguration);

/*
result:
{
  "node": "Document",
  "child": [
    {
      "tag": "html",
      "child": [
        {
          "tag": "head"
        },
        {
          "tag": "body",
          "child": [
            {
              "tag": "form"
            },
            {
              "tag": "p"
            }
          ]
        }
      ]
    }
  ]
}
*/

As far as I can see, the JsonizeNode.Node property is set only if innerText is not empty or if EmptyTextNodeHandling == EmptyTextNodeHandling.Include:

// Jsonize.GetChildren method
// ...
if (_emptyTextNodeHandling == EmptyTextNodeHandling.Include || !String.IsNullOrWhiteSpace(innerText))
{
    if (!htmlNode.HasChildNodes)
    {
        childJsonizeNode.Text = innerText;
    }

    childJsonizeNode.Node = htmlNode.NodeType.ToString();
    addToParent = true;
}
// ...

Is this a bug or a feature?

JackWFinlay commented 8 years ago

It's a bug created during an attempt to build a feature... It should only exclude the node tag for empty text nodes. It should still show the node tag for the other nodes.

JackWFinlay commented 8 years ago

Well, realistically it should exclude the whole node for empty text nodes. There is a logic error in there that appears to also drop the node tag from other nodes as part of that exclusion.
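
The intended behaviour described above could be sketched roughly like this. This is only an illustration, not the code in the library: NodeKind and ShouldEmit are hypothetical names made up for the example, and NodeKind is a stand-in for whatever node type the HTML parser reports. Only EmptyTextNodeHandling and its Include/Ignore values come from this thread.

```csharp
using System;

enum EmptyTextNodeHandling { Include, Ignore }

// Hypothetical stand-in for the parser's node type.
enum NodeKind { Document, Element, Text, Comment }

static class Sketch
{
    // Hypothetical helper: returns false only for an empty text node
    // when the configuration says to ignore those. Every other node is
    // emitted, so its "node" property is always set.
    public static bool ShouldEmit(NodeKind kind, string innerText, EmptyTextNodeHandling handling)
    {
        bool isEmptyTextNode = kind == NodeKind.Text && string.IsNullOrWhiteSpace(innerText);
        return !(isEmptyTextNode && handling == EmptyTextNodeHandling.Ignore);
    }

    static void Main()
    {
        // An empty element such as <form></form> is still emitted under Ignore.
        Console.WriteLine(ShouldEmit(NodeKind.Element, "", EmptyTextNodeHandling.Ignore));   // True

        // A whitespace-only text node is dropped entirely under Ignore.
        Console.WriteLine(ShouldEmit(NodeKind.Text, "   ", EmptyTextNodeHandling.Ignore));   // False

        // The same text node is kept under Include.
        Console.WriteLine(ShouldEmit(NodeKind.Text, "   ", EmptyTextNodeHandling.Include));  // True

        // A text node with real content is always kept.
        Console.WriteLine(ShouldEmit(NodeKind.Text, "hello", EmptyTextNodeHandling.Ignore)); // True
    }
}
```

The point of the sketch is that the emptiness check applies only to text nodes, so assigning the node tag for elements no longer depends on innerText.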

JackWFinlay commented 7 years ago

Just updated to fix this issue. Can you let me know if it acts as you'd expect?