NaturalIntelligence / fast-xml-parser

Validate XML, Parse XML and Build XML rapidly without C/C++ based libraries and no callback.
https://naturalintelligence.github.io/fast-xml-parser/
MIT License

XMLParser - `tagValueProcessor` called only on leaf nodes #657

Closed amenella closed 4 weeks ago

amenella commented 1 month ago

Description

Hi, I think I found a bug in how tagValueProcessor is executed in the XMLParser class. It seems to be called only on leaf nodes (see the sample code below).

I searched for similar issues but none seem related; however, there is a similar (closed) one for XMLBuilder:

Code

// test.js
const { XMLParser } = require('fast-xml-parser');

function customTagValueProcessor(tagName, tagValue, propertyPath, hasAttributes, isLeafNode) {
  console.log('customTagValueProcessor() params:', {
    tagName, tagValue, propertyPath, hasAttributes, isLeafNode,
  });
  return tagValue;
}

function parseData(data) {

  const parser = new XMLParser({
    ignoreAttributes: false,
    parseAttributeValue: true,
    allowBooleanAttributes: true,
    attributeNamePrefix: '',
    attributesGroupName: '#attributes',
    textNodeName: '#value',
    parseTagValue: true,
    tagValueProcessor: customTagValueProcessor,
  });
  return parser.parse(data);
}

function main() {
  const data = `
    <?xml version="1.0" encoding="utf-8"?>
    <foo attr1="val1" attr2="val2">
      <bar attr3="val3" attr4="val4">
        <baz attr5="val5.1" attr6="val6.1">some text value</baz>
        <baz attr5="val5.2" attr6="val6.2">some other text value</baz>
      </bar>
    </foo>
  `;
  const parsedData = parseData(data);
  console.log(JSON.stringify(parsedData, undefined, 2));
}

main();

Then run `node test.js` in a terminal.

Output

You should see only 2 logs for the customTagValueProcessor function:

customTagValueProcessor() params: {
  tagName: 'baz',
  tagValue: 'some text value',
  propertyPath: 'foo.bar.baz',
  hasAttributes: true,
  isLeafNode: true
}
customTagValueProcessor() params: {
  tagName: 'baz',
  tagValue: 'some other text value',
  propertyPath: 'foo.bar.baz',
  hasAttributes: true,
  isLeafNode: true
}

And the parsed data (which is properly parsed in this case)

Expected data

As noted above, the data is parsed correctly. However, I would expect customTagValueProcessor to be called on every node, not only on leaf nodes, so the output should look something like this (probably not in this order):

customTagValueProcessor() params: {
  tagName: 'baz',
  tagValue: 'some text value',
  propertyPath: 'foo.bar.baz',
  hasAttributes: true,
  isLeafNode: true
}
customTagValueProcessor() params: {
  tagName: 'baz',
  tagValue: 'some other text value',
  propertyPath: 'foo.bar.baz',
  hasAttributes: true,
  isLeafNode: true
}
customTagValueProcessor() params: {
  tagName: 'bar',
  tagValue: undefined,
  propertyPath: 'foo.bar',
  hasAttributes: true,
  isLeafNode: false
}
customTagValueProcessor() params: {
  tagName: 'foo',
  tagValue: undefined,
  propertyPath: 'foo',
  hasAttributes: true,
  isLeafNode: false
}

Would you like to work on this issue?

Is it an expected behaviour of the tagValueProcessor attribute on XMLParser?

Thanks for the project :pray:

github-actions[bot] commented 1 month ago

We're glad you find this project helpful. We'll try to address this issue ASAP. You can visit https://solothought.com to learn about recent features. Don't forget to star this repo.

amitguptagwl commented 1 month ago

As I remember, it is called only when a tag has a (text) value. This is what is mentioned in the documentation: "if tag value is empty then tagValueProcessor will not be called."

You can probably try v5. It is experimental, but you may find the features you need there. Please see the documentation.

amitguptagwl commented 4 weeks ago

Please reopen if you still have any issue.

amenella commented 4 weeks ago

Hi @amitguptagwl, sorry for the late reply. I had indeed not fully understood this point in the docs:

if tag value is empty then tagValueProcessor will not be called.

So this is effectively not a bug; you can close this issue.

I eventually managed to do what I wanted with a different approach. However, an option to control how tags are parsed would be nice! (I have not tried v5 yet.)

Thanks again
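
For readers who land here with the same need, one possible "different approach" is to walk the parsed object after parsing and invoke a callback for every element, leaf or not. The sketch below is only an illustration under assumptions: `walkNodes` and `visit` are hypothetical names (not part of fast-xml-parser), the `#attributes` and `#value` keys mirror the options used in test.js above, and the `parsed` object is a hand-written stand-in, not actual parser output.

```javascript
// Post-parse walk: call `visit` for every element node, leaf or not.
// ASSUMPTION: attributes live under '#attributes' and text under '#value',
// matching the attributesGroupName / textNodeName options in test.js.
// `walkNodes` and `visit` are hypothetical helpers, not library APIs.
const ATTRS_KEY = '#attributes';
const TEXT_KEY = '#value';

function walkNodes(node, visit, path = '') {
  if (node === null || typeof node !== 'object') return;
  for (const [tagName, child] of Object.entries(node)) {
    // Skip the special keys; they describe the current node, not child tags.
    if (tagName === ATTRS_KEY || tagName === TEXT_KEY) continue;
    const propertyPath = path ? `${path}.${tagName}` : tagName;
    // Repeated tags show up as an array; treat each entry as its own node.
    const items = Array.isArray(child) ? child : [child];
    for (const item of items) {
      const isObject = item !== null && typeof item === 'object';
      const childTags = isObject
        ? Object.keys(item).filter((k) => k !== ATTRS_KEY && k !== TEXT_KEY)
        : [];
      visit({
        tagName,
        tagValue: isObject ? item[TEXT_KEY] : item,
        propertyPath,
        hasAttributes: isObject && ATTRS_KEY in item,
        isLeafNode: childTags.length === 0,
      });
      if (isObject) walkNodes(item, visit, propertyPath);
    }
  }
}

// Hand-written stand-in for the parsed sample document (not real parser output).
const parsed = {
  foo: {
    '#attributes': { attr1: 'val1', attr2: 'val2' },
    bar: {
      '#attributes': { attr3: 'val3', attr4: 'val4' },
      baz: [
        { '#value': 'some text value', '#attributes': { attr5: 'val5.1', attr6: 'val6.1' } },
        { '#value': 'some other text value', '#attributes': { attr5: 'val5.2', attr6: 'val6.2' } },
      ],
    },
  },
};

const visited = [];
walkNodes(parsed, (info) => visited.push(info));
console.log(visited.map((v) => `${v.propertyPath} leaf=${v.isLeafNode}`));
```

With this stand-in object the callback fires four times (foo, foo.bar, and both baz entries), parents first. Real parser output may contain other top-level keys, such as an entry for the XML declaration, which would need to be skipped or handled as well.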

amitguptagwl commented 4 weeks ago

If you're trying this library for the first time, it is better to have a look at v5. There you can customize the parsing very well.