FasterXML / jackson-dataformat-xml

Extension for Jackson JSON processor that adds support for serializing POJOs as XML (and deserializing from XML) as an alternative to JSON
Apache License 2.0
574 stars 222 forks source link

List tags not changing properly #617

Closed ThomHehl closed 1 year ago

ThomHehl commented 1 year ago

I have a list in an xml document that contains the following lines:

        <verse osisID="Obad.1.1">
          <w lemma="2377" n="1.0" morph="HNcmsc" id="31xeN">חֲז֖וֹן</w>
          <w lemma="5662" n="1" morph="HNp" id="31Nvk">עֹֽבַדְיָ֑ה</w>
          <w lemma="3541" morph="HD" id="31TyA">כֹּֽה</w><seg type="x-maqqef">־</seg><w lemma="559" morph="HVqp3ms" id="31vuQ">אָמַר֩</w>
          <w lemma="136" morph="HNcmpc/Sp1cs" id="31TSc">אֲדֹנָ֨/י</w>
          <w lemma="3069" n="0.0.2.0" morph="HNp" id="31k5P">יְהוִ֜ה</w>
          <w lemma="l/123" n="0.0.2" morph="HR/Np" id="31nPh">לֶ/אֱד֗וֹם</w>
          <w lemma="8052" morph="HNcfsa" id="31LN3">שְׁמוּעָ֨ה</w>
          <w lemma="8085" n="0.0.1.0" morph="HVqp1cp" id="31Qzf">שָׁמַ֜עְנוּ</w>
          <w lemma="m/854" morph="HR/R" id="31aPd">מֵ/אֵ֤ת</w>
          <w lemma="3068" n="0.0.1" morph="HNp" id="31eYX">יְהוָה֙</w>
          <w lemma="c/6735 a" n="0.0.0" morph="HC/Ncmsa" id="31C5U">וְ/צִיר֙</w>
          <w lemma="b/1471 a" morph="HRd/Ncmpa" id="31qNN">בַּ/גּוֹיִ֣ם</w>
          <w lemma="7971" n="0.0" morph="HVPp3ms" id="31EVS">שֻׁלָּ֔ח</w>
          <w lemma="6965 b" morph="HVqv2mp" id="31PB6">ק֛וּמוּ</w>
          <w lemma="c/6965 b" morph="HC/Vqh1cp" id="3137c">וְ/נָק֥וּמָה</w>
          <w lemma="5921 a" morph="HR/Sp3fs" id="31x9c">עָלֶי/הָ</w>
          <note n="Q">Marks a place where we agree with BHQ against BHS in reading L. </note>
          <note n="c">We read one or more accents in L differently than BHS. Often this notation indicates a typographical error in BHS. </note>
          <note n="n">Marks an anomalous form. </note>
          <w lemma="l/4421" n="0" morph="HRd/Ncfsa" id="31yNB">לַ/מִּלְחָמָֽה</w><seg type="x-sof-pasuq">׃</seg>
        </verse>

I deserialize this with the following code:

public class OsisVerse extends DocumentVerse {
    @JacksonXmlProperty(localName = "w")
    private List<OsisWord> osisWords;

And use a simple XmlMapper and this works great.

I massage the document and then output it and it looks like this:

          <verse>
            <words>
              <words id="31yNB" lemma="l/4421" morph="HRd/Ncfsa" n="0" x-source-word="לַ/מִּלְחָמָֽה">(...)</words>
            </words>
          </verse>

I've done some digging and this happens with 2.14-2.16.

The code can be pulled from here: https://github.com/ThomHehl/xlate and run Spock test SourceTextConverterTest or simply gradlew test and the test will fail.

Thanks!

cowtowncoder commented 1 year ago

As usual, Jackson XML module conforms to "eat what you cook", i.e. should be able to deserialize what it serializes. No guarantees for other kinds of XML.

In that sense, most if not all tests for deserialization should be based on serialization Jackson produces.

As per my earlier note, I cannot run your test: ./gradlew test gives

Error: Could not find or load main class org.gradle.wrapper.GradleWrapperMain

Above description is not very useful either, not actual code to write and read content, but one quick thing I note is that

@JacksonXmlProperty(localName = "w")

is used, but XML for some reason <words>, instead, is used for List property. So that alone would lead not matching.

I also hope that test is not disabling DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES since that generally hides real problems.

ThomHehl commented 1 year ago

I was under the impression that ./gradlew would download everything it needed to run. It, of course, wroks fine for me.

Yes, the issue I'm having is that it is creating a collection and then rolling all of th tags into a single line, which is problematic.

Are you running windows, by any chance? If so, you should run gradlew.bat. Are you using IntelliJ, because you should be able to run in the environment.

If you pulled xlate, you have the code. Look at file AtisVerse and it's all in there.

Let me know if none of this is helpful and I will see what I can do.

cowtowncoder commented 1 year ago

No, this is on MacOS; running from shell. And yes, my understanding, too, is that gradlew should usually download whatever it needs.

But maybe you can try if you can clone a clean copy of the repo on your machine and see that it works? It is possible your local copy might have something not checked in git repo? (has happened to me sometimes at least)

As per this: https://stackoverflow.com/questions/29805622/could-not-find-or-load-main-class-org-gradle-wrapper-gradlewrappermain

it might be that gradle wrapper generated won't work.

But since I have gradle installed, I can use that instead. With JDK 17 (seems to be needed) I can get

SourceTextConverterTest > Convert FAILED
    org.spockframework.runtime.ConditionNotSatisfiedError at SourceTextConverterTest.groovy:54

10 tests completed, 1 failed
cowtowncoder commented 1 year ago

Ok, I don't have time to dig into details of a project-specific test here. For me to help further, it'd need to be Jackson-deps-only reproduction.

I hope you will find the problem, whatever it is; and maybe someone else can help dig deeper.

ThomHehl commented 1 year ago

In BeanSerializerBase, this constructor:


    protected BeanSerializerBase(JavaType type, BeanSerializerBuilder builder,
            BeanPropertyWriter[] properties, BeanPropertyWriter[] filteredProperties)

Is being called. The properties include both words and w, which appears be what's causing the bug.

ThomHehl commented 1 year ago

OK, found i!. Perhaps not a bug after all. The problem is the methods inheriting from an abstract base class that return the same collection. I have attached the unit test that recreates this in the 2.17 branch.

First? Is it a bug or simply me not doing it right.

Second? How do I fix it?

TestSerializationList.java.txt

I had to change the extension for it to upload. Change back to java, please.

cowtowncoder commented 1 year ago

Ok, I'll have a look later tonight. Thank you for doing the detective work, @ThomHehl !

ThomHehl commented 1 year ago

It was my issue to resolve. :) Thanks for your assistance.

cowtowncoder commented 1 year ago

@ThomHehl I hesitate to call it your error, but yes, duplication is due to different implied names between:

        @JacksonXmlElementWrapper(useWrapping = false)
        @JacksonXmlProperty(isAttribute = false, localName = "w")
        public List<Word> wordList;

        @Override
        public List<Word> getWords() {
            return getWordList();
        }

so, renamed "wordList" as "w", and "words" from getWords().

It is possible to unify the names (by @JsonProperty or so); ignore one, keep other (@JsonIgnore) or, if the field is made non-public, just annotate getter -- non-public Fields are not detected by default.

I hope this helps!

ThomHehl commented 1 year ago

I used @JsonProperty to unify them as you suggested and it is definitely behaving much better. Still having an issue though. Here is the input xml.

        <verse osisID="Obad.1.1">
          <w lemma="2377" n="1.0" morph="HNcmsc" id="31xeN">חֲז֖וֹן</w>
          <w lemma="5662" n="1" morph="HNp" id="31Nvk">עֹֽבַדְיָ֑ה</w>
          <w lemma="3541" morph="HD" id="31TyA">כֹּֽה</w><seg type="x-maqqef">־</seg><w lemma="559" morph="HVqp3ms" id="31vuQ">אָמַר֩</w>
          <w lemma="136" morph="HNcmpc/Sp1cs" id="31TSc">אֲדֹנָ֨/י</w>
          <w lemma="3069" n="0.0.2.0" morph="HNp" id="31k5P">יְהוִ֜ה</w>
          <w lemma="l/123" n="0.0.2" morph="HR/Np" id="31nPh">לֶ/אֱד֗וֹם</w>
          <w lemma="8052" morph="HNcfsa" id="31LN3">שְׁמוּעָ֨ה</w>
          <w lemma="8085" n="0.0.1.0" morph="HVqp1cp" id="31Qzf">שָׁמַ֜עְנוּ</w>
          <w lemma="m/854" morph="HR/R" id="31aPd">מֵ/אֵ֤ת</w>
          <w lemma="3068" n="0.0.1" morph="HNp" id="31eYX">יְהוָה֙</w>
          <w lemma="c/6735 a" n="0.0.0" morph="HC/Ncmsa" id="31C5U">וְ/צִיר֙</w>
          <w lemma="b/1471 a" morph="HRd/Ncmpa" id="31qNN">בַּ/גּוֹיִ֣ם</w>
          <w lemma="7971" n="0.0" morph="HVPp3ms" id="31EVS">שֻׁלָּ֔ח</w>
          <w lemma="6965 b" morph="HVqv2mp" id="31PB6">ק֛וּמוּ</w>
          <w lemma="c/6965 b" morph="HC/Vqh1cp" id="3137c">וְ/נָק֥וּמָה</w>
          <w lemma="5921 a" morph="HR/Sp3fs" id="31x9c">עָלֶי/הָ</w>
          <note n="Q">Marks a place where we agree with BHQ against BHS in reading L. </note>
          <note n="c">We read one or more accents in L differently than BHS. Often this notation indicates a typographical error in BHS. </note>
          <note n="n">Marks an anomalous form. </note>
          <w lemma="l/4421" n="0" morph="HRd/Ncfsa" id="31yNB">לַ/מִּלְחָמָֽה</w><seg type="x-sof-pasuq">׃</seg>
        </verse>

When I read it in, I only get the last w tag. It ignores the first 16. I'm guessing it has to do with the fact that all of the w tags are interrupted by three note tags before the last w tag which is the one kept.

cowtowncoder commented 1 year ago

@ThomHehl yes. Only consecutive values are retained, so there are 2 separate List assignments. Work-around here would be to have explicit "setter" method for List which only appends entries, does not replace. And initialize with empty List (or null and make "setter" handle that case).

ThomHehl commented 1 year ago

OK, that seems to have done it. Thanks for your help.

cowtowncoder commented 12 months ago

All's well that ends well. I wish there was a better solution to non-contiguous list members tho; maybe in future it could be solved better way.