ImJeremyHe / xmlserde

A user-friendly Rust library for serializing or deserializing the XML files

Support tags containing child, text or both #41

Closed. Shohaii closed this issue 6 months ago

Shohaii commented 6 months ago

Hi Jeremy, thanks for the great crate.

I am trying to deserialize the following XML:

<office:document-content>
    <office:body>
        <office:text text:use-soft-page-breaks="true">
            <text:p text:style-name="Normal">
                <text:span text:style-name="T2">Progress:<text:s/>
                </text:span>
                <text:span text:style-name="T3">
                    <office:annotation office:name="0" xml:id="1825723351">
                        <dc:creator>Name Surname</dc:creator>
                        <dc:date>2024-03-12T10:30:00</dc:date>
                        <meta:creator-initials>NS</meta:creator-initials>
                        <text:p text:style-name="CommentText">progress indicator</text:p>
                    </office:annotation>100%</text:span>
                <text:span text:style-name="CommentReference">
                    <office:annotation-end office:name="0"/>
                </text:span>
            </text:p>
            <text:p text:style-name="P4">Task completed!</text:p>
            <text:p text:style-name="P5"/>
        </office:text>
    </office:body>
</office:document-content>

Yes, I know, it has nothing in common with LogiSheets, but your crate works almost perfectly even for ODT files.

The problem is preparing the Rust structs and enums to deserialize tags that contain child elements, text, or both (here text:p and text:span).

I tried to implement it like this (ignoring attributes):

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct OfficeText {
    #[xmlserde(name = b"text:p", ty = "child")]
    pub text_p: TextP,
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct TextP {
    #[xmlserde(ty = "untag")]
    pub text_p_content: Option<TextPContent>,
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub enum TextPContent {
    #[xmlserde(ty = "text")]
    Text(String),
    #[xmlserde(name = b"text:span")]
    TextSpans(Vec<TextSpan>)
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct TextSpan {
    #[xmlserde(ty = "untag")]
    pub text_span_content: Vec<TextSpanContent>,
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub enum TextSpanContent {
    #[xmlserde(ty = "text")]
    Text(String),
    #[xmlserde(name = b"text:s", ty = "sfc")]
    TextS,
    #[xmlserde(name = b"office:annotation")]
    OfficeAnnotation(OfficeAnnotation),
    #[xmlserde(name = b"office:annotation-end", ty = "sfc")]
    OfficeAnnotationEnd(OfficeAnnotationEnd),
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct OfficeAnnotation {
    // whatever
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct OfficeAnnotationEnd {
    // whatever
}

Unfortunately, this does not compile, but I wonder whether there is a way to make it work with the xmlserde crate. Sorry if I made a silly mistake in the code; I am new to Rust.

I see three options for making it work:

  1. I could try writing custom implementations of the XmlSerialize and XmlDeserialize traits for these situations
  2. You might consider adding support for these situations to the xmlserde crate
  3. You might consider making the members of the "Unparsed" struct public, so that these tricky parts could be parsed/deserialized manually

I would like to know your opinion, thanks.

ImJeremyHe commented 6 months ago

@Shohaii Thank you for your report. If I understand correctly, you want an enum type whose variants can be either the child type or the text type, right? Like:

pub enum TestEnum {
    #[xmlserde(ty = "text")]
    V1(String),
    #[xmlserde(ty = "child")]
    V2(StructA),
}

I think this is a good idea. I would like to add a feature like that.

Shohaii commented 6 months ago

@ImJeremyHe Thank you for your quick reply. Yes, your suggestion looks great.

I hope that an implementation like this:

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct TextP {
    #[xmlserde(ty = "untag")]
    pub text_p_content: Vec<TextPContent>,
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub enum TextPContent {
    #[xmlserde(ty = "text")]
    Text(String),
    #[xmlserde(name = b"text:span", ty = "child")]
    TextSpan(TextSpan),
}

would solve all my issues.

  1. vector of children or text or both
  2. empty vector for <text:p text:style-name="P5"/> case

Or maybe you can think of a better way. Either way, something like this would be great.
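To make the two cases concrete, here is a minimal plain-Rust sketch of the data shape I have in mind. It deliberately uses no xmlserde derives or attributes, so it says nothing about how the crate would actually parse the XML. The TextSpan here is simplified to hold only its text, and plain_text and sample_paragraphs are hypothetical helpers added just for illustration. The values are built by hand to mirror the three text:p elements in my sample, with text:s treated as a single space.

```rust
// Plain-Rust sketch of the mixed-content shape; no xmlserde derives involved.

/// One item inside a `<text:p>`: either raw text or a `<text:span>` child.
#[derive(Debug, Clone, PartialEq)]
pub enum TextPContent {
    Text(String),
    TextSpan(TextSpan),
}

/// Simplified span holding only its text (attributes and nested tags omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct TextSpan {
    pub text: String,
}

/// A paragraph is an ordered list of items; an empty list models `<text:p/>`.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct TextP {
    pub content: Vec<TextPContent>,
}

impl TextP {
    /// Concatenates raw text and span text in document order.
    pub fn plain_text(&self) -> String {
        self.content
            .iter()
            .map(|item| match item {
                TextPContent::Text(t) => t.as_str(),
                TextPContent::TextSpan(s) => s.text.as_str(),
            })
            .collect()
    }
}

/// Hand-built values mirroring the three `<text:p>` elements in the sample XML.
pub fn sample_paragraphs() -> Vec<TextP> {
    vec![
        // Children only: three spans (the last one carries no text).
        TextP {
            content: vec![
                TextPContent::TextSpan(TextSpan { text: "Progress: ".to_string() }),
                TextPContent::TextSpan(TextSpan { text: "100%".to_string() }),
                TextPContent::TextSpan(TextSpan { text: String::new() }),
            ],
        },
        // Text only.
        TextP {
            content: vec![TextPContent::Text("Task completed!".to_string())],
        },
        // Self-closing `<text:p text:style-name="P5"/>`: empty vector.
        TextP::default(),
    ]
}
```

Using a Vec of enum items instead of an Option keeps the document order of text and children, and the self-closing tag needs no special case because it is simply the empty vector.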

Thank you for your time.

ImJeremyHe commented 6 months ago

I will work on this later. Thanks for your confirmation.

ImJeremyHe commented 6 months ago

@Shohaii Please have a look at this unit test.

Shohaii commented 6 months ago

Hi @ImJeremyHe, the feature works great. Thank you. Long live the crate.