RehanSaeed / Schema.NET

Schema.org objects turned into strongly typed C# POCO classes for use in .NET. All classes can be serialized into JSON/JSON-LD and XML, typically used to represent structured data in the head section of html page.
MIT License
642 stars 80 forks source link

As far as I can see not possible to mix types in Values{T1,....Tn} #5

Closed AndreSteenbergen closed 5 years ago

AndreSteenbergen commented 6 years ago

I am trying to make a parser from html microdata to Schema.Net. As far as I can see now, I can only use 1 type in Values with more then 1 generic argument.

e.g. I would like to be able to parse an image using url, but somewhere later in the document I would like to add ImageObject type, because the creator of the html page made a differnce in types of images.

RehanSaeed commented 6 years ago

That's an interesting problem. The only way I think this can be implemented is if we also added List<object> to the Values<T> type. We'd need to do some investigation on what this would mean. I'd be willing to accept a PR for this.

AndreSteenbergen commented 6 years ago

Let me first say: the tool to generate classes is awesome.

I do have a solution ready right now, but it is a breaking change. I removed the values {T1,..Tn} and replaced them with a wrapper. I'll be working on it again this evening. The Values object is still strong typed this way, with reverting back to an object array. I'll try and see if I can create a pull request this evening. I also added ItemList to Thing in the partial, as I found a lot of sites are not really correct about there microdata, trying to put Product entiteis directly under Webpage entities for eaxmple. It is something I added so the parser would still return those child objects.

I basically created the following construct for all multityped stuff:

    public interface IIdentifier : IWrapper { }
    public interface IIdentifier<T> : IIdentifier { new T Data { get; set; } }
    public class IdentifierPropertyValue : IIdentifier<PropertyValue>
    {
        object IWrapper.Data { get { return Data; } set { Data = (PropertyValue) value; } }
        public virtual PropertyValue Data { get; set; }
        public IdentifierPropertyValue () { }
        public IdentifierPropertyValue (PropertyValue data) { Data = data; }
        public static implicit operator IdentifierPropertyValue (PropertyValue data) { return new IdentifierPropertyValue (data); }
    }

    public class Identifierstring : IIdentifier<string>
    {
        object IWrapper.Data { get { return Data; } set { Data = (string) value; } }
        public virtual string Data { get; set; }
        public Identifierstring () { }
        public Identifierstring (string data) { Data = data; }
        public static implicit operator Identifierstring (string data) { return new Identifierstring (data); }
    }

    /// <summary>
    /// The identifier property represents any kind of identifier for any kind of &lt;a class="localLink" href="http://schema.org/Thing"&gt;Thing&lt;/a&gt;, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See &lt;a href="/docs/datamodel.html#identifierBg"&gt;background notes&lt;/a&gt; for more details.
    /// </summary>
    [DataMember(Name = "identifier", Order = 11)]
    [JsonConverter(typeof(ValuesConverter))]
    public Values<IIdentifier>? Identifier { get; set; } //PropertyValue, string, Uri

I needed to change the jsonconverters quite a lot, and I haven't fully tested it. For the use cases I am working on it does work.

The changes for using the library is the following. Instead of a simple list. You will need to specify the type yourself, for every item. Example

        var webpage = new WebPage
        {
            Name = "TestSite",
            ItemList = new List<Thing> {
                new PriceSpecification
                {
                    MaxPrice = new List<decimal> { 23.45m, 12.35m }
                },
                new Product
                {
                    Offers = new Offer
                    {
                        Price = new List<Offer.IPrice> {
                                        (Offer.Pricedecimal)23.45m,
                                        (Offer.Pricedecimal)12.35m
                                    },
                        PriceCurrency = "EUR"
                    },
                    Name = "test product",
                    Brand = new List<Product.IBrand> {
                        (Product.BrandBrand)new Brand { Name = "Volvo" },
                        (Product.BrandOrganization) new Organization { Name = "Geely" },
                    }
                }
            }
        };
RehanSaeed commented 6 years ago

I have a gut feeling that there is something clever we can do with the Values types. I think we should investigate that before we go down the path of an additional class. Maybe something like this:

Desired Usage

This is still strongly typed and we don't have List<object> anywhere which will mean lots more boxing and unboxing.

public class Foo
{
    public Values<string, int> Bar { get; set; }
}

var foo = new Foo();
foo.Bar = new Values<string, int>(new List<string>() { "1" }, new List<int>() { 2 });

Psuedocode Implementation

As far as I can see, this would just mean adding a new constructor to each Values<T1...TN> type but also changing the Value property on it such that this:

public object Value
{
    get
    {
        if (this.values1.HasValue)
        {
            return this.values1.Value;
        }
        else if (this.values2.HasValue)
        {
            return this.values2.Value;
        }

        return null;
    }
}

Changes into this:

public object Value
{
    get
    {
        if (this.values1.HasValue && this.values2.HasValue)
        {
            // return a List<object> of values1.Value items concatenated with values2.Value items.
        }
        if (this.values1.HasValue)
        {
            return this.values1.Value;
        }
        else if (this.values2.HasValue)
        {
            return this.values2.Value;
        }

        return null;
    }
}

I have not thought through the changes this would require to the serializer for read and write operations.

RehanSaeed commented 6 years ago

This is a great issue by the way, I can't believe I didn't think that somebody might want to do this. I wonder what happens when we try to deserialize JSON-LD with a mix of stuff in it? Have you tested that?

It hurts my brain coming back to this code. I can kind of get where you are going with your code but a sample would make it easier to understand.

AndreSteenbergen commented 6 years ago

As far as I have tested, I can go back and fourth between json+ld and c# classes.

Problem with making multiple arrays of values is you loose original order, which I don't want. That's the whole idea behind the Interface.

I actually added this my test case in the comments. with BrandBrand and BrandOrganization. It results in this json+ld.

{"@context":"http://schema.org","@type":"WebPage","name":"TestSite","itemlist":[{"@type":"PriceSpecification","maxPrice":[23.45,12.35]},{"@type":"Product","name":"test product","brand":[{"@type":"Brand","name":"Volvo"},{"@type":"Organization","name":"Geely"}],"offers":{"@type":"Offer","price":[23.45,12.35],"priceCurrency":"EUR"}}]}

AndreSteenbergen commented 6 years ago

Changing second price to pricestring end up in json+ld as following:

{"@context":"http://schema.org","@type":"WebPage","name":"TestSite","itemlist":[{"@type":"PriceSpecification","maxPrice":[23.45,12.35]},{"@type":"Product","name":"test product","brand":[{"@type":"Brand","name":"Volvo"},{"@type":"Organization","name":"Geely"}],"offers":{"@type":"Offer","price":[23.45,"EUR 1234,99"],"priceCurrency":"EUR"}}]}

AndreSteenbergen commented 6 years ago

My simple test case

         var jsonLd = webpage.ToString();

        //now convert back
        var thing = JsonConvert.DeserializeObject<Thing>(jsonLd);
        Assert.NotNull(thing);
        Assert.IsType<WebPage>(thing);

        WebPage testObject = (WebPage)thing;
        Assert.Equal(testObject.Name.Value.Value, webpage.Name.Value.Value);
        Assert.Equal(testObject.ItemList.Value.List.Count, webpage.ItemList.Value.List.Count);
        Assert.Equal(testObject.ItemList.Value.List[1].Name.Value.Value, webpage.ItemList.Value.List[1].Name.Value.Value);

        var jsonLd2 = testObject.ToString();
        Assert.Equal(jsonLd2, jsonLd);
RehanSaeed commented 6 years ago

I tried your JSON with the current version of Schema.NET and it doesn't handle this scenario very well. The object gets deserialized to a Thing with null properties.

I hadn't thought about the ordering at all. I'm starting to get the code in your initial comment. That will mean a lot of extra classes generated by the tool. Is it possible to use a generic wrapper class so we don't need lots of type specific wrapper classes? Something like:

namespace Schema.NET
{
    public interface IValue
    {
        object Value { get; set; }
    }

    public interface IValues<T1, T2> : IValue
    {
        T1 Value1 { get; set; }

        T2 Value2 { get; set; }
    }

    public class Values<T1, T2> : IValues<T1, T2>
    {
        public T1 Value1 { get; set; }

        public T2 Value2 { get; set; }

        object IValue.Value
        {
            get => default(T1).Equals(this.Value1) ? (object)this.Value2 : this.Value1;
            set
            {
                if (value is T1 value1)
                {
                    this.Value1 = value1;
                }
                else if (value is T2 value2)
                {
                    this.Value2 = value2;
                }
            }
        }

        public Values() { }

        public Values(T1 value) { this.Value1 = value; }

        public Values(T2 value) { this.Value2 = value; }

        public static implicit operator Values<T1, T2>(T1 value) { return new Values<T1, T2>(value); }

        public static implicit operator Values<T1, T2>(T2 value) { return new Values<T1, T2>(value); }
    }
}

So usage becomes:

public Values<Values<string, Address>> Address { get; set; }

This effectively reverses the relationship between Values<T> and the Values<T1,T2...TN> types. Does this work? I'm just thinking out loud here.

AndreSteenbergen commented 6 years ago

I think that might even work, I haven't tried that way. I am worried about the default(T1) part. What if I define a value to be int 0, it would try and return T2, but that's only minor thing. I think I would add a return function property if T1 is set it would be () => value1;

public class Values<T1, T2> : IValues<T1, T2>
{
    public T1 Value1 { get; private set; }
    public T2 Value2 { get; private set; }
    private Func<object> returnFunc = () => null;

    object IValue.Value
    {
        get => this.returnFunc();
        set
        {
            if (value is T1 value1)
            {
                this.returnFunc = () => value1;
                this.Value1 = value1;
            }
            else if (value is T2 value2)
            {
                this.returnFunc = () => value2;
                this.Value2 = value2;
            }
        }
    }

    public Values() { }

    public Values(T1 value) { this.Value = value; }

    public Values(T2 value) { this.Value = value; }

    public static implicit operator Values<T1, T2>(T1 value) { return new Values<T1, T2>(value); }

    public static implicit operator Values<T1, T2>(T2 value) { return new Values<T1, T2>(value); }
}
AndreSteenbergen commented 6 years ago

humm no real need for T1 or T2 as properties, just add an object, no need from strong typed result this way. just set an extra property value, and return that ...

AndreSteenbergen commented 6 years ago
public class Values<T1, T2> : IValues<T1, T2>
{
    private object value;

    object IValue.Value
    {
        get => value;
        set 
        {
            if (value is T1 value1)
            {
                this.value = value1;
            }
            else if (value is T2 value2)
            {
                this.value = value2;
            }
            throw new ArgumentException();
        }
    }

    public Values() { }

    public Values(T1 value) { this.Value = value; }

    public Values(T2 value) { this.Value = value; }

    public static implicit operator Values<T1, T2>(T1 value) { return new Values<T1, T2>(value); }

    public static implicit operator Values<T1, T2>(T2 value) { return new Values<T1, T2>(value); }
}
RehanSaeed commented 6 years ago

We never define int, we always use nullable types. I think that’s right anyway.

Also, we should keep it open so people can use property initializer syntax and/or use constructors (I know this cleaner in some senses). If we do this we should set Value1 to default(T1) when Value2 is set.

RehanSaeed commented 6 years ago

I think the ArgumentException is unnecessary as the code can never get to that point.

AndreSteenbergen commented 6 years ago

It can as it is a public property, changing it afterward ... in code, doesn't make a lot of sense to do, but it is possible.

AndreSteenbergen commented 6 years ago

Oh you are correct about int I changed that in my code, as all Values<> were already nullable, why would I make a Values<int?>? set to new Values<int?> = (int?) null I would set the whole property as null.

RehanSaeed commented 6 years ago

I suppose it's possible for Value1 and Value2 to be null, in which case we should return null and not throw. The null can be ignored in the serialization.

AndreSteenbergen commented 6 years ago

I agree, this is better ;) if (value != null) throw new ArgumentException()

AndreSteenbergen commented 6 years ago

I just created the pull request. You can't pull as is, as it is my source code from before this entire discussion. But I did promise I'd make a pull request. I also added a jsonconverter for Thing, so it will return strong typed things.

AndreSteenbergen commented 5 years ago

I am returning to this thread again, after running into some minor things parsing stuff. Mainly where string is used where it not is not supposed to be, but actually quite normal in webworld, not specifying itemtype. So . Is this author an organisation, or is it a person? Therefore I propose, everything is a Values, with an explicit conversion from string. Letting it be a struct means it is never null (value off course can be null)

namespace Schema.NET
{
    using System;
    using System.Collections.Generic;

    /// <summary>
    /// An implementation for IValue, containing either type T, or string.
    /// e.g. Brand="Volvo"; is Volvo a brand or an organization, leaving it in the middle
    /// but ExplicitlySet will be set to true.
    /// </summary>
    /// <typeparam name="T1">The first type the values can take.</typeparam>
    /// <seealso cref="IValue" />
    public struct Values<T1> : IValue
    {
        /// <summary>
        /// Initializes a new instance of the <see cref="Values{T1}"/> struct.
        /// </summary>
        /// <param name="value">The value of type <typeparamref name="T1"/>.</param>
        public Values(T1 value)
        {
            this.Value = value;
            this.ExplicitlySet = false;
        }

        /// <summary>
        /// Initializes a new instance of the <see cref="Values{T1}"/> struct, using a string
        /// </summary>
        /// <param name="value">The value of type <typeparamref name="T1"/>.</param>
        public Values(string value)
        {
            this.Value = value;
            this.ExplicitlySet = true;
        }

        /// <summary>
        /// Get the value set as constructor
        /// </summary>
        public object Value { get; private set; }

        /// <summary>
        /// True, when the object is explicitly set as a string
        /// </summary>
        public bool ExplicitlySet { get; private set; }

        /// <summary>
        /// Performs an implicit conversion from <typeparamref name="T1"/> to <see cref="Values{T1}"/>.
        /// </summary>
        /// <param name="item">The single item value.</param>
        /// <returns>The result of the conversion.</returns>
        public static implicit operator Values<T1>(T1 item) => new Values<T1>(item);

        /// <summary>
        /// Performs an explicit conversion from string <see cref="Values{T1}"/>, also sets the <see cref="ExplicitlySet"/> to true.
        /// </summary>
        /// <param name="item">The single item value.</param>
        /// <returns>The result of the conversion.</returns>
        public static explicit operator Values<T1>(string item) => new Values<T1>(item);
    }
}
AndreSteenbergen commented 5 years ago

All OneOrMany are of Values<T1 .... Tn>, having explicit T1 .. Tn values inside the objects is not very helpful, you always need to check the type anyway dealing with Things in you own code.

RehanSaeed commented 5 years ago

I've implemented type mixng on the mixed-types branch. All those interested please give it a test. Still need to fix a few broken unit tests (The tests are now incorrect) and write a few more but it generally works.

One interesting thing I realized was that the ItemListElement property on an ItemList can have ListItem or IThing types. This means that when you deserialize an ItemList from JSON, you can access the ItemListElement property either as a ListItem or IThing. One funky side effect of this is that if you deserialize a ItemList and then serialize it again, you will get twice as many items in the ItemListElement property. This also occurs with some URL properties which can be represented as string or Uri. Since I don't think this is something anyone would do, I might be ok with that. Feedback welcome.

See the MixedTypesTest class for examples of how it all works. Also need to update the ReadMe. Any help would be much appreciated.

RehanSaeed commented 5 years ago

Mixed types are now supported. Closing issue.

adampound commented 2 years ago

Is there an example using the mixed types to deserialize the data. I couldn't find the mixed-types branch

Turnerj commented 2 years ago

Hi @adampound - it depends on what you mean. If you serialize or deserialize a schema object that has multiple types on a property, it will automatically handle it for you.

For just working with a Values type, you can follow what the various tests do here: https://github.com/RehanSaeed/Schema.NET/blob/main/Tests/Schema.NET.Test/Values4Test.cs

adampound commented 2 years ago

I ended up just creating my own model. I was trying to scrape the recipe tag from sites to allow user to auto import recipes but it would receive a type error when trying.