datacleaner / DataCleaner

The premier open source Data Quality solution
GNU Lesser General Public License v3.0

Additional metadata about remote components #1396

Closed jakubneubauer closed 8 years ago

jakubneubauer commented 8 years ago

As a REST API user, I want to receive more information about the available services (remote components) so that I can use this in my own application.

Description

The REST service that returns the available services should be expanded with the following metadata/info:

Systems that need to integrate with the cloud API need to implement a lot of logic to sort these services out. However, since the cloud already has this knowledge about its own services, it seems more logical for it to provide that knowledge to the consumers of the API.

Business value

A more REST-based API that could be used not only by HI-specific applications but also by customer-specific clients. We become more flexible in our applications.

jakubneubauer commented 8 years ago

I would introduce an annotation for transformers in this form:

@Scope{entityType={person,company}, country={NL} }
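
A minimal sketch of how an annotation of this form might be declared and read back in Java. Everything here is an assumption made for illustration: the class names, the use of String arrays rather than enums, and the convention that an empty country list means "all" are hypothetical and are not taken from the DataCleaner code base.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class ScopeAnnotationSketch {

    // Hypothetical annotation; the real one may use different names or enum types.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Scope {
        String[] entityType() default {};
        // Assumption: an empty array stands for "all countries".
        String[] country() default {};
    }

    // Placeholder transformer classes, annotated as in the proposal.
    @Scope(entityType = {"person"}, country = {"NL"})
    static class NlConsumerCheck {}

    @Scope(entityType = {"person", "company"})
    static class NameCorrection {}

    // Returns true if the component declares a scope covering the given entity type and country.
    static boolean appliesTo(Class<?> component, String entityType, String country) {
        Scope scope = component.getAnnotation(Scope.class);
        if (scope == null) {
            return false;
        }
        boolean entityOk = Arrays.asList(scope.entityType()).contains(entityType);
        boolean countryOk = scope.country().length == 0
                || Arrays.asList(scope.country()).contains(country);
        return entityOk && countryOk;
    }

    public static void main(String[] args) {
        System.out.println(appliesTo(NlConsumerCheck.class, "person", "NL"));  // true
        System.out.println(appliesTo(NlConsumerCheck.class, "person", "DE"));  // false
        System.out.println(appliesTo(NameCorrection.class, "company", "IE"));  // true
    }
}
```

With runtime retention, a REST layer could read the annotation reflectively and expose the values as metadata, so that clients do not have to hard-code this knowledge.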

For DataCloud transformers, these would be the actual values:

| Service | EntityType | Country |
| --- | --- | --- |
| Address Correction | person,company | all |
| DE movers and deceased check | person | DE |
| Eircode | person,company | IE |
| Email Correction | person,company | all |
| NL Consumer Check | person | NL |
| NL deceased check | person | NL |
| Name Correction | person,company | all |
| Name Correction (Advanced) | person,company | all |
| Phone Correction | person,company | all |
| Sanction list check (companies) | company | all |
| Sanction list check (people) | person | all |

jakubneubauer commented 8 years ago

@JoosjeBoon can you look at it?

JoosjeBoon commented 8 years ago

@jakubneubauer

Hey Jakub, what we are still missing is the distinction between cleansing and enrichment services. E.g., NL Consumer Check is an enrich service (serviceType) and Name Correction is a cleansing service. I will be gone until next Tuesday and @MennoB will be tracking this, so if you have any more questions feel free to contact him. Thanks!

khouzvicka commented 8 years ago

I have a new proposal. (Update by Jakub: used "correction" instead of "cleansing" as the ServiceType.)

| Service | ServiceType[] | EntityType[] | Countries[] |
| --- | --- | --- | --- |
| Address Correction | correction | person,company | all |
| DE movers and deceased check | enrich | person | DE |
| Eircode | enrich | person,company | IE |
| Email Correction | correction | person,company | all |
| NL Consumer Check | enrich | person | NL |
| NL deceased check | enrich | person | NL |
| Name Correction | correction | person,company | all |
| Name Correction (Advanced) | correction | person,company | all |
| Phone Correction | correction | person,company | all |
| Sanction list check (companies) | enrich | company | all |
| Sanction list check (people) | enrich | person | all |

MennoB commented 8 years ago

Looks good! (And I see the annotation has already made it to code.)

jakubneubauer commented 8 years ago

I would use the word "correction" instead of "cleansing". Look at the transformer names: they also contain the word "correction".

jakubneubauer commented 8 years ago

@LosD can you review, please? Your wisdom is appreciated, dear DataCleaner wizard.

jakubneubauer commented 8 years ago

One comment on using enums: it can cause backward incompatibility in DataCleaner as a DataCloud client if we add a new enum constant in the future. Imagine that we later add a new ServiceType and set it on some transformer in DataCloud. Old DataCleaner Desktop installations will then not be able to deserialize this annotation at all.

LosD commented 8 years ago

Seems to make sense to me!

Enums are nice, but can certainly be a pain in the behind in these cases. We could simply have an UNKNOWN ServiceType/EntityType, and map unknown types to that. I think that Jackson supports custom mapping functions.
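
A rough, Jackson-free sketch of the UNKNOWN fallback idea, only to illustrate the mapping itself. The class and method names are made up for this example, and the case normalization is an added assumption (the tables above use lower-case values); how it would be hooked into the actual deserialization is left open.

```java
import java.util.Locale;

public class LenientEnumSketch {

    // UNKNOWN is the catch-all for values this client version does not know yet.
    enum ServiceType { ENRICH, CORRECTION, UNKNOWN }

    // Maps a wire value to a ServiceType, falling back to UNKNOWN instead of throwing.
    static ServiceType parseServiceType(String raw) {
        if (raw == null) {
            return ServiceType.UNKNOWN;
        }
        try {
            // Assumption: wire values may be lower-case, so normalize before the lookup.
            return ServiceType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Enum.valueOf throws for constants the client does not have.
            return ServiceType.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseServiceType("ENRICH"));     // ENRICH
        System.out.println(parseServiceType("correction")); // CORRECTION
        System.out.println(parseServiceType("WHATSDAT?"));  // UNKNOWN
    }
}
```

Compared with returning null, an explicit UNKNOWN constant keeps callers away from null checks, at the cost of one extra constant that the server should never send.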

LosD commented 8 years ago

It doesn't even seem that we need the UNKNOWN:

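// Imports assume Jackson 2.x (com.fasterxml).
import java.io.IOException;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;

// Returns the matching ServiceType, or null when the client does not know the value.
public final class ServiceTypeDeserializer extends JsonDeserializer<ServiceType> {
    @Override
    public ServiceType deserialize(final JsonParser jsonParser, final DeserializationContext deserializationContext)
            throws IOException {
        final String jsonParserText = jsonParser.getText();
        for (final ServiceType value : ServiceType.values()) {
            if (value.name().equals(jsonParserText)) {
                return value;
            }
        }

        // Unknown constant (e.g. one added on the server later): do not fail, just return null.
        return null;
    }
}

@JsonDeserialize(using = ServiceTypeDeserializer.class)
enum ServiceType {
    ENRICH, CORRECTION
}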

Completely untested of course, but this should return the correct value if the client has it, otherwise null.

khouzvicka commented 8 years ago

@LosD Thanks. I will try it in code.

LosD commented 8 years ago

Yw :) I'm not sure if it actually gains us anything, but at least it's possible :)

khouzvicka commented 8 years ago

The "UNKNOWN" value is a good idea. But the interface will be in the DC API, which has no Jackson dependency. Maybe we need some changes in the deserialization part. I will look into it.

LosD commented 8 years ago

You can use mix-ins on the Jackson ObjectMapper instead, then.

LosD commented 8 years ago

Here's a complete example using a mix-in: https://gist.github.com/LosD/107c9334d873f46ef98dfe893de8fc54

(I created it as a gist to keep the code level in here a little lower :))

Console output

Input: { "type": "ENRICH", "name": "Enrich service" }
Output: Name "Enrich service", type ENRICH

Input: { "type": "WHATSDAT?", "name": "New service" }
Output: Name "New service", type null

Mix-ins are pretty nice, exactly because you can leave the original classes alone.