frictionlessdata / datapackage-java

A Java library for working with Frictionless Data Data Packages.
MIT License
20 stars 7 forks source link

Streamline `Package.java` API #29

Closed iSnow closed 3 years ago

iSnow commented 4 years ago

Overview

Besides the no-args constructor, the Package class has 8 constructors, half of them just convenience functions (See "AS IS" below).

The way the API is structured is also not very Java-like, as

Proposal 1: Trim and make idiomatic Java

It would be sufficient to trim the host of constructors to three, maybe four:

/**
 * Load from String representation of JSON object.
 */
public Package(String jsonStringSource, boolean strict)   

/**
 * Load from URL (either 'http'/'https' or file URL).
 */
public Package(URL urlSource, boolean strict) 

/**
 * Load from File
 */
public Package(File sourceFile, boolean strict) 

/**
 * Load from InputStream.
 */
public Package(InputStream source, boolean strict) 

The most fundamental constructor would rely on an InputStream, the other constructors are syntactic sugar and would simply create an InputStream on either the URL or the JSON-String and delegate to this.

It would be necessary to validate that this setup supports ZIP-packaged DataPackages, but that should be possible. Going through the code, it seems that Resource resolution has a handful of problems in both directory-based datapackages with resources (paths can't really be relative to the datapackage.json) and ZIP-packaged DataPackages, but I would have to look into this more deeply.

Proposal 2: Switch to a Builder-pattern

Since fluid interfaces are very much a part of Java since a couple of years, it would maybe make sense to go with the practice and switch over to a builder-based API:

public Package {
    private Package();

    public static PackageBuilder builder();

    public static class PackageBuilder {

       public PackageBuilder fromSource(String jsonStringSource);

       public PackageBuilder fromSource(URL urlSource);

       public PackageBuilder fromSource(InputStream source);

       //maybe add
        public PackageBuilder fromZipSource(URL urlSource);

       public PackageBuilder setStrict (boolean strict);

       public Package build();
    }
}

The special method for reading from a ZIP file might be needed to read Resources from inside the ZIP-files, I am not totally sure.

If there's any interest in this, I would volunteer to work on either proposal to demo its validity.


AS IS:

 /**
     * Load from native Java JSONObject.
 */
public Package(JSONObject jsonObjectSource, boolean strict) 

/**
 * Load from native Java JSONObject.
 */
public Package(JSONObject jsonObjectSource) 

/**
 * Load from String representation of JSON object or from a zip file path.
 */
public Package(String jsonStringSource, boolean strict)         

/**
 * Load from String representation of JSON object or from a zip file path.
 */
public Package(String jsonStringSource) 

/**
 * Load from URL (must be in either 'http' or 'https' schemes).
 */
public Package(URL urlSource, boolean strict) 

/**
 * Load from URL (must be in either 'http' or 'https' schemes).
 * No validation by default.
 */
public Package(URL urlSource) 

/**
 * Load from local file system path.
 */
public Package(String filePath, String basePath, boolean strict) 

/**
 * Load from local file system path.
 * No validation by default.
 */
public Package(String filePath, String basePath) 

Please preserve this line to notify @georgeslabreche (maintainer of this repository)

roll commented 4 years ago

@iSnow It's really hard to say for me because I don't use Java but it seems a reasonable change. BTW as a client how can I use the lib comparing these 3 options?

iSnow commented 4 years ago

Hi, considering you aren't that familiar with Java, I won't directly write out code examples, but pseudo code. I am trying to be fair to the current API, eg. Case 4 is shorter in the current setup, but binds you to one certain JSON lib (Java, unfortunately, has many).

I'd like to point you to Case 5 & 6. Similar setup, one time the datapackage.json is directly referenced via URL (in this case, the library would need to store the base URL to load relative resources, but currently, it doesn't if I am not very wrong). In the other case, a ZIP file is referenced via URL. What you can see is that the current API needs to be used very differently between those cases.

It would still be good to get @georgeslabreche to give his opinion on the proposed change.

So, some scenarios how to use it:

Case 1: reading from a File (datapackage.json) in a directory

current:

proposal 1:

proposal 2:

Case 2: reading from a Zip somewhere in the file system

current:

proposal 1:

proposal 2:

Case 3: reading from in-memory blob representing a ZIP, eg. received in a server app via HTTP POST

current:

proposal 1:

proposal 2:

Case 4: reading from a JSON-object

current:

proposal 1:

proposal 2:

But what happens if the application doesn't use the json.org JSON library, but the GSON from Google? Then:

current:

proposal 1:

proposal 2:

Case 5: reading from a ZIP somewhere on a server

current:

proposal 1:

proposal 2:

Case 6: reading from a 'datapackage.json' somewhere on a server

current:

proposal 1:

proposal 2:

georgeslabreche commented 4 years ago

Hi guys!

I really like these proposals and agree that they are an improvement with respect to streamlining.

It's been a while so it's difficult for me to remember the exact details as to why it was implemented the way it is now. However, I do recall that special attention was made to stay true to the specifications as well as preserving consistent behaviour across implementations in other languages.

If we still want to enforce cross-language consistency then these proposals need to be compared with how the library has been implemented in other languages.

iSnow commented 4 years ago

Hi @georgeslabreche,

great to hear from you! I do recognize that you were following the interface draft very closely, which is a good thing. In multi-language projects though, you will always have a certain tension between what is proposed as a common interface and what is true to the nature of each language. It is a valid decision to stick to a common interface as closely as possible, but I'd argue that it's permissible to take some liberties to create idiomatic code for each environment, not least because the implementation guide states:

We prefer to focus on actions rather than features, feature sets, user stories, or more formal API specifications as we want to leave enough flexibility for implementations that follow the idioms of the host language, yet we do want to ensure a common base of what can be done with an implementation in any language.

There is no absolute right or wrong here, but my feeling when it comes to different language implementations, is that it's better to focus on what to achieve (the actions) than the how (precise API). Even more so because few developers constantly switch between Python, Java and JS.

georgeslabreche commented 4 years ago

I totally agree. We just need an A-OK or No-Go from the frictionlessdata peeps. @roll?

iSnow commented 3 years ago

Changes implemented close to proposal #1, so no builder pattern.