frictionlessdata / datapackage-php

A PHP library for working with Data Package.
MIT License

DataPackage API feedback #18

Closed by roll 7 years ago

roll commented 7 years ago

Overview

This feedback is on the following readme listing. It draws on the existing implementations and on the expected skills of the library's users, since we target many nearly non-technical users (publishers, data wranglers, etc.).

use frictionlessdata\datapackage;

// get a datapackage object
$datapackage = datapackage\Factory::datapackage("tests/fixtures/multi_data_datapackage.json");

// iterate over the data - it will raise exceptions in case of any problems
foreach ($datapackage as $resource) {
    print("-- ".$resource->name()." --");
    $i = 0;
    foreach ($resource as $dataStream) {
        print("-dataStream ".++$i);
        foreach ($dataStream as $line) {
            print($line);
        }
    }
}

// validate a datapackage descriptor
$validationErrors = datapackage\Factory::validate("tests/fixtures/simple_invalid_datapackage.json");
if (count($validationErrors) == 0) {
    print("descriptor is valid");
} else {
    print(datapackage\Validators\DatapackageValidationError::getErrorMessages($validationErrors));
}

// get and manipulate resources
$resources = $datapackage->resources();
$resources["resource-name"]->name() == "resource-name";
$resources["another-resource-name"]; // BaseResource based object (e.g. DefaultResource / TabularResource)

// get a single resource by name
$datapackage->resource("resource-name");

// delete a resource by name - will raise exception in case of validation failure for the new descriptor
$datapackage->deleteResource("resource-name");

// add a resource - will raise exception in case of validation error for the new descriptor
$resource = datapackage\Factory::resource((object)[
    "name" => "new-resource", "data" => ["tests/fixtures/foo.txt", "tests/fixtures/baz.txt"]
]);
$datapackage->addResource($resource);

// create a new datapackage from scratch
$datapackage = TabularDatapackage::create("my-tabular-datapackage", [
    TabularResource::create("my-tabular-resource")
]);

// set the tabular data schema
$datapackage->resource("my-tabular-resource")->descriptor()->schema = (object)[
    "fields" => [
        (object)["name" => "id", "type" => "integer"],
        (object)["name" => "data", "type" => "string"],
    ]
];

// add data files
$datapackage->resource("my-tabular-resource")->descriptor()->data[] = "/path/to/file-1.csv";
$datapackage->resource("my-tabular-resource")->descriptor()->data[] = "/path/to/file-2.csv";

// re-validate the new descriptor
$datapackage->revalidate();

// save the datapackage descriptor to a file
$datapackage->saveDescriptor("datapackage.json");

Factory class as a user interface?

Applies to both DataPackage and Resource.

I just want to raise that having Factory as the main entry point, instead of DataPackage and Resource as in the other implementations, could confuse non-technical users. I think DataPackage.create/load and Resource.create/load would be more convenient (and more predictable for users who have experience with the other language implementations).
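To make the suggestion concrete, here is a minimal sketch of what such an entry point might look like. The `DataPackage` class below is hypothetical and self-contained: its `create`, `load` and `descriptor` methods are assumptions for illustration, not the library's actual API, and it skips validation entirely.

```php
<?php
// Hypothetical sketch: the class and its methods are illustrative only,
// they are not part of datapackage-php.
final class DataPackage
{
    /** @var object Decoded descriptor */
    private $descriptor;

    private function __construct($descriptor)
    {
        $this->descriptor = $descriptor;
    }

    // Build a package from a descriptor object that is already in memory.
    public static function create($descriptor)
    {
        if (!is_object($descriptor)) {
            throw new InvalidArgumentException("Descriptor must be an object");
        }
        return new self($descriptor);
    }

    // Build a package from a JSON descriptor file on disk.
    public static function load($path)
    {
        if (!is_readable($path)) {
            throw new RuntimeException("Cannot read descriptor: " . $path);
        }
        $descriptor = json_decode(file_get_contents($path));
        if (!is_object($descriptor)) {
            throw new InvalidArgumentException("Descriptor is not a JSON object: " . $path);
        }
        return new self($descriptor);
    }

    public function descriptor()
    {
        return $this->descriptor;
    }
}

// Usage: the user only ever touches the class they already know by name.
$package = DataPackage::create((object)["name" => "my-package", "resources" => []]);
print($package->descriptor()->name . "\n");
```

With this shape, a user coming from another language implementation would not need to learn about a separate Factory class before loading a package.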

Accept PHP array?

Applies to both DataPackage and Resource.

The same as in https://github.com/frictionlessdata/tableschema-php/issues/26
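Assuming the request is for descriptors to be accepted as plain PHP arrays as well as objects, one possible approach is to normalize the input before using it. The `normalizeDescriptor` helper below is a hypothetical name, not a function of the library. It relies only on PHP's built-in `json_encode` and `json_decode`.

```php
<?php
// Hypothetical helper, not part of datapackage-php.
// Converts an array or object descriptor into nested stdClass objects.
function normalizeDescriptor($descriptor)
{
    if (!is_array($descriptor) && !is_object($descriptor)) {
        throw new InvalidArgumentException("Descriptor must be an array or an object");
    }
    // json_encode writes associative arrays as JSON objects and lists as JSON arrays.
    // json_decode (without the assoc flag) then returns stdClass objects and PHP lists.
    return json_decode(json_encode($descriptor));
}

// Both call styles end up with the same descriptor.
$fromArray = normalizeDescriptor([
    "name" => "new-resource",
    "data" => ["tests/fixtures/foo.txt", "tests/fixtures/baz.txt"],
]);
$fromObject = normalizeDescriptor((object)[
    "name" => "new-resource",
    "data" => ["tests/fixtures/foo.txt", "tests/fixtures/baz.txt"],
]);
print($fromArray->name . "\n");
```

One caveat of this round trip: an empty PHP array is encoded as a JSON array, so `normalizeDescriptor([])` returns an empty array rather than an empty object.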

roll commented 7 years ago

Update: it has been decided to use Package instead of DataPackage; see https://github.com/frictionlessdata/datapackage-js/issues/81

OriHoch commented 7 years ago

Fixed in v0.1.4. See the README for the updated usage examples.