frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Hash should be an object? #379

Closed pwalsh closed 9 months ago

pwalsh commented 7 years ago

We currently support a special prefix on the hash to declare hashing algorithms that are not MD5. However, we are moving away from this type of overloading in almost all places in the spec.

So, I suggest:

"hash": {
  "type": "md5",
  "value": "{HASH}"
}
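
For comparison, here is a minimal sketch of how a reader might map the current prefixed string onto the object form suggested above. The helper name `hash_to_object` is hypothetical, and the sketch assumes an unprefixed value is an MD5 digest:

```python
def hash_to_object(value: str) -> dict:
    """Split a prefixed hash string into a {"type", "value"} object.

    Hypothetical helper: assumes a string without a prefix is an MD5 digest.
    """
    algorithm, sep, digest = value.partition(":")
    if not sep:
        # No "<algorithm>:" prefix, so the whole string is the digest.
        return {"type": "md5", "value": value}
    return {"type": algorithm.lower(), "value": digest}
```

Going the other way (object to string) would be equally short, which is part of why the choice is mostly about spec style.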

rufuspollock commented 7 years ago

@pwalsh agreed -- certainly for the "rigorous" version of the spec.

rufuspollock commented 7 years ago

@pwalsh do we want to do this in v1.0?

pwalsh commented 7 years ago

@roll @akariv do you want this for v1.0, or is v1.1 ok?

roll commented 7 years ago

@pwalsh It's a breaking change. Whether it lands in v1.0 or v1.1 depends on the breaking-changes policy for the specs.

PS. But I suppose no implementations use it for now, so it's more of a conceptual than a practical question.

rufuspollock commented 7 years ago

I have to say I also like the simplicity of a single value with no sub-object. Do we know what other systems do here, e.g. debs etc.?

fjuniorr commented 3 years ago

@rufuspollock and @roll did this ever make it into (some version of) the specs?

I've noticed that if I read a data package with frictionless.py with a string value in the hash property it will convert it to an object and add a hashing property.

That is, this datapackage.json

{
  "profile": "data-package",
  "resources": [
    {
      "profile": "data-resource",
      "name": "estados",
      "path": "estados.csv",
      "hash": "sha256:c280dab2e21da93be52aef5a4c934abdd4d70d9981f59372e3f36f4ca8b1ac38"
    }
  ],
  "name": "datapackage-reprex"
}

After

from frictionless import Package

dp = Package('datapackage.json')

dp.to_json('datapackage.json')

is serialized as

{
  "profile": "data-package",
  "resources": [
    {
      "profile": "data-resource",
      "name": "estados",
      "path": "estados.csv",
      "hashing": "sha256",
      "stats": {
        "hash": "c280dab2e21da93be52aef5a4c934abdd4d70d9981f59372e3f36f4ca8b1ac38"
      }
    }
  ],
  "name": "datapackage-reprex"
}

I'm starting to use this in a production context and it would be nice to know the recommended approach moving forward.
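
In the meantime, a consumer could accept both layouts. This is a minimal sketch that assumes only the two shapes shown above (a prefixed `hash` string, or `hashing` plus `stats.hash`). `read_hash` is a hypothetical helper, not part of frictionless.py, and the MD5 fallback when `hashing` is absent is an assumption:

```python
from typing import Optional, Tuple


def read_hash(resource: dict) -> Optional[Tuple[str, str]]:
    """Return (algorithm, digest) from either descriptor layout, or None."""
    raw = resource.get("hash")
    if isinstance(raw, str):
        # String form: optional "<algorithm>:" prefix, MD5 assumed otherwise.
        algorithm, sep, digest = raw.partition(":")
        return (algorithm, digest) if sep else ("md5", raw)
    digest = resource.get("stats", {}).get("hash")
    if digest is not None:
        # Layout seen after the round trip through frictionless.py above.
        # Assumption: fall back to MD5 when "hashing" is missing.
        return (resource.get("hashing", "md5"), digest)
    return None
```

Both descriptors in this comment would then yield `("sha256", "c280dab2...")`, so downstream verification code does not need to care which form was on disk.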

rufuspollock commented 3 years ago

@fjuniorr I don't believe this ever made it into the spec, so I believe we are still using a single string value with an optional prefix.

rgaiacs commented 2 years ago

Hi @rufuspollock, can you share the plan for releasing a v1.1 or v2 that covers stats?

rjgladish commented 2 years ago

You might wish to consider multi-hash encoding https://github.com/multiformats/multihash/blob/master/README.md

It is used in IPFS for cryptographic content identifiers https://richardschneider.github.io/net-ipfs-core/articles/multihash.html
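
For reference, a multihash prepends a function code and the digest length to the raw digest. Below is a rough sketch for SHA-256 only, assuming the multihash table's code `0x12` for sha2-256 and hex output (IPFS typically displays base58 instead). The name `sha256_multihash_hex` is hypothetical:

```python
import hashlib

SHA2_256_CODE = 0x12  # function code for sha2-256 in the multihash table


def sha256_multihash_hex(data: bytes) -> str:
    """Encode the SHA-256 digest of `data` as a hex multihash string."""
    digest = hashlib.sha256(data).digest()
    # Code and length are both below 128, so each varint is a single byte.
    header = bytes([SHA2_256_CODE, len(digest)])
    return (header + digest).hex()
```

The result always starts with `1220` (code `0x12`, length 32 bytes), so the algorithm is carried inside the value itself and no separate prefix or sub-object is needed.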

rufuspollock commented 2 years ago

@rgaiacs the stats stuff is #364, I think. Do you want to comment there (especially on what you'd like to see)?

roll commented 10 months ago

Due to historical reasons, I propose to close this issue as wontfix.

For both publishers and implementors, there is really no difference whether this property is a string or an object, as either is very easy to use and implement. I think there is no additional value in making changes here, since for v2 and following we strictly try to avoid breaking changes.

In general, what I think could really bring some additional value is allowing multiple hashes, but that can be done via a new non-breaking property like resource.hashes.[md5/sha256/etc] (although it really needs to be justified first).
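
To make the idea concrete, here is a sketch of computing several digests in one pass over a file, in a shape that could be stored under such a property. The `hashes` layout is only the suggestion above and is not part of any released spec; `compute_hashes` is a hypothetical helper:

```python
import hashlib


def compute_hashes(path: str, algorithms=("md5", "sha256")) -> dict:
    """Return {algorithm: hex digest} for the file, reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large files are not loaded into memory.
        for chunk in iter(lambda: f.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

A publisher could then write something like `resource["hashes"] = compute_hashes("estados.csv")` while leaving the existing `hash` string untouched, which keeps the change non-breaking.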

roll commented 9 months ago

CLOSED as wontfix