frictionlessdata / datapackage-py

A Python library for working with Data Packages.
https://frictionlessdata.io
MIT License

Large bytes value causes validate error #251

Closed ashepherd closed 4 years ago

ashepherd commented 4 years ago

Overview

When calling package.valid or resource.valid on a package whose resource has the bytes field populated, we receive this error:

Descriptor validation error: '190524512' is not of type 'integer' at "resources/0/bytes" in descriptor and at "properties/resources/items/properties/bytes/type" in profile

We think this is valid, but maybe the profile's JSON Schema needs to be set to long? Or should we write our own JSON Schema to handle large byte values?

package profile: https://github.com/frictionlessdata/datapackage-py/blob/master/datapackage/profiles/data-package.json#L482

resource profile: https://github.com/frictionlessdata/datapackage-py/blob/master/datapackage/profiles/data-resource.json#L262

our code:

import logging
from datapackage import Package

package = Package(path)
if package.valid:
    pass  # do something
else:
    # Log every validation error before failing
    for error in package.errors:
        logging.error(error)
    raise Exception('Invalid data package')

...

if resource.valid:
    pass  # do something
else:
    # Log every validation error before failing
    for error in resource.errors:
        logging.error(error)
    raise Exception('Invalid data package resource')

datapackage.json:

{
  "@context": {
    "odo": "http://ocean-data.org/schema/"
  },
  "name": "dataset-615111-1453266000-adcp_transects",
  "id": "http://lod.bco-dmo.org/id/dataset/615111",
  "title": "ADCP Transects from 2014-2015 R/V C-HAWK MuLTI-2 project cruises in the Gulf of Maine, Coastal eastern Maine, from Frenchman Bay to the Canadian border",
  "homepage": "https://www.bco-dmo.org/dataset/615111",
  "licenses": [
    {
      "name": "CC-BY-4.0",
      "path": "https://creativecommons.org/licenses/by/4.0/",
      "title": "Creative Commons Attribution 4.0"
    }
  ],
  "sources": [
    {
      "title": "BCO-DMO",
      "path": "https://www.bco-dmo.org/dataset/615111"
    }
  ],
  "version": "2",
  "resources": [
    {
      "name": "dataset-615111_alltransectstargz__v2",
      "description": "A bundled tar.gz file containing all ADCP data collected during the MuLTI-2 Project.\r\n\r\nThese data are served as tar.gz native RDI ADCP ENX files.  The data are not raw, but partially processed.  ENX files are ADCP single-ping data (plus Navigation Data) after having been bin-mapped, transformed to Earth coordinates, and screened for error velocity, vertical velocity, and false targets. These data should be ready for averaging.  Users are responsible for processing the files after downloading them.  These files can be opened and viewed with VmDAS or WinADCP.",
      "profile": "data-resource",
      "odo:hasFileType": "odo:DataFile_FileType",
      "odo:hasDataFileType": {
        "tid": "573"
      },
      "odo:primaryDataFile": true,
      "odo:fromDataset": "http://lod.bco-dmo.org/id/dataset/615111",
      "path": "https://datadocs.bco-dmo.org/data/305/MuLTI_2/615111/2/data/all_transects.tar.gz",
      "mediatype": "application/gzip",
      "bytes": "190524512",
      "hash": "60cde20a698333847e5678cfff4da804"
    },
    {
      "name": "dataset-615111-1453266000-adcp_transects_dataset-description",
      "title": "Dataset Description",
      "path": "https://www.bco-dmo.org/dataset/615111/Dataset_description.pdf",
      "odo:hasFileType": "odo:SupplementalDocumentation_FileType",
      "odo:hasDataFileType": {
        "tid": 336
      },
      "mediatype": "application/pdf"
    },
    {
      "name": "dataset-615111-1453266000-adcp_transects_iso-19115-2-noaa",
      "title": "ISO 19115-2 (NOAA Profile)",
      "path": "https://www.bco-dmo.org/dataset/615111/DOI/NOAA_ISO19115-2.xml",
      "odo:hasFileType": "odo:Metadata_FileType",
      "odo:hasDataFileType": {
        "tid": 193
      },
      "mediatype": "application/xml"
    },
    {
      "name": "dataset-615111-1453266000-adcp_transects_whoas_dublin_core",
      "title": "Dublin Core XML",
      "path": "https://www.bco-dmo.org/dataset/615111/whoas/dublin_core.xml",
      "odo:hasFileType": "odo:Metadata_FileType",
      "odo:hasDataFileType": {
        "tid": 421
      },
      "mediatype": "application/xml"
    }
  ]
}

Please preserve this line to notify @roll (lead of this repository)

roll commented 4 years ago

@ashepherd Sorry, I've just got to this issue.

You need to specify bytes as an integer literal, not a quoted string: "bytes": 190524512
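For descriptors generated by another system, where the quoted value is hard to fix at the source, one option is to coerce it before handing the descriptor to the library. The following is a minimal sketch using only the standard library. The helper name `coerce_bytes` and the shortened sample descriptor are hypothetical, made up for illustration, and are not part of datapackage-py.

```python
import json


def coerce_bytes(descriptor):
    """Return a copy of the descriptor with digit-only string `bytes` values as ints.

    Hypothetical helper for illustration; not part of datapackage-py.
    """
    # Deep-copy through a JSON round-trip so the caller's dict is untouched
    fixed = json.loads(json.dumps(descriptor))
    for resource in fixed.get("resources", []):
        value = resource.get("bytes")
        # Only convert strings made up entirely of decimal digits
        if isinstance(value, str) and value.isdecimal():
            resource["bytes"] = int(value)
    return fixed


# Shortened sample descriptor with the quoted value from the report
raw = '{"resources": [{"name": "all-transects", "bytes": "190524512"}]}'
descriptor = json.loads(raw)
print(type(descriptor["resources"][0]["bytes"]).__name__)  # str

fixed = coerce_bytes(descriptor)
print(type(fixed["resources"][0]["bytes"]).__name__)  # int
```

The size of the number is not the problem here: the JSON parser returns the quoted value as a string, and a string does not satisfy the profile's "integer" type, which is what the error message reports.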

I'm closing this for now; please re-open if it doesn't help.

ashepherd commented 4 years ago

ah, sorry @roll! i shoulda caught that. thank you