FINRAOS / herd

Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.
http://finraos.github.io/herd/
Apache License 2.0

Herd Uploader #370

Open · tinshuksingh opened this issue 6 years ago

tinshuksingh commented 6 years ago

Hi Team,

We created a business object definition and are now trying to upload a file to an S3 bucket using herd-uploader-0.63.0.jar from an EC2 instance.

Thanks, Tinshuk

nateiam commented 6 years ago

Hi Tinshuk -

The Uploader tool is part of the automated test suite in our environment, so I expect it should not be too hard to get it working in yours. It's also a good sign that the pre-registration worked.

I'd like to collect some information. But first: you may have already found the Swagger docs that ship with each release. I think the link only appears in the CloudFormation output, so you may not have seen it. The docs are at /herd-app/docs/rest/index.html, and they will help you with the Storages GET below and with many other REST calls you will be making in the future!
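As a rough illustration of the Storages GET mentioned above, here is a minimal Python sketch using only the standard library. It assumes the REST endpoints live under /herd-app/rest on the same host that serves the Swagger page, and that a single storage is fetched at storages/{storageName}; confirm both in your release's Swagger docs. The host name is hypothetical, and authentication is left out.

```python
import json
import urllib.parse
import urllib.request

# Assumption: herd's REST endpoints live under /herd-app/rest on the same host that
# serves the Swagger page at /herd-app/docs/rest/index.html. Verify in your deployment.
HERD_REST_BASE = "/herd-app/rest"


def storage_get_request(herd_host: str, storage_name: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a single storage definition."""
    name = urllib.parse.quote(storage_name, safe="")
    url = f"{herd_host.rstrip('/')}{HERD_REST_BASE}/storages/{name}"
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_storage(herd_host: str, storage_name: str) -> dict:
    """Send the GET and decode the JSON response (add authentication if your deployment needs it)."""
    with urllib.request.urlopen(storage_get_request(herd_host, storage_name)) as resp:
        return json.load(resp)
```

For example, `fetch_storage("http://herd.example.com:8080", "S3StorageUnit")` would return the storage JSON, including its attributes list, if the assumed path matches your deployment.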

Please send:

I am also tagging @kenisteward here who can help troubleshoot. Thanks Tinshuk, Keni!

tinshuksingh commented 6 years ago

Hi @nateiam,

Please find the details you asked for:

kenisteward commented 6 years ago

@tinshuksingh

When the uploader performs the actual upload, it uses the BData's storage directory path to find the target S3 location.

It looks like your storage doesn't have the attributes that tell the uploader where your S3 path is. If you can, try doing a storage PUT with the following attributes:

{
  "attributes": [
    {
      "name": "bucket.name",
      "value": "yourBucketName"
    }
  ]
}
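A minimal Python sketch of sending that storage PUT is below. The endpoint path (STORAGE_ATTRIBUTES_PATH) is an assumption for illustration only; look up the storage update operation in your release's Swagger docs and adjust it. The host name is hypothetical. The code only builds the payload and request; it does not send anything until you call urlopen yourself.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

# Assumption: illustrative path for the storage attributes update; check the
# storage PUT operation listed in your release's Swagger docs before using it.
STORAGE_ATTRIBUTES_PATH = "/herd-app/rest/storages/{name}/attributes"


def storage_attributes_body(bucket_name: str, extra: Optional[Dict[str, str]] = None) -> dict:
    """Build the {"attributes": [...]} payload with bucket.name plus any extra attributes."""
    attrs = {"bucket.name": bucket_name}
    attrs.update(extra or {})
    return {"attributes": [{"name": k, "value": v} for k, v in attrs.items()]}


def storage_put_request(herd_host: str, storage_name: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the JSON payload."""
    path = STORAGE_ATTRIBUTES_PATH.format(name=urllib.parse.quote(storage_name, safe=""))
    return urllib.request.Request(
        herd_host.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


body = storage_attributes_body("yourBucketName")
req = storage_put_request("http://herd.example.com:8080", "S3StorageUnit", body)
# Send with urllib.request.urlopen(req) once the path and authentication are confirmed.
```

If the update operation replaces the whole attribute list rather than merging into it, include every attribute you want to keep in the payload, not just bucket.name.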

If this doesn't work, let us know. We think this should fix it with minimal changes, but there are other knobs we can tweak.

tinshuksingh commented 6 years ago

@kenisteward

I updated the storage with the following attributes:

    {
      "name": "S3StorageUnit",
      "storagePlatformName": "S3",
      "attributes": [
        {
          "name": "bucket.name",
          "value": "bucketName"
        }
      ]
    }

but I am still getting the same error I mentioned earlier.

kenisteward commented 6 years ago

@tinshuksingh

Gotcha. It looks like you need to set the key prefix for the storage, since you can't set the storage directory in manifest.json. Maybe we can make that a feature of the uploader? @nateiam

    {
      "name": "S3StorageUnit",
      "storagePlatformName": "S3",
      "attributes": [
        {
          "name": "bucket.name",
          "value": "bucketName"
        },
        {
          "name": "key.prefix.velocity.template",
          "value": "your/velocity/key/prefix"
        }
      ]
    }

It looks like the herd-uploader's manifest.json doesn't let you specify the storage directory. Because of this, you'll have to set up the directory path through the storage's key.prefix.velocity.template attribute.
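Going from the first storage JSON in this thread to the second one amounts to adding (or replacing) a single named attribute. The Python sketch below shows that step on a storage object like the ones above. The helper name with_attribute is hypothetical, and it only edits the JSON locally; sending it back to herd is a separate step.

```python
import copy
import json


def with_attribute(storage: dict, name: str, value: str) -> dict:
    """Return a copy of a storage JSON object with attribute `name` set to `value`.

    An existing attribute with the same name is replaced; otherwise a new one is appended.
    The input object is left unchanged.
    """
    updated = copy.deepcopy(storage)
    attrs = updated.setdefault("attributes", [])
    for attr in attrs:
        if attr.get("name") == name:
            attr["value"] = value
            break
    else:
        attrs.append({"name": name, "value": value})
    return updated


current = {
    "name": "S3StorageUnit",
    "storagePlatformName": "S3",
    "attributes": [{"name": "bucket.name", "value": "bucketName"}],
}
updated = with_attribute(current, "key.prefix.velocity.template", "your/velocity/key/prefix")
print(json.dumps(updated, indent=2))
```

Replacing by name instead of blindly appending avoids sending two attributes with the same name if the helper is run twice.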

The value can be any string, and it can include the following replaceable variables:

S3 Key Prefix Velocity Template

| Variable | Description |
| --- | --- |
| $environment | The environment name. |
| $namespace | The namespace code. |
| $dataProviderName | The data provider name. |
| $businessObjectDefinitionName | The name of the business object definition. |
| $businessObjectFormatUsage | The business object format usage. |
| $businessObjectFormatFileType | The business object format file type. |
| $businessObjectFormatVersion | The version of the business object format. |
| $businessObjectDataVersion | The version of the business object data. |
| $businessObjectFormatPartitionKey | The partition key, which must be pre-registered as part of the business object format. |
| $businessObjectDataPartitionValue | The business object data primary partition value. |
| $businessObjectDataPartitions | The ordered map of sub-partition column names to sub-partition values. |
| $CollectionUtils | org.apache.commons.collections4.CollectionUtils.class |

Examples:

$environment/$namespace/$businessObjectDataPartitionValue

$namespace/some/random/choices/$businessObjectFormatFileType/$businessObjectDataPartitionValue
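
To preview roughly what key prefix a template like these produces, the Python sketch below substitutes plain $name references with sample values. This is an approximation, not Apache Velocity: it uses Python's string.Template (which happens to share the simple $name reference syntax), ignores directives and object variables such as $CollectionUtils or $businessObjectDataPartitions, and does not reproduce any normalization herd may apply to the values. The sample values are made up.

```python
from string import Template


def preview_key_prefix(template: str, values: dict) -> str:
    """Approximate the rendering of a key.prefix.velocity.template value.

    Only plain $name / ${name} references are substituted; unknown references are left as-is.
    This is NOT Apache Velocity and ignores any normalization herd may apply.
    """
    return Template(template).safe_substitute(values)


sample = {
    "environment": "DEV",
    "namespace": "MYNS",
    "businessObjectFormatFileType": "TXT",
    "businessObjectDataPartitionValue": "2017-01-31",
}
print(preview_key_prefix("$environment/$namespace/$businessObjectDataPartitionValue", sample))
# DEV/MYNS/2017-01-31
print(preview_key_prefix(
    "$namespace/some/random/choices/$businessObjectFormatFileType/$businessObjectDataPartitionValue",
    sample,
))
# MYNS/some/random/choices/TXT/2017-01-31
```

safe_substitute leaves any reference without a sample value untouched, which makes missing variables easy to spot in the preview.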

kenisteward commented 6 years ago

@tinshuksingh Are you still having any issues?