Closed Priyankasaggu11929 closed 4 years ago
Hi @Priyankasaggu11929,
I am not a Dataproc expert, but let me try to help you.
Can you clarify which components are installed? (I am confused by your current description.) I have a feeling you mean that ANACONDA, HIVE_WEBHCAT, JUPYTER, ZEPPELIN, and ZOOKEEPER are installed, but NOT PRESTO. Is this correct?
Taking a quick look at the REST APIs (dataproc-v1 and dataproc-v1beta2), it looks to me like the GA API does NOT support PRESTO; only the BETA API does.
Unfortunately, there is no official GCP type released for dataproc-v1beta, but you can easily use custom types.
Do you want me to help you create a custom type for dataproc-v1beta? (Looking at the dataproc-v1 type, it should be fairly simple.)
@ocsig Thanks for the reply.
Yes, I meant that ANACONDA, HIVE_WEBHCAT, JUPYTER, ZEPPELIN, and ZOOKEEPER are getting installed, but not PRESTO.
I think I need your help here. I'm not sure how to create a custom type for dataproc-v1beta. :)
Ok, I am working on it right now.
I found an issue with your Jinja:
/regions//{{ properties["zone"] }}
>> /regions/{{ properties["zone"] }}
Is this script working for you with the GA type (except for the PRESTO installation)?
Oh, @ocsig, sorry for the typo there. That happened while I was pasting the template here and replacing the actual values with configurable environment variables.
Yes, the script works properly apart from the PRESTO installation.
(NOTE: I am still running my test; I may modify this post if it fails.)
Creation of a custom Type documentation
You don't need to modify your template, except for the type you are using. I made changes to use defaults because it was easier for testing.
Step 1: Create an options file (nano dataproc-v1beta.type.yaml).
dataproc-v1beta.type.yaml:
options:
  inputMappings:
  - fieldName: Authorization
    location: HEADER
    value: $.concat("Bearer ", $.googleOauth2AccessToken())
    methodMatch: .*
collectionOverrides:
- collection: projects.regions.clusters
  options:
    virtualProperties: |
      schema: http://json-schema.org/draft-04/schema#
      type: object
      properties:
        region:
          type: string
      required:
      - region
    inputMappings:
    - methodMatch: ^(create|update|get|patch|delete)$
      location: PATH
      fieldName: region
      value: >
        $.resource.properties.region
    - methodMatch: ^setIamPolicy$
      location: PATH
      fieldName: resource
      value: >
        $.resource.self.name
    - methodMatch: ^(update|get|patch|delete)$
      location: PATH
      fieldName: clusterName
      value: >
        $.resource.properties.clusterName
Step 2: Create the custom type.
gcloud beta deployment-manager type-providers create dataproc-v1beta --api-options-file dataproc-v1beta.type.yaml --descriptor-url='https://dataproc.googleapis.com/$discovery/rest?version=v1beta2'
Waiting for insert [operation-1584697838999-5a14637c5191b-9deb0532-b1336e26]...done.
Created type_provider [dataproc-v1beta]
From this point, your project has a custom type: my-project/dataproc-v1beta.
You will use it just like a gcp-type: my-project/dataproc-v1beta:projects.regions.clusters
dpbeta.jinja (NOTE: Change 'my-project' to your own project ID.)
{% set clusterName = (env["deployment"] + "-dataproc-cluster") %}
resources:
- name: {{ clusterName }}
  type: my-project/dataproc-v1beta:projects.regions.clusters
  properties:
    region: {{ properties["region"] }}
    projectId: {{ env["project"] }}
    clusterName: {{ clusterName }}
    config:
      # configBucket: example-bucket
      gceClusterConfig:
        zoneUri: https://www.googleapis.com/compute/v1/projects/{{ env["project"] }}/zones/{{ properties["zone"] }}
        tags:
        - example-firewall-02
        # subnetworkUri: https://www.googleapis.com/compute/v1/projects/{{ env["project"] }}/regions/{{ properties["zone"] }}/subnetworks/example-subnet-1
        # internalIpOnly: true
      masterConfig:
        numInstances: 1
        machineTypeUri: https://www.googleapis.com/compute/v1/projects/{{ env["project"] }}/zones/{{ properties["zone"] }}/machineTypes/n1-standard-2
        diskConfig:
          bootDiskSizeGb: 200
          bootDiskType: pd-ssd
      workerConfig:
        numInstances: 2
        machineTypeUri: https://www.googleapis.com/compute/v1/projects/{{ env["project"] }}/zones/{{ properties["zone"] }}/machineTypes/n1-standard-2
        diskConfig:
          bootDiskSizeGb: 200
          bootDiskType: pd-ssd
      softwareConfig:
        imageVersion: 1.4.23-ubuntu18
        optionalComponents:
        - ANACONDA
        - HIVE_WEBHCAT
        - JUPYTER
        - PRESTO
        - ZEPPELIN
        - ZOOKEEPER
dpbeta.yaml
imports:
- path: dpbeta.jinja
resources:
- name: dpbeta
  type: dpbeta.jinja
  properties:
    region: us-west1
    zone: us-west1-b
Creating the deployment:
gcloud deployment-manager deployments create dptest --config=dpbeta.yaml
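Once the deployment finishes, one way to check which optional components actually landed on the cluster is to describe it. This is only a sketch under assumptions: the cluster name is derived the same way the Jinja template derives it (deployment name plus "-dataproc-cluster"), the deployment name dptest and region us-west1 are the values from this example, and the function name show_components is made up for illustration.

```shell
# Values taken from the example above; adjust them to your own deployment.
DEPLOYMENT="dptest"
REGION="us-west1"

# The template sets clusterName to env["deployment"] + "-dataproc-cluster".
CLUSTER_NAME="${DEPLOYMENT}-dataproc-cluster"

# Hypothetical helper: prints the optional components reported for the cluster.
# Defined as a function so nothing is called until you run it yourself.
show_components() {
  gcloud dataproc clusters describe "${CLUSTER_NAME}" \
    --region="${REGION}" \
    --format="value(config.softwareConfig.optionalComponents)"
}

echo "Cluster to inspect: ${CLUSTER_NAME} (${REGION})"
```

Calling show_components after the deployment has completed should list PRESTO among the components if the beta type passed it through.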
Thanks for the steps. I'm reproducing the steps and testing now.
@ocsig I followed the steps, and I think they worked.
But now I'm stuck at this error:
message: Required 'deploymentmanager.typeProviders.get' permission for '{{service_account_number}}@cloudservices.gserviceaccount.com for resource projects/{{project_name}}/typeProviders/dataproc-v1beta'
I was reading https://cloud.google.com/deployment-manager/docs/access-control but still couldn't figure out how to grant this permission.
Is your type in the same project that you are launching the deployment from? (That is, do {{service_account_number}} and {{project_name}} refer to the same project? The project number and the project ID identify the same project, even though the two values are different.)
Make sure that either the type exists in every project where you want to use it, OR every DM service account has roles/deploymentmanager.viewer in the project where the custom type lives, so that it can read it. ("Every SA" means that every project has a different default DM SA. These are project Editors in their own project, but have no IAM roles in other projects. The SA needs to be a DM viewer in the project where it loads the type from.)
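For the cross-project case, the binding described above could look roughly like this. It is a minimal sketch, not a confirmed fix for this thread: the project ID abc123 and project number 123123 are hypothetical placeholders, and the function name grant_dm_viewer is made up for illustration.

```shell
# Hypothetical placeholders: the project that hosts the custom type, and the
# project number of the project that launches the deployment.
TYPE_PROJECT_ID="abc123"
DEPLOYING_PROJECT_NUMBER="123123"

# Default Deployment Manager service account of the deploying project.
DM_SA="${DEPLOYING_PROJECT_NUMBER}@cloudservices.gserviceaccount.com"

# Defined as a function so no IAM change happens until you call it explicitly.
grant_dm_viewer() {
  gcloud projects add-iam-policy-binding "${TYPE_PROJECT_ID}" \
    --member="serviceAccount:${DM_SA}" \
    --role="roles/deploymentmanager.viewer"
}

echo "Would grant roles/deploymentmanager.viewer to ${DM_SA} on ${TYPE_PROJECT_ID}"
```

If the type and the deployment live in the same project, this binding should not be needed, since the DM service account is already an Editor there.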
I am using my organisation's GCP account, so even after my user account was granted the Editor role, the permissions are still not satisfied. I'll update here once it gets solved, or if it doesn't. Thanks a lot @ocsig for your time and help :)
@ocsig
[UPDATES]
I exhaustively added all the roles available for Deployment Manager to the required service account.
I re-checked the project name as well to make sure it is right. Also, I realised we have only one project, so there is no chance of the custom type provider being in another project. But I keep getting the same error.
Nothing seems to resolve the permission error.
Can you confirm that the redacted project ID is correct?
projects/*******/typeProviders/dataproc-v1beta is your type, where ******* has to be your project ID. (If that is not correct, that would explain everything.)
Yes, the project ID is correct. I re-checked it.
Also, when I describe the custom type provider through gcloud, the selfLink comes out as something like https://www.googleapis.com/deploymentmanager/v2beta/projects/******/global/typeProviders/dataproc-v1beta.
Is this .../projects/{project-id}/global/typeProviders/... different from .../projects/{project-id}/typeProviders/...?
It is pretty hard to debug your setup like this, so please forgive me for the super basic checks. What I am trying to verify is whether everything is happening in the same project, or whether at some point your code is pointing somewhere else.
Let's say the project where you want to use this type has the ID abc123 and the number 123123.
Would you mind going through the following checklist and letting me know if you find any deviation?
gcloud config get-value project returns abc123. (This means you are querying the project you want.)
gcloud beta deployment-manager type-providers describe dataproc-v1beta | grep typeProviders/dataproc-v1beta returns https://www.googleapis.com/deploymentmanager/v2beta/projects/abc123/global/typeProviders/dataproc-v1beta (especially the projects/abc123/global part). (This means your type is in the right project.)
more dpbeta.jinja | grep type returns type: abc123/dataproc-v1beta:projects.regions.clusters. (This means your template is trying to access the custom type in the right project.)
projects/abc123/typeProviders/dataproc-v1beta
123123@cloudservices.gserviceaccount.com
Please let me know which point(s) in the checklist fail and what value you see instead of your project ID/number. (Is it another ID of yours that you see?)
OK, here is more clarification.
abc.com (Ancestry: abc.com), with corresponding id 123456.
abc (Ancestry: abc.com > abc), with corresponding id abc123 and project number 123123.
Now, when I run through the checklist above, I get the following outputs.
gcloud config get-value project returns abc123. (This means you are querying the project you want.)
gcloud beta deployment-manager type-providers describe dataproc-v1beta | grep typeProviders/dataproc-v1beta returns https://www.googleapis.com/deploymentmanager/v2beta/projects/abc123/global/typeProviders/dataproc-v1beta (especially the projects/abc123/global part). (This means your type is in the right project.)
more dpbeta.jinja | grep type returns type: abc123/dataproc-v1beta:projects.regions.clusters. (This means your template is trying to access the custom type in the right project.)
Output: Here, instead of abc123/dataproc-v1beta:projects.regions.clusters, I get abc/dataproc-v1beta:projects.regions.clusters, i.e. not the project ID but the project name.
projects/abc123/typeProviders/dataproc-v1beta
Output: Again, I get projects/abc/typeProviders/dataproc-v1beta rather than the one with the project ID abc123.
123123@cloudservices.gserviceaccount.com
Note: If this still doesn't make sense, may I send you an email with screenshots or the actual information, whichever helps you best?
Thank you once again. :)
@ocsig I updated the above comment.
Great, I believe I understand the problem.
abc >> This is a human-readable name; it can be changed. It is not an identifier.
abc123 >> This is a globally unique ID. It can be specified at creation and has to be unique among ALL GCP projects (even outside your organization). It cannot be changed later.
(gcloud projects describe abc123 will display this information.)
The issue is that the type you would like to use is NOT abc/dataproc-v1beta:projects.regions.clusters but abc123/dataproc-v1beta:projects.regions.clusters.
Would you mind updating dpbeta.jinja so that, under type, you use your project ID (and not your project name)?
(Because you do not have list permission on the project whose project ID is 'abc', you are getting the permission error.)
Let me know if this solves your problem.
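To avoid mixing up the project name and the project ID, the type string can be built from the ID itself and then compared with the template. A small sketch, assuming the hypothetical project ID abc123 from this thread; the helper name template_uses_project_id is made up for illustration.

```shell
# Hypothetical project ID. In a real shell you would typically read it with:
#   PROJECT_ID="$(gcloud config get-value project)"
PROJECT_ID="abc123"

# Custom type reference: <project ID>/<type provider>:<collection>
DM_TYPE="${PROJECT_ID}/dataproc-v1beta:projects.regions.clusters"
echo "type: ${DM_TYPE}"

# Succeeds only if the given template file references the type by project ID.
template_uses_project_id() {
  grep -qF "type: ${DM_TYPE}" "$1"
}
```

Running template_uses_project_id dpbeta.jinja fails for a template that still says abc/dataproc-v1beta:..., which is the mismatch found in the checklist.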
@ocsig Yes, I tried making the above change a while ago.
It gives me the following error:
ERROR: (gcloud.deployment-manager.deployments.create) Error in Operation [operation-1584979015681-5a187af34c710-e90b525d-78a6b53c]: errors:
- code: RESOURCE_ERROR
  location: /deployments/dev-dataproc-67/resources/dev-dataproc-67-dataproc-cluster
  message: '{"ResourceType":"abc123/dataproc-v1beta:projects.regions.clusters","ResourceErrorCode":"401","ResourceErrorMessage":{"code":401,"message":"Request
    had invalid authentication credentials. Expected OAuth 2 access token, login cookie
    or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.","status":"UNAUTHENTICATED","statusMessage":"Unauthorized","requestPath":"https://dataproc.googleapis.com/v1beta2/projects/abc123/regions/us-west1/clusters","httpMethod":"POST"}}'
The good news is that this is a different error. Now we know you are using the custom type, because this error actually comes from communicating with the Dataproc API.
Can you verify that your type configuration contains the authentication part?
The output of 'gcloud beta deployment-manager type-providers describe dataproc-v1beta' should contain this at the end:
[....]
options:
  inputMappings:
  - fieldName: Authorization
    location: HEADER
    methodMatch: .*
    value: $.concat("Bearer ", $.googleOauth2AccessToken())
[.....]
If this is missing, your type did not pick up the config file dataproc-v1beta.type.yaml (see my comment above).
Yes, it appears in the output.
OK, I think I found the problem. Instead of value: $.concat("Bearer ", $.googleOauth2AccessToken()), I had written value: $.concat("Bearer", $.googleOauth2AccessToken()).
It is running currently. I will let you know whether it works.
No errors so far.
Now I feel super funny, as all this was because of a single-space typo. :|
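The effect of the missing space can be reproduced outside Deployment Manager. The sketch below uses a made-up token value (ya29.EXAMPLE is not a real token) and plain string concatenation to mimic what the two $.concat(...) variants produce for the Authorization header.

```shell
# Made-up token, for illustration only.
TOKEN="ya29.EXAMPLE"

GOOD_HEADER="Bearer ${TOKEN}"   # mimics $.concat("Bearer ", token)
BAD_HEADER="Bearer${TOKEN}"     # mimics $.concat("Bearer", token)

echo "Authorization: ${GOOD_HEADER}"
echo "Authorization: ${BAD_HEADER}"
```

In the second header the scheme and the token run together, so the value is no longer of the form "Bearer <token>", which is consistent with the 401 UNAUTHENTICATED response seen in this thread.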
Wow, I knew there had to be a typo, but I couldn't spot it.
From this point on, every property should be passed properly to the Dataproc API. Let me know.
(I almost dropped IT when I was 14 because of a missing ; in my PHP book...)
@ocsig I can't thank you enough for being so patient with me.
It worked properly this time. I have PRESTO installed in the Dataproc cluster.
Do you want me to delete the unnecessary comments above, so that someone else can find the solution easily?
I was happy to help. No need to delete the comments; the debugging steps are important as well. Maybe put a short TL;DR at the top of your opening comment.
Thank you once again. :)
I added the link to the solution comment at the top.
[TL;DR] Here is the solution to the below problem: https://github.com/GoogleCloudPlatform/deploymentmanager-samples/issues/546#issuecomment-601622891
I tried creating a Presto Dataproc cluster using the optionalComponents field under the software-config, but I observed that only PRESTO is not getting installed. All the other components are installed successfully from the template below.
Besides, I see that Presto is not in beta either, so what could be the possible solution here?