Closed tgporto closed 10 months ago
I'm still investigating this error, but everything points to a communication failure between the Control Plane and the Worker.
Tests run:
To investigate:
I'm having the same problem. The catch with switching the account to Pay-as-you-go is that it can incur charges!
│ Error: Work Request error
│ Provider version: 5.24.0, released on 2024-01-10.
│ Service: Containerengine Node Pool
│ Error Message: work request did not succeed, workId: ocid1.clustersworkrequest.oc1.iad.aaaaaaaayxe3iror77mmp4oghf3jnrep6bjvsjgab4dadtrdww55tstsvmyq, entity: nodepool, action: CREATED. Message: 1 node(s) launch failure. Reason for failure on one of the nodes : Error returned by Unknown operation operation in Unknown service service.(500, InternalError, false) Out of host capacity. (opc-request-id: F436783DEF0D4F14800A9AD54AEE4B67/252A04DCEEDE320C515850EF75DBF22B/2941D801BC4ED0ABCE806FAB1604D937)
│ Timestamp: 2024-01-14T14:29:53.556Z
│ Client version: Oracle-JavaSDK/2.54.1
│ Request Endpoint: Unknown request endpoint
│ Troubleshooting Tips: See https://docs.oracle.com/en-us/iaas/Content/API/References/apierrors.htm#apierrors_500__500_internalerror for more information about resolving this error
│ Also see Unknown API reference link for details on this operation's requirements.
│ To get more info on the failing request, you can enable debug level logs as mentioned in Using SLF4J for Logging section
in https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/javasdkconfig.htm.
│ If you are unable to resolve this Unknown service issue, please contact Oracle support and provide them this full error message.
│ Resource OCID: ocid1.nodepool.oc1.iad.aaaaaaaaufesxyeqzk6ftdpu7b4aycqk2lr2rdgbszpveu7i3n4vu523lpaq
│ Suggestion: Please retry or contact support for help with service: Containerengine Node Pool
│
│
│ with module.cluster.oci_containerengine_node_pool.k8s_node_pool,
│ on cluster/k8s.tf line 29, in resource "oci_containerengine_node_pool" "k8s_node_pool":
│ 29: resource "oci_containerengine_node_pool" "k8s_node_pool" {
I've also been having this problem since yesterday. I just tried again with the updated repo and the error still persists.
Anyone using Oracle's $500 bonus to run the cluster with VM.Standard.E3.Flex instances can test this by updating image_id and kubernetes_version in the variables.tf file with the values below:
image_id = ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka
kubernetes_version = 1.28.2
This error is not the same one reported in this Issue. For more information about the "Out of host capacity" error, see the README.
I'll wait, then.
Upgrade Your account provisioning is in progress. You will be able to access the feature after the provisioning is complete.
If you are upgrading the account to Pay As You Go, keep an eye on Issue #8: at the end of the trial (30 days), you may be charged if you do not switch the cluster to the unmanaged mode (In-Progress).
Since A1 is out of capacity, I followed Rapha's instructions and switched to E3, and with that the cluster came up.
"This error happens because the Ashburn region has no VM.Standard.A1.Flex instances available. You can try the VM.Standard.E3.Flex instances if you have the $500 bonus offered by Oracle, since those instances are normally paid."
"But if you are using the VM.Standard.E3.Flex instances with the bonus balance, you can try updating the variables.tf file with this information:
image_id = ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka
kubernetes_version = 1.28.2"
@tgporto @henri-31 @paulovieirajr
If you can confirm that the problem is resolved by updating image_id and kubernetes_version, I'd be grateful. That way I can close this Issue and update the README.
It worked here with:
VM.Standard.E3.Flex
image_id = ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka
kubernetes_version = 1.28.2
I got the same error: the provider could not deliver the Node Pool within 20 minutes. I have just run terraform destroy successfully and will try again later.
Note: I used the same variables as @fernandoguide to consume the credits. Later I'll try using the Terraform files through the Resource Manager screens inside OCI.
Indeed, 2 GB is the minimum amount of memory for a VM.Standard.E3.Flex instance to join OKE as a Worker Node in our scenario. I found no reference for this in the documentation, only test results.
I'll update the repository's README with the following information:
shape: VM.Standard.E3.Flex
ocpus_per_node: 1
memory_in_gbs_per_node: 2
image_id: ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka
k8s_version: v1.28.2
If the problem occurs again, try recreating the cluster before reopening this Issue.
Thanks to everyone for contributing.
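For anyone applying the values above by hand, here is a minimal sketch of how they might look in variables.tf. The variable names are assumptions taken from the labels used in this thread (the thread mentions both kubernetes_version and k8s_version, and both 1.28.2 and v1.28.2), so check them against the names actually declared in the repository before copying.

```hcl
# Hypothetical variables.tf fragment; variable names are assumed from this
# thread and may not match the repository exactly.

variable "shape" {
  type    = string
  default = "VM.Standard.E3.Flex" # normally a paid shape; covered here by the $500 bonus
}

variable "ocpus_per_node" {
  type    = number
  default = 1
}

variable "memory_in_gbs_per_node" {
  type    = number
  default = 2 # minimum that worked in the tests reported in this thread
}

variable "image_id" {
  type    = string
  default = "ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka"
}

variable "k8s_version" {
  type    = string
  default = "v1.28.2" # written as 1.28.2 elsewhere in the thread
}
```

The image OCID is specific to the Ashburn (iad) region used in this thread, so it would likely need to change for any other region.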
Good afternoon, everyone!
While creating the cluster, I got the following error after running git pull on the repository:
│ Error: Work Request error
│ Provider version: 5.24.0, released on 2024-01-10.
│ Service: Containerengine Node Pool
│ Error Message: work request did not succeed, workId: ocid1.clustersworkrequest.oc1.iad.aaaaaaaatax676brvddgtfibc5jd22lp4wrsskdzy644ifytpwvgcnlrf4ia, entity: nodepool, action: CREATED. Message: 1 nodes(s) register timeout. First, confirm that network prerequisites have been met. If network prerequisites have been met, troubleshoot the problem by running the Node Doctor script on the node(s) experiencing the issue, using either SSH or the Run Command feature. If you cannot resolve the issue using the troubleshooting output from the Node Doctor script, open a Service Request with My Oracle Support and upload the support bundle (a .tar file) to the support ticket. For more information, see https://docs.oracle.com/en-us/iaas/Content/ContEng/Concepts/contengnetworkconfig.htm and https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengtroubleshooting_topic-node_troubleshooting.htm
│ Resource OCID: ocid1.nodepool.oc1.iad.aaaaaaaa5tfnzdu2iwbyaanh4zi2fjibvfw6u3reog66hcdv3nnf3rpzs3ba
│ Suggestion: Please retry or contact support for help with service: Containerengine Node Pool
│
│
│ with module.cluster.oci_containerengine_node_pool.k8s_node_pool,
│ on cluster/k8s.tf line 29, in resource "oci_containerengine_node_pool" "k8s_node_pool":
│ 29: resource "oci_containerengine_node_pool" "k8s_node_pool" {
I destroyed the infrastructure and tried again, with the same result. Could you help?