Rapha-Borges / oke-free

An easy way to get your own Kubernetes cluster for free, forever
GNU General Public License v3.0

Error: Work Request error #5

Closed tgporto closed 10 months ago

tgporto commented 10 months ago

Good afternoon, everyone!

While creating the cluster, right after a git pull of the repository, I got the following error:

Error: Work Request error
│ Provider version: 5.24.0, released on 2024-01-10.
│ Service: Containerengine Node Pool
│ Error Message: work request did not succeed, workId: ocid1.clustersworkrequest.oc1.iad.aaaaaaaatax676brvddgtfibc5jd22lp4wrsskdzy644ifytpwvgcnlrf4ia, entity: nodepool, action: CREATED. Message: 1 nodes(s) register timeout. First, confirm that network prerequisites have been met. If network prerequisites have been met, troubleshoot the problem by running the Node Doctor script on the node(s) experiencing the issue, using either SSH or the Run Command feature. If you cannot resolve the issue using the troubleshooting output from the Node Doctor script, open a Service Request with My Oracle Support and upload the support bundle (a .tar file) to the support ticket. For more information, see https://docs.oracle.com/en-us/iaas/Content/ContEng/Concepts/contengnetworkconfig.htm and https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengtroubleshooting_topic-node_troubleshooting.htm
│ Resource OCID: ocid1.nodepool.oc1.iad.aaaaaaaa5tfnzdu2iwbyaanh4zi2fjibvfw6u3reog66hcdv3nnf3rpzs3ba
│ Suggestion: Please retry or contact support for help with service: Containerengine Node Pool
│
│
│   with module.cluster.oci_containerengine_node_pool.k8s_node_pool,
│   on cluster/k8s.tf line 29, in resource "oci_containerengine_node_pool" "k8s_node_pool":
│   29: resource "oci_containerengine_node_pool" "k8s_node_pool" {
│

I destroyed the infrastructure and tried again, with the same result. Could you help?

Rapha-Borges commented 10 months ago

I am still investigating this error, but everything indicates a communication failure between the Control Plane and the Worker.

Tests performed:

To investigate:

lzocateli commented 10 months ago

I'm having the same problem. The trouble with switching the account to Pay-as-you-go is that it can incur charges!

fernandoguide commented 10 months ago

│ Error: Work Request error
│ Provider version: 5.24.0, released on 2024-01-10.
│ Service: Containerengine Node Pool
│ Error Message: work request did not succeed, workId: ocid1.clustersworkrequest.oc1.iad.aaaaaaaayxe3iror77mmp4oghf3jnrep6bjvsjgab4dadtrdww55tstsvmyq, entity: nodepool, action: CREATED. Message: 1 node(s) launch failure. Reason for failure on one of the nodes : Error returned by Unknown operation operation in Unknown service service.(500, InternalError, false) Out of host capacity. (opc-request-id: F436783DEF0D4F14800A9AD54AEE4B67/252A04DCEEDE320C515850EF75DBF22B/2941D801BC4ED0ABCE806FAB1604D937)
│ Timestamp: 2024-01-14T14:29:53.556Z
│ Client version: Oracle-JavaSDK/2.54.1
│ Request Endpoint: Unknown request endpoint
│ Troubleshooting Tips: See https://docs.oracle.com/en-us/iaas/Content/API/References/apierrors.htm#apierrors_500__500_internalerror for more information about resolving this error
│ Also see Unknown API reference link for details on this operation's requirements.
│ To get more info on the failing request, you can enable debug level logs as mentioned in Using SLF4J for Logging section in https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/javasdkconfig.htm.
│ If you are unable to resolve this Unknown service issue, please contact Oracle support and provide them this full error message.
│ Resource OCID: ocid1.nodepool.oc1.iad.aaaaaaaaufesxyeqzk6ftdpu7b4aycqk2lr2rdgbszpveu7i3n4vu523lpaq
│ Suggestion: Please retry or contact support for help with service: Containerengine Node Pool
│
│
│   with module.cluster.oci_containerengine_node_pool.k8s_node_pool,
│   on cluster/k8s.tf line 29, in resource "oci_containerengine_node_pool" "k8s_node_pool":
│   29: resource "oci_containerengine_node_pool" "k8s_node_pool" {

I've also had this problem since yesterday. I just tried again with the updated repo and the error still persists.

Rapha-Borges commented 10 months ago

Anyone using Oracle's $500 bonus to run the cluster on VM.Standard.E3.Flex instances can test this by updating image_id and kubernetes_version in the variables.tf file with the values below:

image_id = ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka
kubernetes_version = 1.28.2

Rapha-Borges commented 10 months ago

Replying to @fernandoguide's "Out of host capacity" report: this error is not the same as the one reported in this Issue. For more information about the "Out of host capacity" error, see the README.

fernandoguide commented 10 months ago

I'll wait, then.

Upgrade Your account provisioning is in progress. You will be able to access the feature after the provisioning is complete.

Rapha-Borges commented 10 months ago

If you are upgrading your account to Pay As You Go, keep an eye on Issue #8: when the trial ends (30 days), you may be charged if you have not switched the cluster to the unmanaged mode (In-Progress).

weslley182 commented 10 months ago

Since A1 is out of capacity, I followed Rapha's instructions and switched to E3, and with that the cluster came up.

"This error occurs because the Ashburn region has no VM.Standard.A1.Flex instances available. You can try using VM.Standard.E3.Flex instances if you have the $500 bonus offered by Oracle, since these instances are normally paid."

"But if you are using VM.Standard.E3.Flex instances with the bonus credit, you can try updating the variables.tf file with this information:

image_id = ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka
kubernetes_version = 1.28.2"

Rapha-Borges commented 10 months ago

@tgporto @henri-31 @paulovieirajr I'd be grateful if you could confirm that the problem is resolved after updating image_id and kubernetes_version. Then I can close this Issue and update the README.

fernandoguide commented 10 months ago

It worked here with:

VM.Standard.E3.Flex
image_id = ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka
kubernetes_version = 1.28.2

victorjuniorrb commented 10 months ago

I got the same error: the provider could not deliver the Node Pool within 20 minutes. I have just run terraform destroy successfully and will try again later.

Note: I used the same variables as @fernandoguide to consume the credits. Later I'll try using the Terraform files in the Resource Manager screens inside OCI.

victorjuniorrb commented 10 months ago

It turns out that 2 GB really is the minimum memory for a VM.Standard.E3.Flex instance to join OKE as a Worker Node in our scenario. I found no reference for this in the documentation; it comes from testing only.
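One way to catch an undersized worker before Terraform starts provisioning is a validation rule on the memory variable. The sketch below is only an illustration: the variable name memory_in_gbs_per_node is taken from this thread, the block layout is an assumption about how variables.tf might be organized, and the 2 GB floor reflects the tests reported here rather than any documented OKE limit.

```hcl
# Hypothetical variable block; the real variables.tf in the repo may differ.
variable "memory_in_gbs_per_node" {
  description = "Memory (GB) assigned to each worker node"
  type        = number
  default     = 2

  # Assumption from this thread's tests: below 2 GB, a VM.Standard.E3.Flex
  # node did not register with the cluster. Not an officially documented limit.
  validation {
    condition     = var.memory_in_gbs_per_node >= 2
    error_message = "Memory per node must be at least 2 GB for the worker node to join the cluster."
  }
}
```

With a rule like this, terraform plan would fail immediately on a smaller value instead of waiting for the node registration timeout.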

Rapha-Borges commented 10 months ago

I will update the repository README with the following information:

shape: VM.Standard.E3.Flex
ocpus_per_node: 1
memory_in_gbs_per_node: 2
image_id: ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka
k8s_version: v1.28.2
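For readers editing variables.tf by hand, the values above might translate into variable defaults roughly as follows. This is a sketch under assumptions: the variable names are copied from the list in this comment (an earlier comment calls the version variable kubernetes_version), and the types and block layout are guesses, so check them against the actual file in the repository.

```hcl
# Hypothetical variables.tf defaults for the paid E3 shape (Ashburn region image).
variable "shape" {
  type    = string
  default = "VM.Standard.E3.Flex"
}

variable "ocpus_per_node" {
  type    = number
  default = 1
}

variable "memory_in_gbs_per_node" {
  type    = number
  default = 2
}

variable "image_id" {
  type    = string
  default = "ocid1.image.oc1.iad.aaaaaaaanwsto6tqklfuawgqrve5ugjpbff3l5qtb7bs35dp72ewcnsuwoka"
}

variable "k8s_version" {
  type    = string
  default = "v1.28.2"
}
```

The image OCID is specific to the iad (Ashburn) region, as its prefix shows, and VM.Standard.E3.Flex instances are normally paid, so this configuration is meant for accounts using the $500 bonus credit.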

If the problem occurs again, try recreating the cluster before reopening this Issue.

Thank you all for your contributions.