eu-nebulous / sal

Mozilla Public License 2.0

Failed to create a cluster with X64 edge node #9

Closed: robert-sanfeliu closed this issue 1 month ago

robert-sanfeliu commented 1 month ago

On nebulous-cd, interacting with SAL via REST requests, I am unable to create a cluster that includes an edge device with the X64 (x86_64) architecture. Steps to reproduce:

  1. Register an AWS account

  2. Manually launch a t2.large VM on AWS. Running uname -a on the instance shows the x86_64 architecture:

    Linux ip-172-31-19-81 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  3. Enable password-based SSH access, set the password, and verify that login works.

  4. Register the edge node with the following request to SAL (POST to /sal/edge/register):

    {
    "name": "byod-test-202409051045",
    "jobId": "",
    "loginCredential": {
        "username": "ubuntu",
        "password": "112233",
        "privateKey": ""
    },
    "ipAddresses": [
        {
            "IpAddressType": "PUBLIC_IP",
            "IpVersion": "V4",
            "value": "100.27.188.175"
        },
        {
            "IpAddressType": "PRIVATE_IP",
            "IpVersion": "V4",
            "value": "158.37.65.101"
        }
    ],
    "nodeProperties": {
        "providerId": "1",
        "numberOfCores": 2,
        "memory": 80000,
        "disk": 1.0,
        "operatingSystem": {
            "operatingSystemFamily": "UBUNTU",
            "operatingSystemArchitecture": "AMD64",
            "operatingSystemVersion": 1804
        },
        "geoLocation": {
            "city": "Warsaw",
            "country": "Poland",
            "latitude": 52.237049,
            "longitude": 21.017532
        }
    },
    "port": "22",
    "reason": null,
    "diagnostic": null,
    "userId": null,
    "allocated": null, 
    "scriptURL": "https://www.google.com",
    "systemArch": "Intel64",
    "jarURL": "http://158.39.77.68:8880/rest/node.jar"
    }
  5. From the node candidates available in the registered AWS account, select an m4.xlarge instance as the cluster master.

  6. Define a cluster with the following request (POST to /sal/cluster):

    {
        "name": "testedge202409051123",
        "master-node": "master-testedge202409051123",
        "nodes": [
            {
                "nodeName": "master-testedge202409051123",
                "nodeCandidateId": "8a74841291c15c0e0191c17be825002c",
                "cloudId": "c9a625c7-f705-4128-948f-6b5765509029"
            },
            {
                "nodeName": "worker-testedge2024090511231",
                "nodeCandidateId": "8a74841291c15c0e0191c179d99b0000",
                "cloudId": "edge"
            }
        ],
        "env-var": {
            "APPLICATION_ID": "apptestedge202409051123",
            "BROKER_ADDRESS": "158.37.63.86",
            "ACTIVEMQ_HOST": "158.37.63.86",
            "BROKER_PORT": "32754",
            "ACTIVEMQ_PORT": "32754",
            "ONM_IP": "158.39.201.249",
            "ONM_URL": "https://onm.cd.nebulouscloud.eu",
            "AMPL_LICENSE": "dontlookatthis"
        }
    }
  7. The define-cluster request returns HTTP 200 with "true" as the body.

  8. Deploy the defined cluster (POST to /sal/cluster/{{cluster_name}}). The request fails with the following response:

    HTTP Status 500 – Internal Server Error

    Type: Exception Report
    Message: Request processing failed; nested exception is java.lang.IllegalArgumentException: The Edge node system architecture Intel64 is not supported!
    Description: The server encountered an unexpected condition that prevented it from fulfilling the request.

    Exception:
    org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.IllegalArgumentException: The Edge node system architecture Intel64 is not supported!
        org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:982)
        org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:696)
        org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:779)
        org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
        org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Log4jServletFilter.java:71)

    Root cause:
    java.lang.IllegalArgumentException: The Edge node system architecture Intel64 is not supported!
        org.ow2.proactive.sal.service.service.EdgeService.defineEdgeNodeSource(EdgeService.java:285)
        org.ow2.proactive.sal.service.service.EdgeService.lambda$addEdgeNodes$0(EdgeService.java:205)
        java.util.HashMap.forEach(HashMap.java:1290)
        org.ow2.proactive.sal.service.service.EdgeService.addEdgeNodes(EdgeService.java:156)
        org.ow2.proactive.sal.service.service.ClusterService.submitClutserNode(ClusterService.java:180)
        org.ow2.proactive.sal.service.service.ClusterService.deployCluster(ClusterService.java:159)
        org.ow2.proactive.sal.service.rest.ClusterRest.deployCluster(ClusterRest.java:72)
        sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        java.lang.reflect.Method.invoke(Method.java:498)
        org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:204)
        org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132)
        org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
        org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:854)
        org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:765)
        org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
        org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
        org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
        org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
        org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:696)
        org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:779)
        org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
        org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Log4jServletFilter.java:71)

    Note: The full stack trace of the root cause is available in the server logs.

    Apache Tomcat/9.0.68
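
When a SAL call fails like this, the raw response body is Tomcat's default HTML error page, which buries the useful part (the IllegalArgumentException text) in markup. Below is a minimal Python sketch for surfacing it on the client side. It assumes the page keeps Tomcat's default `<p><b>Message</b> ... </p>` layout and Spring's "nested exception is" wording; the function names are hypothetical and are not part of SAL or nebulous-cd.

```python
import html
import re
from typing import Optional

# Assumption: Tomcat's default error page renders the exception text as
# <p><b>Message</b> ...</p>; this pattern relies on that layout.
_MESSAGE_RE = re.compile(r"<p><b>Message</b>\s*(.*?)</p>", re.DOTALL)


def tomcat_error_message(body: str) -> Optional[str]:
    """Return the 'Message' field of a Tomcat HTML error page, or None if absent."""
    match = _MESSAGE_RE.search(body)
    if match is None:
        return None
    # The message is HTML-escaped inside the page, so unescape it.
    return html.unescape(match.group(1)).strip()


def root_cause(message: str) -> str:
    """Keep only the innermost cause of a Spring 'nested exception is' chain."""
    # rpartition returns ('', '', message) when the marker is absent,
    # so messages without a nested exception pass through unchanged.
    _, _, cause = message.rpartition("nested exception is ")
    return cause
```

For the response above, root_cause(tomcat_error_message(body)) yields the java.lang.IllegalArgumentException line, which is more useful to log in nebulous-cd than the whole HTML page.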
ankicabarisic commented 1 month ago

Tested the proof-of-concept scenario (ProActive jobs 245, 246, 247). It fails in nebulous-cd while installing Docker on the edge device, but succeeds in the Activeeon environment using the scripts at https://github.com/ow2-proactive/scheduling-abstraction-layer/tree/master/docker/scripts

Further validation results will be reported.

robert-sanfeliu commented 1 month ago

Tested the scenario with an AWS AMD instance (m7a.large) acting as the edge device, registered with the following parameters in the /sal/edge/register call:

    "systemArch": "AMD",    
    "jarURL": "http://158.39.77.68:8880/rest/node.jar"

Job IDs: 258, 259, 260

Cluster creation succeeded, including execution of the SAL scripts. The resulting cluster is operational and I can deploy a simple app.
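
Both the t2.large (Intel) instance from the original report and the m7a.large (AMD) instance used here report x86_64, so an edge registration script could derive systemArch from uname -m instead of hard-coding it. Below is a minimal Python sketch, assuming, as this thread suggests, that every x86-64 machine is registered as "AMD". Other machine types are deliberately left unmapped, and the helper names are hypothetical.

```python
from typing import Any, Dict

# Assumption from this thread: x86-64 machines (Intel or AMD) are
# registered with systemArch "AMD". Other architectures are not mapped
# here; their values should be taken from SAL's edge endpoint docs.
_X86_64_NAMES = {"x86_64", "amd64"}


def system_arch_for(machine: str) -> str:
    """Map `uname -m` output to a SAL systemArch value (x86-64 only)."""
    if machine.strip().lower() in _X86_64_NAMES:
        return "AMD"
    raise ValueError(
        f"no systemArch mapping for machine type {machine!r}; "
        "check the SAL edge endpoint documentation"
    )


def with_edge_arch(payload: Dict[str, Any], machine: str, jar_url: str) -> Dict[str, Any]:
    """Return a copy of a /sal/edge/register body with systemArch and jarURL set."""
    updated = dict(payload)  # shallow copy: nested objects are still shared
    updated["systemArch"] = system_arch_for(machine)
    updated["jarURL"] = jar_url
    return updated
```

Applied to the step 4 request body with machine "x86_64", this replaces the rejected "Intel64" with "AMD" and leaves the caller's original dictionary untouched.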

ankicabarisic commented 1 month ago

Hi Robert, great, thank you for this. Indeed, the Intel64 architecture is referred to as AMD. I am improving the error message so that it lists the supported architectures, and I will improve the documentation on this. Please note that we will also need to replace this node.jar with one better adapted to edge devices. Also, the scripts probably failed in nebulous-cd because the capacity of the AE dummy node is low.
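
The improved validation described above could look roughly like the following Python sketch. SAL itself is written in Java (the check lives in EdgeService.defineEdgeNodeSource, per the stack trace), so this only illustrates the idea of naming the accepted values in the error. The tuple SUPPORTED_SYSTEM_ARCHS and its contents are assumptions to verify against the edge endpoint documentation, and matching is assumed to be case-sensitive.

```python
# Hypothetical list of accepted systemArch values; confirm against
# SAL's edge endpoint documentation before relying on it.
SUPPORTED_SYSTEM_ARCHS = ("AMD", "ARMv7", "ARMv8")


def validate_system_arch(system_arch: str) -> str:
    """Return system_arch unchanged if supported, else raise listing the accepted values."""
    if system_arch in SUPPORTED_SYSTEM_ARCHS:
        return system_arch
    raise ValueError(
        f"The Edge node system architecture {system_arch} is not supported! "
        f"Supported values: {', '.join(SUPPORTED_SYSTEM_ARCHS)}"
    )
```

With this shape, the original request would fail with a message that also tells the caller which value to use instead of "Intel64".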

ankicabarisic commented 1 month ago

@robert-sanfeliu more complete documentation on edge nodes is available here: https://github.com/ow2-proactive/scheduling-abstraction-layer/blob/master/endpoints/4-edge-endpoints.md

I created a smaller node.jar to be tested on AMD devices: https://www.activeeon.com/public_content/nebulous/node_14.1.0-SNAPSHOT_amd.jar

Could you please test that registration works with the following parameters?

    "systemArch": "AMD",
    "jarURL": "https://www.activeeon.com/public_content/nebulous/node_14.1.0-SNAPSHOT_amd.jar"

Thank you!
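
To re-run the test with the new jar, the three calls from the original report (register, define, deploy) can be scripted. Below is a minimal Python sketch that only builds the requests, assuming the /sal/... paths from the report. Sending them, the SAL base URL and authentication are left out, and build_requests is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import quote

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[str]]


def build_requests(edge_body: Dict[str, Any], cluster_body: Dict[str, Any]) -> List[Request]:
    """Return the register, define and deploy calls from the report, in order."""
    # The cluster names in this report are plain alphanumerics; quote()
    # guards against characters that are unsafe in a URL path segment.
    name = quote(cluster_body["name"], safe="")
    return [
        ("POST", "/sal/edge/register", json.dumps(edge_body)),
        ("POST", "/sal/cluster", json.dumps(cluster_body)),
        # The report shows no body for the deploy call, so none is built (assumption).
        ("POST", f"/sal/cluster/{name}", None),
    ]
```

Each tuple can then be sent, prefixed with the SAL base URL, by whatever HTTP client nebulous-cd already uses; the edge body would carry "systemArch": "AMD" and the new jarURL.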

robert-sanfeliu commented 1 month ago

Tested the scenario with an AWS AMD instance (m7a.large) acting as the edge device, registered with the following parameters in the /sal/edge/register call:

    "systemArch": "AMD",
    "jarURL": "https://www.activeeon.com/public_content/nebulous/node_14.1.0-SNAPSHOT_amd.jar"

Cluster creation was successful.