Tendrl / api

Tendrl API
GNU Lesser General Public License v2.1
16 stars 16 forks source link

List Clusters API call returns Internal Server Error after installation #306

Open fbalak opened 7 years ago

fbalak commented 7 years ago

I have cluster with 4 nodes. I have installed tendrl and after all was installed I had to wait about 2 minutes before I was able to use tendrl ui because List Clusters API call was returning 500 Internal Server Error. Most of other API calls were working correctly. (I was able to log in and navigate through sites)

After 2 minutes the site and API call started working correctly.

This is how it looks right after installation: 105_server_error

Tested with:

tendrl-commons-1.5.2-20171004T104238.6d5ffef.noarch
tendrl-api-1.5.2-20170927T205654.a9e16c0.noarch
tendrl-ui-1.5.2-20170928T151925.1cdc907.noarch
tendrl-node-agent-1.5.2-20171004T105957.709badc.noarch
tendrl-monitoring-integration-1.5.2-20171005T132717.1b8a4c1.noarch
tendrl-grafana-plugins-1.5.2-20171005T132717.1b8a4c1.noarch
tendrl-notifier-1.5.2-20170927T205657.566bfa0.noarch
tendrl-api-httpd-1.5.2-20170927T205654.a9e16c0.noarch
glusterfs-4.0dev-0.176.gitfabee9d.el7.centos.x86_64
anivargi commented 7 years ago

@fbalak It would be really helpful if you can provide the API logs when it was in error state.

fbalak commented 7 years ago

@anivargi Sure, here is /var/log/messages from server where the api is run: messages-server.txt If you want access to the machine, I can grant it to you if you send me ssh public key.

anivargi commented 7 years ago

@fbalak can you share /var/log/tendrl/api/error.log

fbalak commented 7 years ago

@anivargi here: error.log

anivargi commented 7 years ago

@r0h4n @shtripat looks like TendrlContext is temporarily empty even if the cluster_id is present, which causes the API to fail, I can add a check for that, but does this need to be handled by the backend?

r0h4n commented 7 years ago

Assuming you have not imported this cluster, tendrl-node-agent will start syncing all the data after the default "sync_interval" (as given in node-agent.conf.yaml). This is expected behavior and the UI and API should respond appropriately before the first sync completes

If the cluster is a Tendrl managed cluster, then we need to investigate further.

anivargi commented 7 years ago

@fbalak

fbalak commented 7 years ago

@anivargi @r0h4n I have been testing this and from the UI point of view it looks good now: 1012_cluster_error I am just concerned that the API call still returns 500 Internal Server Error and it provide no information for user about what is happening. If we know where the problem is in the backend can you please update error message and exit code? I would suggest 503 Service Unavailable.

fbalak commented 7 years ago

After discussion with @dahorak we came to conclusion that there should not be any 5xx error. It would be best to return not entirely imported cluster in response and add some loading API attribute that would be reflected in UI but It would also make sense to return code 202 Accepted and return empty list.

mbukatov commented 7 years ago

Based on discussion with @r0h4n today, I would propose the following.

Since Tendrl needs to wait for all node agents to send all data required to describe the cluster, the cluster can't be shown immediately. Only when Tendrl has all the data, it makes sense to return the cluster in the api/1.0/clusters api call.

Unfortunately there are few problems with the current behavior:

I don't think it makes sense to return not fully detected cluster in clusters list for multiple reasons:

So one possible approach would be to return empty list in the clusters call response until the cluster is fully detected. And then to let the user know about cluster detecting taking place, tendrl could send the notification about this.

mbukatov commented 7 years ago

I searched in the design documents and the ui design doesn't anticipate with any delay in detection of the cluster for import: https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/244820318

@julienlim How would you suggest to address this delay in cluster detection? Would the notification as suggester be ok?

mbukatov commented 7 years ago

@fbalak @dahorak I don't think that returning 202 Accepted makes sense, as the cluster detection is not triggered by clusters api call.

mbukatov commented 7 years ago

@fbalak I disagree with your description in https://github.com/Tendrl/api/issues/306#issuecomment-336105767 - showing spinner during cluster detection looks much more reasonable compared current 500 status code which is ignored by the ui. Unfortunately it seems that the 500 status code was misused to indicate that the cluster detection is running, which is not ok.

dahorak commented 7 years ago

@mbukatov I agree with you, I suggested @fbalak 200 OK with empty clusters list (if it is not possible to mark cluster as not fully detected).

julienlim commented 7 years ago

Current takeaway on answers we need to the following questions:

@Tendrl/tendrl-qe @rghatvis@redhat.com @mcarrano @julienlim

mbukatov commented 7 years ago

Thinking about my previous suggestion to use notification: sending notification that tendrl is detecting a cluster makes sense from backend perspective, but could we use this notification to control when to show spinner in the ui?

r0h4n commented 7 years ago

@Tendrl/tendrl-core do you guys think we should add this right away or wait for the next release?

r0h4n commented 7 years ago

More updates on this issue discussed in today's daily meeting

julienlim commented 7 years ago

@Tendrl/tendrl-core @Tendrl/tendrl-qe @mcarrano @r0h4n

Here's our recommendation from a UI perspective:

(1) When there is only 1 cluster that is being detected (but not yet ready for importing), show the following blank slate:

Title: Cluster detected Summary: Tender server is connecting to the Tendrl node agents to detect the cluster resources.

(2) When there are multiple clusters and a detected cluster shows up in the Cluster list (but is not yet ready for Import), in the column before the button, show the following message “Cluster detection in progress” and disable Import button. When cluster detection has completed, then enable the “Import” button. Currently, the status of the cluster prior to being imported shows a “?” which is probably still okay to leave as-is.

If there's any other scenarios that I've not addressed, please advise and we can come up with other recommendations accordingly.

anivargi commented 7 years ago

Fixed in https://github.com/Tendrl/api/pull/332

shirshendu commented 6 years ago

@julienlim This issue is pretty old. Is it still persisting for you?