Open andrius-k opened 3 years ago
@rvenditti If you could get the mapping mentioned by @amaltaro (also pasted below) then no additional discussion is necessary. We can modify the Pull Request accordingly to take this into account and avoid creating a new workflow/spec parameter.
{"https://cmsweb.cern.ch/dqm/offline": "https://cmsweb.cern.ch/dqm/offline-new/api/v1/register",
"https://cmsweb.cern.ch/dqm/relval": "https://cmsweb.cern.ch/dqm/relval-new/api/v1/register",
"https://cmsweb.cern.ch/dqm/dev": "https://cmsweb.cern.ch/dqm/dev-new/api/v1/register"}
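For illustration only, the mapping above could be consulted from the upload code roughly as follows. This is a minimal sketch, not the code in the Pull Request: the names DQM_REGISTER_URLS and register_url_for are hypothetical, and tolerating a trailing slash on the configured URL is an assumption.

```python
from typing import Optional

# Mapping quoted above: existing DQM GUI URL -> registration endpoint of the new GUI.
# The constant name is hypothetical; the URLs are copied from this thread.
DQM_REGISTER_URLS = {
    "https://cmsweb.cern.ch/dqm/offline": "https://cmsweb.cern.ch/dqm/offline-new/api/v1/register",
    "https://cmsweb.cern.ch/dqm/relval": "https://cmsweb.cern.ch/dqm/relval-new/api/v1/register",
    "https://cmsweb.cern.ch/dqm/dev": "https://cmsweb.cern.ch/dqm/dev-new/api/v1/register",
}


def register_url_for(dqm_url: str) -> Optional[str]:
    """Return the new-GUI registration endpoint for an existing DQM GUI URL.

    Returns None when the URL has no entry, so the caller can skip the
    new registration step instead of guessing an endpoint.
    """
    # Assumption: a trailing slash on the configured URL should not matter.
    return DQM_REGISTER_URLS.get(dqm_url.rstrip("/"))
```

Deriving the endpoint from the URL already present in the workflow is what would avoid adding a new workflow/spec parameter.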
@rvenditti @jfernan2 Looking at the JIRA ticket, we are close to getting the permanent EOS storage. Is there any news regarding the cmsweb mapping?
Hi, just to summarize the situation (for future reference): we have already created the following URLs in the cmsweb production clusters to use the NEW DQM GUIs: https://cmsweb.cern.ch/dqm/offline-new/ (offline) and https://cmsweb.cern.ch/dqm/relval-new/ (relval). These links do not currently point to any machine.
If the "new" part of this mapping is just a placeholder that will not be used at all for the time being (until we are ready), we can assume the “new” urls as the ones above:
So in summary, the mapping is as you proposed:
{"https://cmsweb.cern.ch/dqm/offline": "https://cmsweb.cern.ch/dqm/offline-new/api/v1/register", "https://cmsweb.cern.ch/dqm/relval": "https://cmsweb.cern.ch/dqm/relval-new/api/v1/register", "https://cmsweb.cern.ch/dqm/dev": "https://cmsweb.cern.ch/dqm/dev-new/api/v1/register"}
@rvenditti Awesome! Thank you.
@rvenditti I tested the new mapping, but I got an error while trying to register the files.
This link in particular: https://cmsweb.cern.ch/dqm/offline-new/api/v1/register
shows expired SSL certs
Secure Connection Failed
An error occurred during a connection to cmsweb.cern.ch. SSL peer rejected your certificate as expired.
Error code: SSL_ERROR_EXPIRED_CERT_ALERT
Could that be the issue? I thought it could be my certs, but if I use the browser for example, I can see: https://cmsweb-testbed.cern.ch/dqm/offline-test-new/ but not: https://cmsweb.cern.ch/dqm/offline-new/
@amaltaro: FYI
Hi @khurtado The problem with the mapping you're trying to do is that there is no host on this link, it's just an empty link for when the new GUI moves to production.
Could that be the issue? I thought it could be my certs, but if I use the browser for example, I can see: https://cmsweb-testbed.cern.ch/dqm/offline-test-new/ but not: https://cmsweb.cern.ch/dqm/offline-new/
Right now the new GUI is being migrated from cmsweb testbed VM to cmsweb testbed Kubernetes cluster, but deployment is very recent and some bugs are still present. After the deployment in the cmsweb testbed Kubernetes cluster is a success and stable, deployment in cmsweb Kubernetes production cluster will start. Then "https://cmsweb.cern.ch/dqm/offline-new/" will be available, but for now only "https://cmsweb-testbed.cern.ch/dqm/offline-test-new/" is available.
Hi everyone, I am trying to understand where we stand with these developments and if I understand the messages above correctly, we do not have any service running on some of those new urls. Is that correct? Is there an ETA to have such services up & running?
I am afraid we cannot proceed with these developments until we have all the dependency machinery in place. Otherwise WMAgents will try to reach those backends and will fail the whole job, failing both the old and the new DQM mechanisms. It could take from a few days to a few weeks to have it deployed in production, but to be on the safe side, we cannot merge it unless it has been fully tested and there are DQM services listening on the new URLs.
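To make the failure mode described above concrete, here is a hedged sketch of one way the new registration step could be isolated so that an unreachable backend does not fail the existing upload. It is not what the Pull Request does; upload_old and register_new are hypothetical callables standing in for the visDQMUpload-based upload and the new registration call.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def upload_with_optional_registration(
    upload_old: Callable[[], None],
    register_new: Callable[[], None],
) -> bool:
    """Run the existing upload, then attempt the new registration.

    A failure in upload_old propagates and fails the job as it does today.
    A failure in register_new is logged and reported through the return
    value, so it cannot take the old mechanism down with it.
    """
    upload_old()
    try:
        register_new()
    except Exception:
        logger.exception("Registration with the new DQM GUI failed")
        return False
    return True
```

Whether a registration failure should be swallowed like this or should fail the job is a policy decision for the WMCore and DQM teams; the sketch only shows that the two outcomes can be kept apart.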
Please let us know if there is anything missing here; and/or if there is anything that we can help you with to move this forward. Thanks
@micsucmed @rvenditti @jfernan2 : Just pinging about what Alan asked last week. Is there any news or a time estimate on when the new URL mappings with services will be fully available/operational? We can't move forward with this until then.
Hi @amaltaro @khurtado, giving an ETA for the new GUI to be running on the new URLs is difficult, as the deployment in the Kubernetes testbed cluster is still ongoing and we are waiting on the Cloud Infrastructure group to solve a problem with EOS mounting for the application within the cluster (Ticket: https://cern.service-now.com/service-portal?id=ticket&table=u_request_fulfillment&n=RQF2037818). Nonetheless, I'll keep you updated about the process and let you know when the problem has been solved, so that a more accurate ETA for the services on the new URLs can be given.
@micsucmed Understood. Thank you for the update on this!
Hi @micsucmed , we had a Workflow Management meeting today and we were wondering if there was any progress on this, e.g.: on the cloud infrastructure + EOS issue (CERN ticket just shows as empty to me)
Hi @khurtado, we haven't got any response from the assignee to the EOS ticket in a while (it might appear empty as it is a private ticket but I can add you to the watchers list if you like), so it's very difficult to say when this problem will be solved. It may be best if we continue with this using the testbed endpoint for the offline GUI ( https://cmsweb-testbed.cern.ch/dqm/offline-test-new/ ) and once the EOS problem is solved we change to the production endpoints.
Hi @micsucmed. just pinging to see if there is any change in the status of this overall.
@micsucmed @jfernan2 Since I can't read this ticket https://cern.service-now.com/service-portal?id=ticket&table=u_request_fulfillment&n=RQF2037818
Could you please let me know who is in charge of solving this ticket on the CERN side (if it hasn't been solved already)? Are there any other major issues besides the EOS issue preventing this from moving forward?
AFAWK, the main problem is solved and we are now finalizing some access issues, but @micsucmed can correct me and add a timescale for this.
What we really need before the data taking start-up is the automation of the rootfiles transfer on EOS (that was a part of the request), given that it is presently done by hand and we have other services that read from there.
So I would put this as top priority. Of course, we can still do the upload by hand, but given the amount of incoming files, this could become a nightmare for us. For the upload of the rootfiles in the new GUI, instead, we can survive like this for a couple of months (we have the old GUI, which is working). Now I see here: https://github.com/dmwm/WMCore/pull/11015 that the transfer to EOS is still failing due to the mapping. Is it possible to decouple the two parts of the problem (i.e. the transfer to EOS and the rootfile upload in the GUI)?
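As a rough illustration of the decoupling asked about above, the two parts could be run as separate stages whose outcomes are reported independently. This is a sketch under assumptions, not the behaviour of #11015: transfer_to_eos and register_in_gui are hypothetical callables, and HarvestUploadResult is an invented name.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HarvestUploadResult:
    """Outcome of each stage, tracked separately."""
    eos_transfer_ok: bool
    gui_registration_ok: bool


def run_stages(
    transfer_to_eos: Callable[[], None],
    register_in_gui: Callable[[], None],
) -> HarvestUploadResult:
    """Transfer the file to EOS, then register it in the new GUI.

    Registration is only attempted after a successful transfer, and a
    registration failure does not invalidate the transfer.
    """
    try:
        transfer_to_eos()
    except Exception:
        return HarvestUploadResult(eos_transfer_ok=False, gui_registration_ok=False)
    try:
        register_in_gui()
    except Exception:
        return HarvestUploadResult(eos_transfer_ok=True, gui_registration_ok=False)
    return HarvestUploadResult(eos_transfer_ok=True, gui_registration_ok=True)
```

With this shape, services that only read the rootfiles from EOS would depend on the first flag alone, while the GUI registration could be retried later.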
@rvenditti Thank you for the update!
Regarding the EOS transfer in #11015, this is working.
Here is a status summary from the WMCore side of things:
So, basically we are just waiting for the registration mapping to work. And yes please, a time estimate of when this would be done (the host services in the new cmsweb urls) would be great.
Hi @khurtado I would expect to finish the testbed deployment sometime this week, so I would say that at the end of next week the production endpoints will be available.
Hi @micsucmed . Is there an update on this? I still see e.g.: this service link unavailable:
@micsucmed Just pinging about this issue.
@khurtado sorry for the delay. The EOS issue was fixed yesterday. It had been fixed a couple of weeks ago, but an update to the Kubernetes cluster reintroduced it. I will deploy the testbed again and prepare for the production deployment. At the end of this week or early next week I expect to have the production deployment ready for you to continue. Again, sorry for the delays and my late response.
Hi @micsucmed. Thank you! Sorry I took 2 weeks to reply back, but I hope the deployment plans are going well. Please let me know once the services have been deployed so we can re-test this.
Hi @micsucmed. Any news on this?
@micsucmed @rvenditti Pinging about this again.
Hi @khurtado, I understood from @micsucmed that the migration of the offline GUI to k8s is done, but there are some problems on the cmsweb side. I don't have the details, but I think that problem can be solved within a few days. @micsucmed can you confirm?
Hi @khurtado, as @rvenditti says, there is an issue with the frontend rules provided by cmsweb that prevents the production URL ( https://cmsweb.cern.ch/dqm/offline-new/ ) from reaching the deployed pod in k8s. The cmsweb team is working on it. I would like to think it's a simple issue that will be solved soon; however, as I don't have the access needed to solve it, I can't be sure.
@rvenditti @micsucmed Thank you for the update. Is there a ticket or GH issue to track this from the cmsweb team side?
@rvenditti how should we (WM Core) proceed on this? Are we supposed to continue working on this, considering that the new DQM GUI is already in place? @khurtado are there any other tests to be done before merging and deploying?
Impact of the new feature
This request affects all systems that are responsible for uploading harvested DQM data to the DQM GUIs. This includes T0 processed DQM data and RelVal/reprocessing DQM data.
Is your feature request related to a problem? Please describe.
We're deploying a new, upgraded version of the DQM GUI tool. The procedure that notifies the DQM GUI about new DQM files is different in the new version. We would like this new procedure to be used alongside the old, visDQMUpload-based DQM file upload.
Describe the solution you'd like
Currently, the DQM data is uploaded to the DQM GUIs using a tool called visDQMUpload. The new procedure requires this process to be split into two stages:
If required, a facade could be provided by us (DQM) that would have exactly the same interface as visDQMUpload. In that case, we would only like you to call the facade script (visDQMUpload_new) alongside the old one.
Describe alternatives you've considered
No viable, future-proof alternatives were found.
Additional context
Below is a diagram that represents the current Offline DQM file movement:
Below is a diagram that represents the desired Offline DQM file movement, after the changes mentioned in this request: