dmwm / CMSRucio


Make Rucio aware of space used by /store/unmerged #457

Open dciangot opened 1 year ago

dciangot commented 1 year ago

Without actually injecting all the files.

From discussion here: https://its.cern.ch/jira/browse/CMSTRANSF-553

@ericvaandering any idea?

klannon commented 1 year ago

Just to add some context: this was discussed in the last DM General meeting. The goal is to have some idea of the volume of unmerged files at a site so that site quotas for Rucio can be adjusted to account for it. The statement in the meeting was that feeding back the info to adjust the quotas once per week should be fine. Since the scanner is already feeding MSUnmerged a list of the unmerged files at all sites on a weekly basis, it seems like it would be an easy place to extract the data volumes and feed it to a service that adjusts the Rucio quota.

Note: although there have previously been requests to inject unmerged files into Rucio, from the discussion last week it seems that no one is worried about locking or transferring unmerged files, just measuring them and making Rucio aware that some of the storage at a site is used up by these files.

ericvaandering commented 1 year ago

@ivmfnal is this something we can easily do? Or maybe we get the used space in the unmerged area already from the consistency check?

klannon commented 1 year ago

@ericvaandering @ivmfnal Just for clarification: In my mind, this seemed like a reasonable request assuming that we could piggyback on the service that feeds data to MSUnmerged for unmerged area cleanup. Unless I've gotten confused, there is a scanner-like service that is providing lists of unmerged files so that MSUnmerged can trigger cleanups, right? The question is whether we could use this to get an estimate of the unmerged file size as well.

ivmfnal commented 1 year ago

Currently the "unmerged" scanner does collect the total size under /store/unmerged, minus a couple of ignored subdirectories (logs and SAM/testSRM). The total size can be exported via the existing JSON interface on a per-RSE basis.

ericvaandering commented 1 year ago

Then I guess we just need to sort out access issues for the web page (requires a CMS certificate, I think).

ivmfnal commented 1 year ago

Plus I need to make the total size available via a web service interface. Currently it is shown by the GUI only, which we do not want to scrape.

In fact, it is already available, e.g.: https://cmsweb.cern.ch/rucioconmon/unmerged/stats?rse=T2_US_Caltech

Look for key "total_size_gb"
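For anyone wanting to read that value from a script rather than a browser, here is a minimal sketch. It assumes the endpoint returns JSON and accepts a client certificate; the thread only says to look for the key, so the sketch searches the parsed document recursively instead of assuming where `total_size_gb` sits. The helper names and the certificate and key paths are hypothetical.

```python
import json
import ssl
import urllib.parse
import urllib.request

# URL taken from the comment above; everything else here is illustrative.
STATS_URL = "https://cmsweb.cern.ch/rucioconmon/unmerged/stats"


def find_key(obj, key):
    """Return the first value stored under `key` anywhere in parsed JSON, or None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def unmerged_size_gb(rse, cert_file, key_file):
    """Fetch the stats for one RSE and return total_size_gb (needs network access)."""
    context = ssl.create_default_context()
    # Assumption: the service authenticates clients with an X.509 certificate.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    url = STATS_URL + "?" + urllib.parse.urlencode({"rse": rse})
    with urllib.request.urlopen(url, context=context, timeout=60) as response:
        return find_key(json.load(response), "total_size_gb")
```

A call would look like `unmerged_size_gb("T2_US_Caltech", "usercert.pem", "userkey.pem")`, with the two file names replaced by whatever certificate is actually authorized.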

dynamic-entropy commented 1 year ago

Hi @ivmfnal Can you please provide me with the necessary instructions on how to interact with the endpoint to get the information about unmerged at each RSE?

What is the frequency of updates/recalculation of this data?

ivmfnal commented 1 year ago

@dynamic-entropy you interact with the endpoint by going to https://cmsweb.cern.ch/rucioconmon/unmerged/index and there you will find the link to combined stats for all RSEs and you can browse to specific RSE and run, e.g.: https://cmsweb.cern.ch/rucioconmon/unmerged/show_run?rse=T1_ES_PIC_Disk&run=2023_07_05_18_27 and then you click the link you need on that page.

dynamic-entropy commented 1 year ago

Hi @ivmfnal 😂 I don't think I meant how to browse the webpage.

We need to get this into Rucio, so an HTTP request to it would suffice. But I need to know the API schema (I thought this was obvious in the question, my bad). Also, will the Rucio robot cert have permissions?

More importantly, the update frequency is what matters. If it is too slow, then we cannot use it, or we still need to keep high safety margins, which is what we are trying to reduce.

ivmfnal commented 1 year ago

The update frequency is currently once per week per RSE.

There is no API and there is no API schema here. There are several published URLs one can use and they can be found by browsing the pages.

I will summarize them for you:

ericvaandering commented 1 year ago

Am I right that leaving off "run=" gives you the latest?

ivmfnal commented 1 year ago

> Am I right that leaving off "run=" gives you the latest?

yes, exactly
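A small sketch of how a client could build these URLs with an optional run. The base path and the `rse` and `run` parameter names come from the URLs quoted earlier in this thread; the helper name `unmerged_url` is hypothetical, and whether every page accepts a missing `run` is an assumption based only on the answer above.

```python
from urllib.parse import urlencode

# Base path as it appears in the URLs quoted in this thread.
BASE = "https://cmsweb.cern.ch/rucioconmon/unmerged"


def unmerged_url(page, rse, run=None):
    """Build a URL for one page; omit `run` to ask for the latest run."""
    params = {"rse": rse}
    if run is not None:
        params["run"] = run
    return f"{BASE}/{page}?{urlencode(params)}"
```

For example, `unmerged_url("show_run", "T1_ES_PIC_Disk", "2023_07_05_18_27")` reproduces the URL quoted earlier, and dropping the last argument leaves `run=` off.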

ericvaandering commented 1 year ago

It seems to me we can close this or at least @dynamic-entropy should verify he can get the info he needs with the robot certificate and then we can close it.

dynamic-entropy commented 1 year ago

Hello, I am still not sure whether weekly updates are good enough. I do not have the usage, or the rate of change of usage, of the unmerged area at various sites to draw a conclusion.

https://monit-grafana.cern.ch/d/KahKLVA4k/eoscms-quotas-and-usage-monitoring-plots?from=now-6M&to=now&viewPanel=7&orgId=11&var-Pledge=45000&var-bin=7d This plot shows the used space of the unmerged area at CERN binned at weekly intervals.

If we have no other option than these weekly calculations, then we need to decide on a safe margin of tolerance we need to keep. As can be seen in the plot, we have had a jump of 330 TB. Is it representative of other sites too? How big a jump should I expect?
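One way to turn the question into a number, as a rough sketch only: take the weekly samples for a site, find the largest week-over-week increase, and hold that back as the margin when setting the quota. The function names and all values in the usage note are hypothetical; only the idea of a 330 TB jump comes from the plot above, and using the largest observed jump as the margin is an assumption, not an agreed policy.

```python
def largest_weekly_jump(usage_tb):
    """Largest increase between consecutive weekly samples (0 if usage never grows)."""
    jumps = [later - earlier for earlier, later in zip(usage_tb, usage_tb[1:])]
    return max(jumps + [0])


def rucio_quota_tb(site_capacity_tb, unmerged_tb, margin_tb):
    """Space left for Rucio after subtracting unmerged usage and a safety margin."""
    return max(site_capacity_tb - unmerged_tb - margin_tb, 0)
```

With made-up weekly samples `[900, 950, 1280, 1100]` (TB), the largest jump is 330 TB. For a made-up capacity of 10000 TB and the latest unmerged usage of 1100 TB, the quota would come out at 8570 TB.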

ivmfnal commented 1 year ago

In my opinion, if we want to go from weekly scans to daily scans, we absolutely have to:

  1. Remove empty directories on a regular basis
  2. Allocate more resources (CPU, memory and network bandwidth) to the scanner.

These 2 measures are required, but I cannot guarantee they will be sufficient. Our bottleneck is the fact that a scan, depending on the site, can take anywhere from ~minutes to ~days to complete. And my impression is that our k8 pod where the CE scanners run is already quite busy in terms of all 3 resources I mentioned.

Another option would be for the WM group to run the scanners using their own resources on their own schedule. We have made the scanner available as a pip-installable product, which can be used by anyone.

dynamic-entropy commented 1 year ago

I believe we should aim to remove (and prevent the generation of) empty directories irrespective of this. Who are the concerned people we can talk to in order to move this forward?

ericvaandering commented 1 year ago

One thing we can do is easily split the unmerged and normal scans into different pods. We could also stand up additional pods and put different sites into each of them (only T0/T1 for instance).

I have a new prod cluster which will be in production soon (next week?) and we have two additional nodes on it, so we have some more resources.

(Email reply to Igor Mandrichenko's comment of Jul 19, 2023, 9:01 AM, above.)

ivmfnal commented 1 year ago

> I believe we should aim to remove (and prevent the generation of) empty directories irrespective of this. Who are the concerned people we can talk to move this forward?

Yes, I agree, the empty directories affect CE performance too by slowing down the scanner.

Eric and I discussed this with Stephan, I think last fall.

The reason why we cannot remove empty directories as part of CE is that about half of the sites do not allow delete operations over xrootd. Back then, Stephan said we should wait for the implementation of token authentication before sorting out the authorization issues. My understanding is that we are not there yet.

dynamic-entropy commented 1 year ago

Thank you, Eric, for sending out the mail. So, considering Alan's reply on the thread,

It looks like the service responsible for cleaning up unneeded unmerged files (called MSUnmerged) was not functional for the past (potentially 6) months, following a change of deployment from RPMs to PyPI.

This was fixed last week, and we expect both unneeded unmerged files and directories to be deleted now.

Even though the general problem of empty directories is not solved, as noted by Igor, for the purposes of this ticket we will hopefully be able to do fast runs over the unmerged directory. Combined with Eric's idea of separating unmerged scans into a different pod, we can easily(?) get to daily scans or even higher frequency.

By the way, @ivmfnal, for my curiosity, what is the aim of the scan if CC does not work on unmerged directories? My intention is to know how heavy the scan is.

ivmfnal commented 1 year ago

Rahul,

CE scans the entire LFN namespace under /store/, excluding a few subdirectories (/store/unmerged is one of those excluded), to detect inconsistencies between the actual storage contents and the replicas recorded in the Rucio database.

The scan has to go through the entire storage file tree, including empty directories. Depending on the site, the number of empty directories can be quite large. For example, one scan of T1_IT_CNAF_Disk detected 3014499 files in 1786256 directories, of which 852164 were empty.
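To put the T1_IT_CNAF_Disk numbers in proportion, a quick calculation using only the three counts quoted above (the variable names are mine):

```python
# Counts from the T1_IT_CNAF_Disk scan quoted above.
files = 3014499
directories = 1786256
empty_directories = 852164

# Share of directories the scanner has to visit that contain nothing.
empty_fraction = empty_directories / directories
print(f"{empty_fraction:.1%} of directories are empty")  # prints "47.7% of directories are empty"
```

So nearly half of the directories walked in that scan held nothing at all.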

klannon commented 1 year ago

Just noticed these comments. Sorry for missing them last week. When this was originally discussed, my understanding was that weekly scans would be sufficient. (At least that's what I understood Stephan Lammel to be saying.) Are we sure that we need to achieve sufficient performance to do daily scans? Wouldn't it make sense to start with the current weekly frequency and only push for more performance once we establish that it's needed?

ivmfnal commented 1 year ago

I agree. I am not sure what exactly would be fundamentally different if the unmerged scans were done daily instead of weekly.

I think the space utilization is determined by how the files are processed/removed and does not actually depend on the frequency of scanning.

ericvaandering commented 1 year ago

The consistency checking has suffered a number of operational issues recently (certificates gone missing, move to a new cluster, and the restarts that go with that). I would propose that we let it run as is for a while (a month or two) and then evaluate whether the week-to-week shifts we see are significant enough that finding day-to-day shifts would actually provide any benefit to operations.

For instance, if the output produced by jobs in the queue at any given time at a site is larger than the fluctuations in stored space, better knowledge of how much is on disk is not going to be helpful.

ivmfnal commented 1 year ago

As I understand it, this request means that we need to get the size of the entire /store/unmerged, without excluding those subdirectories ignored by the WM.

I am not sure it is actually practical to scan the entire subtree via xrootd just to get its size. Moreover, the k8 pod where CE and unmerged scanning run is currently overloaded by these 2 activities, and I am trying to find ways to reduce the load on the pod by:

Plus, there is a possibility to split the activities into multiple pods, which I am not actually sure will help because of the virtual nature of Kubernetes. But in any case, this cannot be done before Eric comes back from vacation.

Another opportunity is to get rid of empty directories, which may significantly reduce scanning time and the load on the system. See Issue 578.

ivmfnal commented 1 year ago

Frankly I am not sure what exactly I am supposed to do regarding this issue at this time.