dmwm / PHEDEX

CMS data-placement suite

Pass PhEDEx metadata to FTS tasks for hadoop monitoring. #1085

Open nataliaratnikova opened 7 years ago

nataliaratnikova commented 7 years ago

Valentin requests that PhEDEx metadata be propagated to FTS. The FTS logs will appear on HDFS; he can then run a Spark job over the FTS logs to extract context for the global transfer monitoring.

Here are the metadata Valentin is interested in:

How such metadata should be structured is a matter of the FTS metadata format/schema; I can't tell you more, and you'll need to coordinate with the FTS developers. It would be nice to use the same schema among different CMS FTS users, e.g. PhEDEx and ASO. This is why I CC'ed Diego: according to him, the FTS logs already carry ASO metadata, e.g. job_metadata.issuer=ASO.

Since we're talking about metadata, its structure may change or be adjusted over time.

Here are typical examples:

nataliaratnikova commented 7 years ago

See a similar open issue: https://github.com/dmwm/PHEDEX/issues/1041

nataliaratnikova commented 7 years ago

ASO implementation of the job metadata passed to FTS:

https://github.com/dmwm/AsyncStageout/blob/master/src/python/AsyncStageOut/TransferWorker.py#L417

"job_metadata": {"issuer": "ASO","user": self.user } 
vkuznet commented 7 years ago

I'd advise extending this info further: it is better to use the DN instead of the user name. Also add a timestamp and the application name/version.
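For concreteness, here is a minimal Python sketch of what such an extended dictionary could look like (the field names are illustrative, not an agreed schema, and get_proxy_dn() is a hypothetical helper standing in for however the submitter obtains the proxy DN):

```python
import time

def get_proxy_dn():
    # Hypothetical helper: in practice the DN would come from the proxy
    # used for the FTS submission.
    return "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=someuser"

job_metadata = {
    "issuer": "ASO",
    "user": "someuser",
    "dn": get_proxy_dn(),             # DN instead of (or in addition to) user name
    "time": int(time.time()),         # submission timestamp, seconds since epoch
    "client": "AsyncStageOut_x.y.z",  # application name/version
}
```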

vkuznet commented 7 years ago

Natalia, does the PhEDEx DB itself record information about the application and the DN of whoever requested the transfer? If so, which table holds this info, and do you have any API for accessing it?

nataliaratnikova commented 7 years ago

Yes, the DB has the information on who submitted the transfer request and who decided on the approval; have a look at the https://cmsweb.cern.ch/phedex/datasvc/doc/transferrequests API. However, the FTS submission is done by the site's FileDownload agent with whatever proxy the site specifies in its configuration. The FileDownload agent does not care who requested the files.

alberto-sanchez commented 7 years ago

I have addressed the extra info in the JSON file, which opens the possibility of more options. Meanwhile, I have tried to address Valentin's basic request so that he can distinguish the transfers from PhEDEx in his monitoring. If you want to test it before a new release happens, just patch your perl_lib/PHEDEX/Transfer/Backend/Job.pm file.

dciangot commented 7 years ago

Hi all, sorry for the delay, but I forgot to subscribe to this issue :/

Btw, summarizing: on the ASO side, in order to be compliant with what is proposed above, we may need to add: "dn": "/abc/abc/...", "time": 12315, "request": { "source": "T....", "destination": "T...." }. IMO the "dataset" field probably doesn't make much sense for the CRAB case, while for example we may go for "taskname": "1231231_123123:user_taskname". What do you think?

Also, @vkuznet, by application name/version did you mean ASO or the FTS client?

belforte commented 7 years ago

I guess we also need to be consistent with what WMA/CRAB will report to ES via WMArchive and the HTCondor classAd feeding, so that we can e.g. find both the jobs and the transfers for a given user activity. See: https://github.com/bbockelm/cms-htcondor-es/blob/master/README.md

belforte commented 7 years ago

Given that issuer will be used as a high-level identifier to distinguish different activities, maybe we could just say DDM rather than PHEDEX/DDM, and I would call CMS-user what was proposed as PHEDEX/user. While new users are expected to use PhEDEx to submit a transfer request and have a data manager approve it, we may imagine that in the future another client becomes available for users to ask for FTS transfers (they can already do so, of course, but clearly if we make it a bit more convenient they may do it more). Maybe even foresee "issuer": "CMS-group" and use the username field to indicate a group?

Not that I like to make things complex and vague, but IIUC this naming schema is the foundation of the monitoring work for the next N years, so we may not want to hurry too much to a conclusion.

vkuznet commented 7 years ago

> we may go for "taskname": "1231231_123123:user_taskname"

Do you have a standard convention for task names? What do the integer fields mean, and why does your example contain a colon? These and other such questions will be important at the parsing level, so we had better standardize them.

> by application name/version did you mean ASO or the FTS client?

That is a tricky question. In the case of PhEDEx we have both the PhEDEx version and that of the underlying middleware, and we should capture both: "client": { "phedex": bla, "fts": bla, another_middleware... }

belforte commented 7 years ago

About task names, let's refer to the documentation for ES which Brian wrote and which I pointed to earlier. I am all for collecting all such descriptions in a single place, but not in this issue.

dciangot commented 7 years ago

The taskname is in the format YYMMDD_hhmmss:user_taskname.

In any case, please find below a proposed schema, slightly different from the one at the beginning of the thread:

{ "issuer": "ASO | PHEDEX/user | DDM | ...", "time": 123456, (need to specify common time zone) "client": { service:"AsyncStaseOut_v1.0.8 | PHEXED-client-v4", fts_client: ”blabla” } “user”: “username as from SiteDB”, "dn": "/my/DN/here", "request": { “workflow”: “belforte_crab_Pbp_HM185_250_q3_v3”, “CRAB_Workflow”: ”170406_201711:belforte_crab_Pbp_HM185_250_q3_v3”, "dataset": "/a/b/c", (may be empty for CRAB) "source": "T1_XXX", "destination": "T2_XXX" } }

vkuznet commented 7 years ago

Time should always be in seconds since the epoch; then you don't really need to care about time zones. And we may be more generic about the middleware: e.g. instead of fts_client: bla we may say middleware: fts-version. This type of generalization is better since it keeps the schema keys intact when the middleware client changes. You may also add metadata about the PhEDEx agent, e.g. agent: (such as host name/IP, etc.)
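Putting the proposed schema and these refinements together, a minimal Python sketch (all keys are still proposals from this thread, not a settled format, and the version strings are placeholders):

```python
import socket
import time

job_metadata = {
    "issuer": "PHEDEX",
    "time": int(time.time()),              # long, seconds since epoch: no time-zone issue
    "client": {
        "service": "PHEDEX_x_y_z",         # submitting application name/version
        "middleware": "fts-client-x.y.z",  # generic key survives a middleware change
    },
    "agent": {"host": socket.getfqdn()},   # optional metadata about the submitting agent
    "user": "username as from SiteDB",
    "dn": "/my/DN/here",
    "request": {
        "dataset": "/a/b/c",
        "source": "T1_XXX",
        "destination": "T2_XXX",
    },
}
```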

vkuznet commented 7 years ago

Hi, I want to point out one important issue which may require a slight change in this metadata structure.

Quite often, people who visualize the data want to see aggregation on dataset sub-parts, e.g. the data-tier. Therefore it would make more sense to replace dataset with three pieces: primds, procds, tier.

When you store these three pieces it is easy to aggregate the data, say by data-tier or a similar query. At the same time, it is easy to "compose" the dataset name out of them, since a dataset is just /primds/procds/tier.
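For example, a quick Python sketch of the round trip (assuming well-formed dataset names):

```python
def split_dataset(dataset):
    """Decompose /primds/procds/tier into its three pieces."""
    _, primds, procds, tier = dataset.split("/")
    return {"primds": primds, "procds": procds, "tier": tier}

def compose_dataset(primds, procds, tier):
    """Recompose the dataset name from its pieces."""
    return "/%s/%s/%s" % (primds, procds, tier)

parts = split_dataset("/a/b/c")
assert compose_dataset(**parts) == "/a/b/c"
```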

Best, Valentin.

belforte commented 7 years ago

Hi @vkuznet, what if we report four things, the full dataset name and the three pieces; would that be a problem? In the spirit that, IIUC, we are encouraging every operator, coordinator, and user to build their favorite ES search or kibana/grafana dashboard, I would not mind also looking at "convenience".

vkuznet commented 7 years ago

I don't mind, but data-wise it is redundant: if a user knows the dataset, (s)he can construct a filter simply from the primds, procds, tier parts.

The only difference for users would be using either the string /a/b/c or the parts.

But storage-wise (if we really care) the additional dataset strings will add some overhead.

alberto-sanchez commented 7 years ago

Hi @vkuznet, I wonder if you can see the following FTS jobs in your monitoring:

dc6a077e-4640-11e7-b2fa-a0369f23cf8e

or

52f63152-464a-11e7-873e-a0369f23cf8e

I wonder if the metadata I put in there is visible? Or could you please share the way you look at this, so we can have a look as well.

vkuznet commented 7 years ago

Alberto, what are those values? Are they job_ids?

I ran a Spark job over the 20170531 and 20170601 dates and asked it to find docs with a job_id equal to the values you posted; so far I have found nothing.

But I'm not sure what I'm looking for, and I don't know which dates such documents may belong to.

Please clarify.

For example, here is what a typical FTS doc looks like: /afs/cern.ch/user/v/valya/public/DDP/fts.json

which has job_id == "10b4f232-aecd-5b51-b596-e943541159cb"

and I was able to find jobs of this type.

So I need to know the date to look at, and confirmation that the values you posted are job_ids or something else.
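A minimal PySpark sketch of this kind of job_id lookup (the HDFS path is illustrative, not the actual layout of the FTS logs at CERN):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fts-job-lookup").getOrCreate()

# Illustrative path: assume one directory of FTS JSON docs per day on HDFS.
df = spark.read.json("hdfs:///path/to/fts/logs/2017/05/31")

(df.filter(df.job_id == "dc6a077e-4640-11e7-b2fa-a0369f23cf8e")
   .select("job_id", "job_metadata", "src_hostname", "dst_hostname")
   .show(truncate=False))
```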

Best, Valentin.

alberto-sanchez commented 7 years ago

Hi Valentin, yes, they are job_ids, and they are from 2017-05-31.

best regards

vkuznet commented 7 years ago

Alberto, in this case I can't find these jobs on HDFS. Valentin.

nataliaratnikova commented 7 years ago

Hi, I found both of Alberto's job IDs on the FNAL FTS server. I guess for the purpose of this test we need to submit to the CERN FTS?

vkuznet commented 7 years ago

Yes, I'm looking at FTS logs at CERN HDFS.

alberto-sanchez commented 7 years ago

I have submitted to CERN; the job_id (from a few minutes ago) is:

f2857a08-47c0-11e7-9660-02163e018fe3

vkuznet commented 7 years ago

Alberto, I found a few records with this job_id, but when I look at the metadata of a particular record it is far from complete compared to what we discussed here: e.g. there is no request part, user_dn is empty, etc.

Here is a record I found:

{
  "activity": "PHEDEX",
  "block_size": 0,
  "buf_size": 0,
  "channel_type": "urlcopy",
  "chk_timeout": 0,
  "dest_srm_v": " ",
  "dst_hostname": "proton.fis.cinvestav.mx",
  "dst_se": "",
  "dst_site_name": "",
  "dst_url": "gsiftp://proton.fis.cinvestav.mx//meson/data/store/mc/RunIISummer16MiniAODv2/InclusiveBtoJpsitoMuMu_SoftQCDnonD_TuneCUEP8M1_wFilter_13TeV-pythia8-evtgen/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/120000/028FFD1E-1D15-E711-A6FA-008CFA0A58F8.root",
  "endpnt": "fts3.cern.ch",
  "f_size": 2207040911,
  "file_id": "1545492631",
  "file_size": 2207040911,
  "ipv6": false,
  "job_id": "f2857a08-47c0-11e7-9660-02163e018fe3",
  "job_metadata": {
    "client": "fts-client-3.6.8",
    "issuer": "PHEDEX",
    "time": "1495113054",
    "user": "phedex"
  },
  "job_state": "UNKNOWN",
  "log_link": "https://fts3.cern.ch:8449/fts3/ftsmon/#/f2857a08-47c0-11e7-9660-02163e018fe3",
  "nstreams": 0,
  "remote_access": true,
  "retry": 0,
  "retry_max": 0,
  "src_hostname": "cmsdcatape01.fnal.gov",
  "src_se": "srm://cmsdcatape01.fnal.gov",
  "src_site_name": "",
  "src_srm_v": "2.2.0",
  "src_url": "srm://cmsdcatape01.fnal.gov:8443/srm/managerv2?SFN=/11/store/mc/RunIISummer16MiniAODv2/InclusiveBtoJpsitoMuMu_SoftQCDnonD_TuneCUEP8M1_wFilter_13TeV-pythia8-evtgen/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/120000/028FFD1E-1D15-E711-A6FA-008CFA0A58F8.root",
  "srm_space_token_dst": "null",
  "srm_space_token_src": "",
  "t__error_message": "Destination file exists and overwrite is not enabled",
  "t_channel": "cmsdcatape01.fnal.gov__proton.fis.cinvestav.mx",
  "t_error_code": 17,
  "t_failure_phase": "TRANSFER_PREPARATION",
  "t_final_transfer_state": "Error",
  "t_final_transfer_state_flag": 0,
  "t_timeout": 3600,
  "tcp_buf_size": 0,
  "time_srm_fin_end": 0,
  "time_srm_fin_st": 0,
  "time_srm_prep_end": 0,
  "time_srm_prep_st": 0,
  "timestamp_checksum_dest_ended": 0,
  "timestamp_checksum_dest_st": 0,
  "timestamp_chk_src_ended": 0,
  "timestamp_chk_src_st": 0,
  "timestamp_tr_comp": 0,
  "timestamp_tr_st": 0,
  "tr_bt_transfered": 0,
  "tr_error_category": "FILE_EXISTS",
  "tr_error_scope": "DESTINATION",
  "tr_id": "2017-06-02-1841__cmsdcatape01.fnal.gov__proton.fis.cinvestav.mx__1545492631__f2857a08-47c0-11e7-9660-02163e018fe3",
  "tr_timestamp_complete": 1496428892196,
  "tr_timestamp_start": 1496428886331,
  "user": "",
  "user_dn": "",
  "vo": "

alberto-sanchez commented 7 years ago

Hi Valentin, thanks a lot for looking at this. Yes, the metadata may be incomplete, but from what I understand it is what we can do now, before adjusting the schema of the DB. Natalia can comment further on this. The objective of the test was just to make sure we are able to see PhEDEx transfers.

nataliaratnikova commented 7 years ago

Hi all, regarding the following parameters found by Valentin:

  "job_metadata": {
    "client": "fts-client-3.6.8",
    "issuer": "PHEDEX",
    "time": "1495113054",
    "user": "phedex"
  }

"issuer": "PHEDEX"

"client": "fts-client-3.6.8"

"time": "1495113054"

"user": "phedex"

nataliaratnikova commented 7 years ago

Valentin, do you know where this "activity" field value is coming from:

{ "activity": "PHEDEX",

thanks, Natalia.

vkuznet commented 7 years ago

Natalia, I have no idea how, where, and by whom the metadata attributes are filled out. My understanding is that the docs are initiated by the PhEDEx clients, and then they're probably wrapped by MONIT/Kafka/etc., i.e. by the tools to which you send the info. Best, Valentin.

vkuznet commented 7 years ago

Here are a few comments from my side:

"issuer": "PHEDEX"

  • looks fine to me and is consistent both with ASO (see their code snippet above) and ATLAS jobs: {"multi_sources": false, "issuer": "rucio"}.

It would be nice if you settled on either a lower-case or an upper-case convention. As you pointed out, ATLAS uses lower case; I don't remember how ASO filled out this part, but you should at least be consistent.

"time": "1495113054"

  • need to clarify to which event this timestamp belongs. We could use a name format similar to the FTS fields: "tr_timestamp_complete": 1496428892196, "tr_timestamp_start": 1496428886331.

Time should be a long data type, not a string, and it should always be given as seconds since the epoch, which can be converted to any other format.
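To make the case and time conventions concrete, a small sketch of the normalization a producer could apply before submission (the function is mine, not existing code):

```python
import time

def normalize_job_metadata(meta):
    """Apply the conventions argued for above before submitting to FTS."""
    out = dict(meta)
    # One case convention for the activity flag (ATLAS/Rucio use lower case).
    out["issuer"] = out.get("issuer", "").lower()
    # Timestamp as a long, seconds since epoch, never a string.
    out["time"] = int(out.get("time", time.time()))
    return out

print(normalize_job_metadata({"issuer": "PHEDEX", "time": "1495113054"}))
# -> {'issuer': 'phedex', 'time': 1495113054}
```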

"user": "phedex"

  • the name of the local user running the daemon does not have much value for transfer monitoring. What people actually want to know is the "activity", or group, for the requested transfer; see the related issue #1041. However, that will require substantial changes to the current TMDB schema and the central agents, to propagate this info down to the download agent, which actually submits the transfer jobs.

Probably more important is the DN of the user rather than the user name itself, since we can resolve user attributes from the DN via SiteDB.

Best, Valentin.

vkuznet commented 6 years ago

@nataliaratnikova could you please update me on where you stand on this issue? Did you implement the required features? Did you propagate them to the agents? Did you verify that these features now appear in the FTS logs?

nataliaratnikova commented 6 years ago

@vkuznet The code is released; however, we are not yet asking the sites to upgrade, because CERN reported high load on the servers when they upgraded to the new version of the agents, and this is not yet fully understood. OTOH I see the T2_GR_Ioannina site has upgraded and is successfully using PHEDEX_4_2_2: both new features are propagated as expected, see e.g. b622c26c-bd83-11e7-bdb4-02163e01811c on the CERN FTS:

INFO    Mon Oct 30 16:05:26 2017; Job metadata: {\"client\": \"fts-client-3.6.10\", \"issuer\": \"PHEDEX\"}
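In case it is useful, a quick sketch of how one could parse the metadata back out of such an agent log line (the escaped quotes are as they appear in the log):

```python
import json
import re

line = 'INFO    Mon Oct 30 16:05:26 2017; Job metadata: {\\"client\\": \\"fts-client-3.6.10\\", \\"issuer\\": \\"PHEDEX\\"}'

match = re.search(r'Job metadata: (.*)$', line)
if match:
    # The log escapes the quotes, so unescape them before parsing the JSON.
    metadata = json.loads(match.group(1).replace('\\"', '"'))
    print(metadata["issuer"], metadata["client"])  # PHEDEX fts-client-3.6.10
```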
vkuznet commented 6 years ago

Natalia, thanks for the update. I'd be glad if you could notify me when all agents have been upgraded; I only need this to know when we can start doing analysis with the FTS data on HDFS. Thanks, Valentin.

davidlange6 commented 6 years ago

hi all - I'm curious as to the status of this effort? It naively looks like few if any of the non-ASO CMS transfers have a job_metadata field (well, 0 of the first 1000 I looked at).

nataliaratnikova commented 6 years ago

Hi David, where did you look this up?

PhEDEx has been sending the metadata starting from 4.2.2, see my previous post. Only a few sites have upgraded to 4.2.2, but I can already see the stats for PhEDEx-submitted transfers, and also Rucio transfers from our evaluation tests, in the CERN MONIT Grafana dashboard:

https://monit-grafana.cern.ch/dashboard/db/fts-transfers-30-days?orgId=20&from=now-90d&to=now&var-group_by=activity&var-vo=cms&var-src_country=France&var-src_country=Italy&var-src_country=USA&var-dst_country=Belgium&var-dst_country=France&var-dst_country=Greece&var-dst_country=Italy&var-dst_country=USA&var-dst_country=unknown&var-src_site=All&var-dst_site=All&var-fts_server=cmsfts3.fnal.gov&var-fts_server=fts3-test.gridpp.rl.ac.uk&var-fts_server=fts3.cern.ch&var-bin=$__auto_interval

Natalia. PS. One can check the sites' upgrade progress via the agents API: https://cmsweb.cern.ch/phedex/datasvc/perl/prod/agents?agent=FileDownload&version=PHEDEX_4_2_2
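A small sketch of that check in Python (assumption: the datasvc also serves JSON when "perl" is replaced by "json" in the URL path, and wraps the response in a top-level "phedex" object):

```python
import requests

url = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/agents"
params = {"agent": "FileDownload", "version": "PHEDEX_4_2_2"}

resp = requests.get(url, params=params, timeout=60)
resp.raise_for_status()

# Assumed response shape: {"phedex": {"node": [...]}}
nodes = resp.json().get("phedex", {}).get("node", [])
print("sites running the PHEDEX_4_2_2 FileDownload agent:", len(nodes))
```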

davidlange6 commented 6 years ago

hi @nataliaratnikova - a belated reply -

I'm looking at the FTS records in Hadoop. The ones from ATLAS or from CRAB have useful metadata; the ones from PhEDEx do not. But I was looking at T2s and could easily have missed the two(!!) of them that were using the new version. That is certainly far below a useful threshold for me. Is there a planned time scale for completing this?

nataliaratnikova commented 6 years ago

Hi @davidlange6, the development part is complete. For deployment I do not have any particular goal within PhEDEx project, as this is not for our internal use. If you have a dependent milestone, we can bring this up with the site support team and ask for their help with the upgrade.

davidlange6 commented 6 years ago

I'm not sure what a dependent milestone is, sorry - it would be great if this were all deployed before routine data taking starts this year...