dmwm / DAS

Data Aggregation System

DAS does not record MinBias dataset used for pileup #4194

Closed tomalin closed 6 years ago

tomalin commented 10 years ago

Hello. For physics datasets in DAS, there seems to be no way to discover which minimum-bias dataset was used to provide the pileup events that were superimposed on the signal events during the digitization process.

Surely this information should be available, so that one can check how a specific dataset was made? (We want to do a private production of more MC from an official production run, and want to be sure that we are using the same minimum-bias dataset.)

Thanks, Ian Tomalin

vkuznet commented 10 years ago

Ian, are you talking about discovering the dataset configuration that was used to produce the dataset in question?

Can you look at the output of the query "config dataset=/a/b/c" for your dataset and see whether this information is available? The config query in DAS looks up information in the ReqMgr data-service, and if it is available you'll see what data-ops used for the dataset in question. If you find a config but the information is not in it, then we need to address the question of who should fill it in.
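For reference, the dataset path in a query such as "config dataset=/a/b/c" has three components: the primary dataset, the processed dataset and the data tier. Below is a minimal Python sketch of a hypothetical helper (not part of DAS) that checks a path has that shape and formats the query string; it does not contact DAS or ReqMgr.

```python
def das_config_query(dataset: str) -> str:
    """Return a DAS 'config' query string for a /primary/processed/tier path.

    Hypothetical helper: it only validates the shape of the path and formats
    the query text; it does not send anything to DAS.
    """
    parts = dataset.split("/")
    # "/a/b/c".split("/") gives ['', 'a', 'b', 'c']: a leading empty string
    # followed by exactly three non-empty components.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"expected /primary/processed/tier, got {dataset!r}")
    return f"config dataset={dataset}"


if __name__ == "__main__":
    print(das_config_query(
        "/HTo2LongLivedTo4F_MH-1000_MFF-150_CTau10To1000_8TeV-pythia6"
        "/Summer12-START50_V13-v3/GEN-SIM"
    ))
```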

Therefore, in order to show anything, we first need to identify WHERE this information is stored, and then check whether it is actually stored.

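So, to answer your question and take some action, I need the following:

  • examples of dataset names you're interested in
  • a check of whether their configuration exists
  • a check of whether the input dataset is recorded in the configuration file
  • the list of steps used to produce this dataset (i.e. the reverse of the "discovery" steps: e.g. the dataset request came to the MCM group, the MCM group requested the dataset to be processed, it was assigned to the data-ops team, it was processed by data-ops, etc.)
  • if you do a private production, we need to understand who records the metadata about this production, how, and where (in the DBS/Phedex/MCM/ReqMgr data-services).

Once I understand the steps involved, I can work on the "discovery" part. Valentin.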


tomalin commented 10 years ago

Dear Valentin,

Apologies for my slow response!

I am interested in the officially produced dataset

/HTo2LongLivedTo4F_MH-1000_MFF-150_CTau10To1000_8TeV-pythia6/Summer12_DR53X-DEBUG_PU_S10_START53_V7A-v2/AODSIM

According to DAS, this has parent:

/HTo2LongLivedTo4F_MH-1000_MFF-150_CTau10To1000_8TeV-pythia6/Summer12-START50_V13-v3/GEN-SIM
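Comparing the processed-dataset parts of the child and parent names can be done mechanically. The sketch below assumes (this is an assumption, not something stated in this thread) that the processed-dataset component follows the pattern <era>-<processing string>-v<N>, with no hyphens inside the era; the field names are illustrative.

```python
from typing import NamedTuple


class ProcessedName(NamedTuple):
    era: str
    processing_string: str
    version: str


def split_processed(dataset: str) -> ProcessedName:
    """Split the middle component of a /primary/processed/tier path.

    Assumes <era>-<processing string>-v<N>. The primary dataset may itself
    contain hyphens, which is why only the second path component is parsed.
    """
    processed = dataset.strip("/").split("/")[1]
    era, rest = processed.split("-", 1)
    processing_string, version = rest.rsplit("-", 1)
    return ProcessedName(era, processing_string, version)


if __name__ == "__main__":
    primary = "/HTo2LongLivedTo4F_MH-1000_MFF-150_CTau10To1000_8TeV-pythia6"
    child = primary + "/Summer12_DR53X-DEBUG_PU_S10_START53_V7A-v2/AODSIM"
    parent = primary + "/Summer12-START50_V13-v3/GEN-SIM"
    for name in (child, parent):
        print(split_processed(name))
```

For the AODSIM child, the processing string carries what looks like a pileup-scenario label (PU_S10), but neither name identifies which MinBias dataset supplied the pileup events, which is exactly the gap raised in this issue.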

In the DAS entry for this GEN-SIM parent, I click on “Configs” --> “Sources: reqmgr show” --> “Config urls: config-2”

and see the CMSSW configuration file used for the digitisation:

https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/7f0c14c9dafd4753c37534803031d533/configFile

P.S. By the way, the output returned by the “Configs” option is rather user-unfriendly. (It took me a while to figure out that there was anything useful inside it.)

I do see a “MixingModule” named process.mix inside this config file, which superimposes the pileup events. However, the “fileNames” specified inside it refer only to RelVal minimum-bias samples. I don’t believe this: surely official CMS MC production can’t use these very small RelVal samples for its pileup? So it’s still not obvious to me how to find out which MinBias samples were really used.
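The manual check described above (reading the fileNames of process.mix and noticing that they are RelVal files) could be automated roughly as follows. This sketch assumes a Python CMSSW config in which every top-level statement starts at the beginning of a line with "process.", and it collects files from any statement whose target is process.mix or one of its attributes; any /store/relval/ path is treated as a RelVal file. It only reports what the stored config says: if the production system substituted the pileup input at run time (a possibility this thread does not settle), that would not show up here.

```python
import re

# A quoted logical file name ending in .root, e.g. '/store/relval/.../file.root'.
LFN_RE = re.compile(r"""['"](/store/[^'"]+?\.root)['"]""")
# A statement on a process attribute that starts at the beginning of a line.
TOP_LEVEL_RE = re.compile(r"^process\.", re.MULTILINE)


def mix_input_files(config_text: str) -> list[str]:
    """Return file names found in statements that target process.mix."""
    starts = [m.start() for m in TOP_LEVEL_RE.finditer(config_text)]
    files: list[str] = []
    for i, start in enumerate(starts):
        # Each statement runs until the next top-level process.<x> line.
        end = starts[i + 1] if i + 1 < len(starts) else len(config_text)
        statement = config_text[start:end]
        # Matches "process.mix = ..." and "process.mix.input.fileNames = ...",
        # but not e.g. "process.mixData".
        if re.match(r"process\.mix\b", statement):
            files.extend(LFN_RE.findall(statement))
    return files


def only_relval(files: list[str]) -> bool:
    """True if there is at least one file and every file is under /store/relval/."""
    return bool(files) and all(f.startswith("/store/relval/") for f in files)


if __name__ == "__main__":
    # Simplified, made-up excerpt; real configs contain many more parameters.
    sample = """process.mix = cms.EDProducer("MixingModule",
    input = cms.SecSource("PoolSource",
        fileNames = cms.untracked.vstring('/store/relval/CMSSW_5_2_X/RelValMinBias/GEN-SIM/a.root')
    )
)
process.other = cms.EDProducer("Other")
"""
    files = mix_input_files(sample)
    print(files, only_relval(files))
```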

Thanks, Ian

From: Valentin Kuznetsov [mailto:notifications@github.com] Sent: 29 May 2014 19:31 To: dmwm/DAS Cc: Tomalin, Ian (STFC,RAL,PPD) Subject: Re: [DAS] DAS does not record MinBias dataset used for pileup (#4194)


vkuznet commented 10 years ago

Hi Ian,

Let's break your request into two independent pieces:

  • DAS config representation
  • config content

The former is shown as follows:

config dataset=/HTo2LongLivedTo4F_MH-1000_MFF-150_CTau10To1000_8TeV-pythia6/Summer12-START50_V13-v3/GEN-SIM

Config: cmsRun Creation time: 2012-05-17 11:34:39, Global Tag: UNKNOWN, Pset hash: GIBBERISH, Release: CMSSW_5_2_5 Sources: dbs3 show

Config: ReqMgr Config urls: config-1, config-2 Sources: reqmgr show

So you don't really need to click on the "reqmgr show" link, since config-1 and config-2 are links to the actual configuration files. I'm not sure how to improve this further, except that several users have expressed a desire to know whether those configs were used to produce this dataset, or whether this dataset was used with these configs to produce other samples. This will be fixed in the next release. Meanwhile, if you have a concrete suggestion for how to improve this, please speak up. I don't think DAS needs to capture the actual config, since its content is sometimes large and would be even more obscure when represented on a single web page.
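The config-1/config-2 links lead into the ReqMgr config cache. Judging only from the URLs quoted in this thread, they seem to follow the pattern <cache>/<document id>/configFile; the sketch below rebuilds such a link from a document id. The 32-hex-character id check is an assumption based on the ids seen here, not a documented rule.

```python
import string

CONFIG_CACHE = "https://cmsweb.cern.ch/couchdb/reqmgr_config_cache"


def config_file_url(doc_id: str) -> str:
    """Build a configFile link from a config-cache document id.

    The URL pattern and the id format are inferred from the links in this
    thread; they are assumptions, not a documented API.
    """
    if len(doc_id) != 32 or any(c not in string.hexdigits for c in doc_id):
        raise ValueError(f"unexpected document id: {doc_id!r}")
    return f"{CONFIG_CACHE}/{doc_id}/configFile"


if __name__ == "__main__":
    print(config_file_url("7f0c14c9dafd4753c37534803031d533"))
```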

The second item, config content, is outside DAS's control. DAS only asks the ReqMgr data-service for configs; what is stored in the config files is not DAS's business. It is what the data-ops/MCM teams store in ReqMgr, and it is up to them to clarify the content of the configuration. Sorry, but I can't help here. If you're not satisfied with the content, please email the data-ops HN and ask for clarification. Until the information is properly stored (to satisfy user needs), DAS can't do anything about it: it only requests this information and is not responsible for its content.

Valentin.


tomalin commented 10 years ago

Thanks Valentin! You’re very helpful as usual.

With regard to the DAS config representation, I personally would find things easiest to understand if, when I ask DAS for:

“config dataset=/HTo2LongLivedTo4F_MH-1000_MFF-150_CTau10To1000_8TeV-pythia6/Summer12-START50_V13-v3/GEN-SIM”

it would show me ONLY the CMSSW .cfg file used to produce this dataset.

Instead, it currently shows two .cfg files, one of which corresponds to the “Reconstruction” step and so was never used to produce this SIM dataset.

(By the way, for non-experts the options “dbs3 show” and “reqmgr show” are rather mysterious. When I first tried using the “config dataset” option, I clicked on a couple of these things, didn’t understand what they produced, and gave up.)

Cheers, Ian

From: Valentin Kuznetsov [mailto:notifications@github.com] Sent: 10 June 2014 14:36 To: dmwm/DAS Cc: Tomalin, Ian (STFC,RAL,PPD) Subject: Re: [DAS] DAS does not record MinBias dataset used for pileup (#4194)


vkuznet commented 10 years ago

OK, I see the point now. As I said before, users have noticed this and asked for DAS to identify what each config means. That is not yet close to what you've asked, but at least DAS will be able to tell whether a config was used to produce a dataset or was used with the dataset to produce others. The next version of DAS will "recognize" this and will show input-config and output-config. Then I can look into whether I can identify the CMSSW part of the config (but unfortunately that will have to wait until September, since I'm leaving in a month for a long vacation).
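One way to picture the distinction between configs that produced a dataset and configs that took it as input is sketched below. The Request record and its field names are hypothetical stand-ins, not the actual ReqMgr schema, and the role names are illustrative rather than the labels DAS will use.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    """Hypothetical, simplified production request (field names are assumptions)."""
    config_ids: list[str]
    input_dataset: Optional[str] = None
    output_datasets: list[str] = field(default_factory=list)


def config_roles(dataset: str, requests: list[Request]) -> dict[str, set[str]]:
    """Split config ids into those that produced `dataset` and those that consumed it."""
    roles: dict[str, set[str]] = {"produced_by": set(), "consumed_by": set()}
    for req in requests:
        if dataset in req.output_datasets:
            roles["produced_by"].update(req.config_ids)
        if req.input_dataset == dataset:
            roles["consumed_by"].update(req.config_ids)
    return roles


if __name__ == "__main__":
    gen_sim = "/Primary/Era-Tag-v1/GEN-SIM"
    requests = [
        Request(["gen_sim_cfg"], None, [gen_sim]),
        Request(["digi_reco_cfg"], gen_sim, ["/Primary/Era-PU-v1/AODSIM"]),
    ]
    print(config_roles(gen_sim, requests))
```

With records like these, a query on a GEN-SIM dataset could label a downstream DIGI-RECO config as "consumed by" instead of listing it next to the config that produced the dataset.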

Regarding the DBS3/ReqMgr records: DAS asks "every" CMS data-service whether it knows "anything" about a given query, so DAS can capture responses from different data-services. In this case both DBS3 and ReqMgr responded, with different types of information, which is why both records are shown. Since there is no common key by which this information can be aggregated, DAS shows both records separately. If we find and store such a common key, DAS will be able to aggregate the config information from the different services.
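To illustrate why a common key matters, here is a toy sketch of key-based aggregation: records from different services are merged when they share a value for the chosen key and are otherwise returned separately, which is what DAS does today. The key name pset_hash and the record fields are purely hypothetical; as stated above, no such shared key currently links the DBS3 and ReqMgr records.

```python
def aggregate(records: list[dict], key: str) -> tuple[list[dict], list[dict]]:
    """Merge records that share a value for `key`; return (merged, unmatched).

    Records without `key` cannot be matched, so they are returned unchanged.
    On conflicting fields the first record seen wins.
    """
    merged: dict[object, dict] = {}
    unmatched: list[dict] = []
    for rec in records:
        if key not in rec:
            unmatched.append(rec)
            continue
        target = merged.setdefault(rec[key], {key: rec[key], "sources": []})
        target["sources"].append(rec.get("source", "unknown"))
        for name, value in rec.items():
            if name not in (key, "source"):
                target.setdefault(name, value)
    return list(merged.values()), unmatched


if __name__ == "__main__":
    dbs3 = {"source": "dbs3", "pset_hash": "abc", "release": "CMSSW_5_2_5"}
    reqmgr = {"source": "reqmgr", "pset_hash": "abc", "config_urls": ["config-1", "config-2"]}
    print(aggregate([dbs3, reqmgr], "pset_hash"))
```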

Valentin.


vkuznet commented 10 years ago

Ian, I'm looking further into the actual config files associated with the

config dataset=/HTo2LongLivedTo4F_MH-1000_MFF-150_CTau10To1000_8TeV-pythia6/Summer12-START50_V13-v3/GEN-SIM

query. The direct links are:

https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/7f0c14c9dafd4753c37534803031d858/configFile

https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/7f0c14c9dafd4753c37534803031d533/configFile

My understanding is that both of these configs were used to produce this dataset. Now the question is: how can a program identify which one corresponds to the CMSSW .cfg file? Do you have any suggestions or recipes? From the DAS point of view it is just some content. I got it via the ReqMgr APIs by supplying the dataset name; in other words, this is what ReqMgr returns to DAS, and now, as a user, you want DAS to show only the CMSSW one. How will DAS know that? The first question is: are those configs the correct ones? If so, how can we identify the CMSSW cfg file? If not, then we need to go back to data-ops and ask why they didn't put the proper config associated with this dataset into the ReqMgr database.

Please advise.

Best, Valentin.


tomalin commented 10 years ago

Dear Valentin,

1) https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/7f0c14c9dafd4753c37534803031d858/configFile

2) https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/7f0c14c9dafd4753c37534803031d533/configFile

Both (1) and (2) are CMSSW .cfg files.

Personally, to figure out what they do, I look through them for the keyword “cms.Schedule”.

So for (1), the line containing “cms.Schedule” contains “process.reconstruction_step”, so I know it is doing reconstruction, and hence will produce some sort of “RECO” dataset.

And for (2), the line containing “cms.Schedule” contains “process.digitisation_step”, so I know it is doing simulation, and hence will produce some sort of “SIM” dataset.

Naively, therefore, I assume that (2) was used to produce /HTo2LongLivedTo4F_MH-1000_MFF-150_CTau10To1000_8TeV-pythia6/Summer12-START50_V13-v3/GEN-SIM.
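The keyword check described above is one possible answer to the question of how a program could tell the two ReqMgr configs apart. Below is a minimal sketch under stated assumptions: it only inspects the first line containing `cms.Schedule` (a schedule split across several lines, or extended later in the file, would be missed), and `scheduled_steps` / `guess_role` are hypothetical helpers, not part of DAS or CMSSW.

```python
import re


def scheduled_steps(cfg_text):
    """Return the process.<name> entries on the first line mentioning cms.Schedule."""
    for line in cfg_text.splitlines():
        if "cms.Schedule" in line:
            # Look only after "cms.Schedule" so "process.schedule = ..." is skipped.
            args = line.split("cms.Schedule", 1)[1]
            return re.findall(r"process\.(\w+)", args)
    return []  # no schedule line found


def guess_role(cfg_text):
    """Coarse label from the scheduled steps (hypothetical helper; reconstruction wins if both appear)."""
    steps = scheduled_steps(cfg_text)
    if "reconstruction_step" in steps:
        return "reconstruction"
    if "digitisation_step" in steps:
        return "digitisation"
    return "unknown"


# Tiny stand-ins for configs (1) and (2); the real files are much longer.
cfg1 = "process.schedule = cms.Schedule(process.raw2digi_step,process.reconstruction_step)"
cfg2 = "process.schedule = cms.Schedule(process.digitisation_step,process.L1simulation_step)"
print(guess_role(cfg1), guess_role(cfg2))  # reconstruction digitisation
```

This only labels a config by what it schedules; it cannot tell whether a config produced the dataset or consumed it as input, which is the separate question raised about the ReqMgr links.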

Cheers,

Ian

tomalin commented 10 years ago

Dear Valentin et al.,

I’ve taken a closer look at this problem. For MC datasets produced in the past few months, more information appears to be available in DAS. For example, at the top of the CMSSW configuration files shown by the DAS command:

config dataset=/Neutrino_Pt2to20_gun/TTI2023Upg14-DES23_62_V1-v1/GEN-SIM

the cmsDriver options used to produce the dataset are listed, and in particular, the option “--pileup_input” correctly shows which MinBias dataset was used for the pileup.
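Reading that option out of a config can be scripted. The sketch below assumes the cmsDriver options sit on a header comment starting with `# with command line options:` and that the value may carry a `das:` or `dbs:` prefix; both are assumptions to check against real files. `pileup_input` and `strip_prefix` are hypothetical helpers, and the dataset name in the example is a placeholder, not a real sample.

```python
import shlex

HEADER = "# with command line options:"  # assumed cmsDriver header prefix


def pileup_input(cfg_text):
    """Return the --pileup_input value from a cmsDriver-style header line, or None."""
    for line in cfg_text.splitlines():
        if not line.startswith(HEADER):
            continue
        rest = line[len(HEADER):]
        try:
            args = shlex.split(rest)
        except ValueError:  # unbalanced quotes: fall back to plain whitespace splitting
            args = rest.split()
        for i, arg in enumerate(args):
            if arg == "--pileup_input" and i + 1 < len(args):
                return args[i + 1]
            if arg.startswith("--pileup_input="):
                return arg.split("=", 1)[1]
    return None


def strip_prefix(value):
    """Drop a leading "das:" or "dbs:" marker, if present (assumed convention)."""
    for prefix in ("das:", "dbs:"):
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


# Hypothetical header; the MinBias dataset name is a placeholder.
cfg = ("# with command line options: step2 "
       "--pileup_input dbs:/MinBias_example/Campaign-v1/GEN-SIM -n 10\n"
       "import FWCore.ParameterSet.Config as cms\n")
print(strip_prefix(pileup_input(cfg)))  # /MinBias_example/Campaign-v1/GEN-SIM
```

For older datasets whose configs lack this header, the function simply returns None, which matches the gap described below for Summer12_DR53X.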

Furthermore, the child of this dataset is:

dataset=/Neutrino_Pt2to20_gun/TTI2023Upg14D-PU140bx25_PH2_1K_FB_V3-v2/GEN-SIM-DIGI-RAW

The DAS entry for this contains an option “McM info”, which also lists the MinBias dataset used to produce it.

N.B. People using DAS have to know that all “McM info” is listed only under the DAS entry of the “child” datasets, whereas all the CMSSW config files are listed under the DAS entry of the “parent” dataset. This is highly non-obvious.

Unfortunately, this information is often missing for earlier datasets, such as those from the Summer12_DR53X campaign, for example:

dataset=/HTo2LongLivedTo4F_MH-1000_MFF-350_CTau35To3500_8TeV-pythia6/Summer12_DR53X-DEBUG_PU_S10_START53_V7A-v1/GEN-SIM-RECODEBUG

Regards, Ian

vkuznet commented 10 years ago

Ian, DAS does not, and cannot, control which information is stored in the underlying data-services, or when. So, as you observed, newer datasets may have more information, because people have realized its value and started storing more meta-data.

Regarding McM info: a few months ago I received a request to add it to DAS, which is how the "McM info" link appeared. More generally, I added a new DAS keyword, "mcm", so that users can place queries like mcm dataset=/a/b/c. Again, McM information depends entirely on the McM data-services; I don't control which datasets have it and which do not. Your best bet is to try the query mcm dataset=/a/b/c and see whether information is available for your dataset.

For instance,

mcm dataset=/Neutrino_Pt2to20_gun/TTI2023Upg14-DES23_62_V1-v1/GEN-SIM

produces the following output:

Mcm: L1T-2023TTIUpg14-00002 CMSSW release: CMSSW_6_2_0_SLHC11, Number of events: 285000, Physics group: L1T

The "show" link on the DAS page provides more information about it.

Since this information is quite new in DAS, I need user input to understand what should be shown or hidden for such queries.

If you find it useful, please give me feedback based on several queries. First, you may find that different datasets have different McM info; in that case I need to contact the McM data-service team to clarify why.

Next, you can tell me which attributes are useful to display in DAS records, and I can adjust the DAS UI accordingly.
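To make the attribute discussion concrete, the one-line mcm summary shown above can be split into label/value pairs and then trimmed to whatever a user wants displayed. This is only an illustrative sketch: it parses the display string, whereas a real implementation would work on the structured record returned by the McM data-service, whose field names are not given here. `LABELS`, `parse_summary` and `select` are hypothetical.

```python
import re

# Summary line copied from the "mcm dataset=..." output above.
SUMMARY = ("Mcm: L1T-2023TTIUpg14-00002 CMSSW release: CMSSW_6_2_0_SLHC11, "
           "Number of events: 285000, Physics group: L1T")

# Labels as they appear in that line; an assumed list, not a DAS or McM schema.
LABELS = ["Mcm", "CMSSW release", "Number of events", "Physics group"]


def parse_summary(line, labels=LABELS):
    """Split 'Label: value' pairs into a dict, trimming spaces and trailing commas."""
    pattern = "(" + "|".join(re.escape(label) for label in labels) + r"):\s*"
    parts = re.split(pattern, line)
    # With a capturing group, parts = [prefix, label1, value1, label2, value2, ...]
    return {k: v.strip().rstrip(",").strip() for k, v in zip(parts[1::2], parts[2::2])}


def select(record, wanted):
    """Keep only the attributes a user asked to display, in the requested order."""
    return {k: record[k] for k in wanted if k in record}


record = parse_summary(SUMMARY)
print(select(record, ["Mcm", "Number of events"]))
# {'Mcm': 'L1T-2023TTIUpg14-00002', 'Number of events': '285000'}
```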

Valentin. P.S. I checked that both datasets you mentioned have McM info.
