cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

make a T0-like relval for 2018 HI #24619

Closed slava77 closed 5 years ago

slava77 commented 5 years ago

[As a follow-up to the discussion in the joint ops meeting on Sep 21] Please add a relval matrix workflow to be used for testing a T0-like setup in the offline environment and in IBs. The initial setup can be based on the 2018 MD3 HI test runs. Once we start running, this (or another) setup can be updated to use actual data.

@franzoni

cmsbuild commented 5 years ago

A new Issue was created by @slava77 Slava Krutelyov.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

slava77 commented 5 years ago

assign pdmv

cmsbuild commented 5 years ago

New categories assigned: pdmv

@prebello,@pgunnell,@zhenhu you have been requested to review this Pull request/Issue and eventually sign? Thanks

slava77 commented 5 years ago

The best setup for this would be to use files with data content from before repacking of the FED data, so that this relval test also covers the reHLT part.

zhenhu commented 5 years ago

Hi @slava77, could you please give us a bit more information about this workflow, such as the input, the era, the conditions, etc.?

slava77 commented 5 years ago

assign alca,hlt

I'm adding AlCa to the thread to provide inputs on the conditions and perhaps ALCA parts of the workflows. I'm also adding HLT to suggest a configuration.

I think that a fraction of this workflow will be like wf 140.55, only using Run2_2018_pp_on_AA era and scenario --pp

cmsbuild commented 5 years ago

New categories assigned: hlt,alca

@lpernie,@franzoni,@pohsun,@tocheng,@Martin-Grunewald,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

slava77 commented 5 years ago

@mandrenguyen @icali please follow and/or advise here as well in order to get the right inputs

zhenhu commented 5 years ago

I chatted with @mmusich and got a recipe for how to modify wf 140.55.

  1. We will update the input dataset with the MD3 data.
  2. We need a menu from HLT which does the hybrid ZS + repacking (something similar to HYBRIDZSHI2015).
  3. The remaining steps in 140.55, such as 'RECOHID15' and 'HARVESTDHI15', can be reused, but we need to change the conditions for the 2018 detector (see the sketch below). What we are missing is mainly the 2nd item above. Please let me know if you have any comments.
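
For concreteness, a rough sketch of item 3 (a hedged illustration only: step names, the option set and the conditions tag are placeholders, not the final workflow definition), assuming the usual merge() convention of Configuration/PyReleaseValidation/python/relval_steps.py:

    # hypothetical 2018 HI clones of the reused 2015 steps; era/scenario as
    # discussed above, conditions are a placeholder until AlCa provides the GT
    steps['RECOHID18'] = merge([
        {'--era': 'Run2_2018_pp_on_AA',
         '--scenario': 'pp',
         '--conditions': 'auto:run2_data'},   # placeholder GT
        steps['RECOHID15']])

    steps['HARVESTDHI18'] = merge([
        {'--era': 'Run2_2018_pp_on_AA',
         '--conditions': 'auto:run2_data'},   # placeholder GT
        steps['HARVESTDHI15']])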
icali commented 5 years ago

The HLT menu that was provided is /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu produces data with the MB ReducedFormat collection, which nevertheless still carries the same rawDataRepacker name.

Any advice on how we can create the configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

We have been advised to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? The T0 repacking step would probably be the best place, so that all the RAW data on tape have the same collection name.

slava77 commented 5 years ago

The HLT menu that was provided is /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu produces data with the MB ReducedFormat collection, which nevertheless still carries the same rawDataRepacker name.

Any advice on how we can create the configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

I'm a bit confused about the possible variety of content. Do we have a one-to-one mapping of PDs to output FED collection names? (one PD <-> one FED name). If so, the T0 configuration will just need to deal with a different FED collection name pickup.

We have been advised to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? The T0 repacking step would probably be the best place, so that all the RAW data on tape have the same collection name.

IIRC, using MassReplaceInputTag and doing something at the T0 repacking step are two different solutions:

  1. instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications. @Dr15Jones can EDAlias be used to "rename" a product made in another process?
  2. in the regular T0 (or rereco) configuration apply MassReplaceInputTag depending on which FED collection name the input dataset has.

My guess is that option 1. will require more time than we have to develop and will add complexity to the repacking step, with an increased chance of errors. Errors in the repacking step likely mean a complete loss of data. Option 2. is more practical (a minimal sketch follows below).
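
For concreteness, a minimal sketch of what option 2 could look like as a cmsDriver customisation function, reusing the MassReplaceInputTag helper quoted later in this thread (called with positional old/new labels, since the exact keyword names should be checked in the release being used):

    from Configuration.Applications.ConfigBuilder import MassReplaceInputTag

    def customiseForReducedRaw(process):
        # rewrite every consumer's InputTag from the standard rawDataCollector
        # to the FED collection actually present in this dataset
        MassReplaceInputTag(process, "rawDataCollector", "rawDataRepackerReducedFormat")
        return process

Such a function could then be attached via cmsDriver's --customise option only for the datasets that carry the reduced-format collection.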

Dr15Jones commented 5 years ago

@Dr15Jones can EDAlias be used to "rename" a product made in another process?

No.

icali commented 5 years ago

adding @FHead and @stahlleiton that are going to implement the HLT menu.

The HLT menu that was provided is /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu produces data with the MB ReducedFormat collection, which nevertheless still carries the same rawDataRepacker name. Any advice on how we can create the configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

I'm a bit confused about the possible variety of content. Do we have a one-to-one mapping of PDs to output FED collection names? (one PD <-> one FED name). If so, the T0 configuration will just need to deal with a different FED collection name pickup.

Yes, there is going to be a one-to-one mapping of PDs to output FED collections. To close the loop, I would propose the following naming convention:

Please let us know if this convention looks reasonable to you. Any name suggestion for the reduced format collection/PDs is more than welcome.

We have been advised to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? The T0 repacking step would probably be the best place, so that all the RAW data on tape have the same collection name.

IIRC, using MassReplaceInputTag and doing something at the T0 repacking step are two different solutions:

  1. instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications. @Dr15Jones can EDAlias be used to "rename" a product made in another process?
  2. in the regular T0 (or rereco) configuration apply MassReplaceInputTag depending on which FED collection name the input dataset has.

    • recall that for the HI configuration we use the ConfigBuilder --repacked option to make this renaming of input tags from rawDataRepacker to rawDataCollector
    • similar to this, one could add a --reduced flag and copy-paste/edit the implementation

      cmssw/Configuration/Applications/python/Options.py, lines 285 to 289 (at 1cd19ae):

          expertSettings.add_option("--repacked",
                                    help="When the input file is a file with repacked raw data with label rawDataRepacker",
                                    action="store_true",
                                    default=False,
                                    dest="isRepacked"

      and

      cmssw/Configuration/Applications/python/ConfigBuilder.py, lines 2219 to 2223 (at 1cd19ae):

          if self._options.isRepacked:
              self.pythonCfgCode +="\n"
              self.pythonCfgCode +="from Configuration.Applications.ConfigBuilder import MassReplaceInputTag\n"
              self.pythonCfgCode +="MassReplaceInputTag(process)\n"
              MassReplaceInputTag(self.process)

      with MassReplaceInputTag(process, new="rawDataReducer")

My guess is that option 1. will require more time than we have to develop and will add complexity to the repacking step, with an increased chance of errors. Errors in the repacking step likely mean a complete loss of data. Option 2. is more practical.

I personally would have preferred option 1 because it would simplify the operations for any future RAW data manipulation. However, if it is less "safe", let's go with option 2. Would it be possible to also add the second collection name to the same --repacked flag? It would/could simplify the operation for future raw processing. Thank you again!

slava77 commented 5 years ago

@Dr15Jones can EDAlias be used to "rename" a product made in another process?

No.

Is it possible to have an EDAlias specific to an output file? Let's say we are writing, in a given process, the same type of product from producerA and producerB: producerA goes to file A, producerB to file B. I would like consumers of file A or B to get this product with the same InputTag.

slava77 commented 5 years ago

Please let us know if this convention looks reasonable to you. Any name suggestion for the reduced format collection/PDs is more than welcome.

these look OK to me

Would it be possible to also add the second collection name to the same --repacked flag? It would/could simplify the operation for future raw processing.

I think this will work even better; it just needs a bit more creative coding (not just the copy-paste/replace that I proposed above).
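
For reference, the copy-paste/replace baseline being referred to might look roughly as follows (a hedged sketch only: the --reduced flag, the isReduced destination and the final label are proposals from this thread, not existing options). The "more creative" single-flag variant would put the extra logic in the same two places instead of adding a new option:

    # Configuration/Applications/python/Options.py (next to the quoted --repacked option)
    expertSettings.add_option("--reduced",
                              help="When the input file contains reduced-format raw data with label rawDataRepackerReducedFormat",
                              action="store_true",
                              default=False,
                              dest="isReduced")

    # Configuration/Applications/python/ConfigBuilder.py (next to the quoted isRepacked branch)
    if self._options.isReduced:
        self.pythonCfgCode += "\n"
        self.pythonCfgCode += "from Configuration.Applications.ConfigBuilder import MassReplaceInputTag\n"
        self.pythonCfgCode += "MassReplaceInputTag(process, 'rawDataCollector', 'rawDataRepackerReducedFormat')\n"
        MassReplaceInputTag(self.process, 'rawDataCollector', 'rawDataRepackerReducedFormat')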

slava77 commented 5 years ago

[in the T0 repacking step] 1. instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications.

I personally would have preferred option 1 because it would simplify the operations for any future RAW data manipulation.

@drkovalskyi @hufnagel may want to comment on the feasibility of this request for the T0 developments (to be delivered in ~4 weeks).

fwyzard commented 5 years ago

On Fri, 21 Sep 2018, 22:50 Slava Krutelyov, notifications@github.com wrote:

My guess is that option 1. will require more time than we have to develop and will add complexity to the repacking step, with an increased chance of errors. Errors in the repacking step likely mean a complete loss of data.

Doesn't it "just" require 3 different repacking configurations, and a mapping between dataset name and which configuration to use?

Option 2. is more practical.

Actually, Option 2. pushes the complexity to all present and future consumers of these data. It is more practical only for the person that would have to implement Option 1.

slava77 commented 5 years ago

On 9/22/18 7:01 AM, Andrea Bocci wrote:

On Fri, 21 Sep 2018, 22:50 Slava Krutelyov, notifications@github.com wrote:

My guess is that option 1. will require more time than we have to develop and will add complexity to the repacking step, with an increased chance of errors. Errors in the repacking step likely mean a complete loss of data.

Doesn't it "just" require 3 different repacking configurations, and a mapping between dataset name and which configuration to use?

Yes, it's not that complicated. But mistakes in the repacking step are particularly dangerous, so debugging/validation requires extreme care.

Option 2. is more practical.

Actually, Option 2. pushes the complexity to all present and future consumers of these data. It is more practical only for the person that would have to implement Option 1.

You are correct in the long run. My assessment was based on the delivery time of only about 4 weeks.

hufnagel commented 5 years ago

Repack configurations are generated in

Configuration.DataProcessing.Repack

and we would need a tweak there to convert the data products somehow and generate the correct output. I don't see how producing that tweak and testing it standalone falls anywhere under Tier0 development.

Once this configuration tweak has been produced and is integrated into Config.DP, this would turn into a Tier0 testing/validation issue though. But I don't see that as a big problem assuming the previous standalone testing was thorough.
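
For orientation, a hedged sketch of where such a tweak would hang, assuming the existing repackProcess() entry point in Configuration/DataProcessing/python/Repack.py; the handleReducedFormat switch and the wrapper name are purely hypothetical:

    from Configuration.DataProcessing.Repack import repackProcess

    def repackProcessHI2018(handleReducedFormat=False, **args):
        """Standard repack configuration, optionally extended for the
        reduced-format FED collection (hypothetical switch)."""
        process = repackProcess(**args)
        if handleReducedFormat:
            # this is where the conversion/renaming of the
            # rawDataRepackerReducedFormat collection would be configured
            # (the open question in this thread); left empty in this sketch
            pass
        return process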

davidlange6 commented 5 years ago

E.g., the Tier0 work is to create a matrix of configurations for repack just as there is for prompt (which does go wrong from time to time). How much time do people have to notice repack errors before data falls on the floor?

On Sep 23, 2018, at 3:15 AM, Dirk Hufnagel notifications@github.com wrote:

Repack configurations are generated in

Configuration.DataProcessing.Repack

and we would need a tweak there to convert the data products somehow and generate the correct output. I don't see how producing that tweak and testing it standalone falls anywhere under Tier0 development.

Once this configuration tweak has been produced and is integrated into Config.DP, this would turn into a Tier0 testing/validation issue though. But I don't see that as a big problem assuming the previous standalone testing was thorough.


hufnagel commented 5 years ago

You mean you wouldn't be able to handle this within one repack configuration that auto-detects what it's supposed to be doing ? Why not ?

Next level would be passing a parameter to Configuration.DataProcessing.Repack that configures whether or not we get a standard repack or this new thing. Yeah, that would need Tier0 development work.

Repack errors that crash CMSSW cause paused jobs which block streamer deletions. As long as we don't run out of space at P5 it even blocks streamer deletion there. A repack error due to bad configuration almost certainly isn't recoverable within the same Tier0 instance though. You are talking about having to do recovery replays here.

The real problematic case is repack errors that don't crash CMSSW. You have 7 days to notice that normally, but if we are very busy it could be less (since we delete streamers more aggressively then).

fwyzard commented 5 years ago

You mean you wouldn't be able to handle this within one repack configuration that auto-detects what it's supposed to be doing ? Why not ?

I would say because you need to look at the data to figure out what to do, and you cannot do that at configuration level.

hufnagel commented 5 years ago

Frankly, the only piece of the Tier0 that looks at the data is the CMSSW jobs. Nothing else cares about how the 0's and 1's are organized in the data files.

So how would this work then ? We trust that HLT puts such data into a special stream and we configure that stream to be repacked in a special way ?

Either way, before there can be any Tier0 development here, someone needs to create a valid repack configuration for this data. That repack configuration needs to be tested standalone and then needs to be integrated into Config.DP.Repack (add a parameter to activate it that is false by default for instance).

Once all of that is in place, doing the Tier0 development work to create a config flag to enable this for a stream and use it shouldn't be that much work. Few days for the code changes, longer for the Tier0Ops validation/testing (could be much longer if the Config.DP.Repack changes weren't done correctly before).

fwyzard commented 5 years ago

On 24 September 2018 at 00:31, Dirk Hufnagel notifications@github.com wrote:

So how would this work then ? We trust that HLT puts such data into a special stream and we configure that stream to be repacked in a special way ?

Yes, one would map different streams to different job configurations.

Either way, before there can be any Tier0 development here, someone needs to create a valid repack configuration for this data. That repack configuration needs to be tested standalone and then needs to be integrated into Config.DP.Repack (add a parameter to activate it that is false by default for instance).

True, except I would say that the Tier-0 development and validation can happen in parallel to the development of the different repacking configurations.

hufnagel commented 5 years ago

True, except I would say that the Tier-0 development and validation can happen in parallel to the development of the different repacking configurations.

The part that can happen in parallel consists of adding a dozen code lines spread across a few places. Which is not difficult assuming you know what these few places are (which I think I do).

The vast majority of the Tier0 development will consist of running this against a new CMSSW release with various stream configurations to make sure it works correctly (extracting and looking at the generated repack configurations). It's a bit pointless starting any of this without having a CMSSW release with the Config.DP.Repack changes in place.

davidlange6 commented 5 years ago

On Sep 23, 2018, at 6:35 PM, Andrea Bocci notifications@github.com wrote:

You mean you wouldn't be able to handle this within one repack configuration that auto-detects what it's supposed to be doing ? Why not ?

I would say because you need to look at the data to figure out what to do, and you cannot do that at configuration level.

I'm not sure why it would be so hard to have a module look for one of N collections and use it? It does expand "replacing" to be more than just repacking, but that is the case regardless in this thread. [and the process name of the FEDRawData in raw files would change at a minimum.. but again that's likely true in any solution downstream of the HLT]


fwyzard commented 5 years ago

I'm not sure why it would be so hard to have a module look for one of N collections and use it?

Mhm, yes, I think that technically it would work. I don't know if the existing raw data collector module can cope with missing input collections, but it should be a simple extension.

It does expand "replacing" to be more than just repacking, but that is the case regardless in this thread.

[and the process name of the FEDRawData in raw files would change at a minimum.. but again that's likely true in any solution downstream of the HLT]

Actually, if we choose what to run at configuration level, we can keep the original process name (LHC or HLT) for all the cases where the full raw data is available, and the renaming is not needed.

And now I'm thinking: what if we extend the raw data collector module, or we put an EDFilter in front of it, so that the renaming happens only if the data is found to have the "skimmed" name?

As long as the HLT is correctly configured to send a single raw data collection (original, zero-suppressed, or skimmed), it may work with a single repacking configuration.

davidlange6 commented 5 years ago

On Sep 24, 2018, at 8:42 AM, Andrea Bocci notifications@github.com wrote:

And now I'm thinking, what if we extend the raw data collector module, or we put an EDFilter in front it, so that the renaming happens only if the data is found to have the "skimmed" name ?

It's possible (as you would know better than me) - but then you give up the consistent raw data format across datasets.

fwyzard commented 5 years ago

Yes, but the inconsistency would only be in the "process name" part of the collections, which most configurations ignore anyway.

So we should be able to run the same downstream configuration on all the inputs, and still have a simple way to check from the data what one is running on (and differentiate if needed).

icali commented 5 years ago

Thank you for all the input. Not an easy decision between 1 and 2. We should also consider that the same running mode will happen during Run 3, so what is decided now will be kept for the next 4 HI runs. From this perspective I'm still in favor of option 1, even if it is more complex to implement. As HI we can inject some manpower and update/test Config.DP.Repack locally, but we would need some guidance.

However, what is not clear to me is how it is possible to have a failure mode in which the repacking jobs do not crash but the data end up corrupted. The majority of our data will indeed not be reconstructed immediately. A subtle failure in the RAW data repacking would be spotted only when the streamer files are no longer available.

Thinking out loud, do you think that option 3 could be feasible? Option 3 would be to implement option 2 and include in the reco sequence a raw-skimming configuration that takes rawDataReducedFormat and produces rawDataRepacker. Only the rawDataRepacker would go to tape, while the original data would be deleted.

davidlange6 commented 5 years ago

On Sep 25, 2018, at 11:31 AM, Ivan Amos Cali notifications@github.com wrote:

Thank you for all the input. Not an easy decision between 1 and 2. We should also consider that the same running mode will happen during Run 3, so what is decided now will be kept for the next 4 HI runs. From this perspective I'm still in favor of option 1, even if it is more complex to implement. As HI we can inject some manpower and update/test Config.DP.Repack locally, but we would need some guidance.

However, what is not clear to me is how it is possible to have a failure mode in which the repacking jobs do not crash but the data end up corrupted. The majority of our data will indeed not be reconstructed immediately. A subtle failure in the RAW data repacking would be spotted only when the streamer files are no longer available.

Right - that is the risk of introducing complexity into the repack. It's not corruption that I worry about, it's dropped data products due to a job misconfiguration.

Thinking out loud, do you think that option 3 could be feasible? Option 3 would be to implement option 2 and include in the reco sequence a raw-skimming configuration that takes rawDataReducedFormat and produces rawDataRepacker. Only the rawDataRepacker would go to tape, while the original data would be deleted.

Waiting until prompt is done before archiving to tape has a much higher risk of losing the raw data (especially given your comments above).


slava77 commented 5 years ago

@Dr15Jones can EDAlias be used to "rename" a product made in another process?

No.

Is it possible to have an EDAlias specific to an output file? Let's say we are writing, in a given process, the same type of product from producerA and producerB: producerA goes to file A, producerB to file B. I would like consumers of file A or B to get this product with the same InputTag.

@Dr15Jones please comment if this is a possibility from the framework/edm side.

Dr15Jones commented 5 years ago

@slava77 it is not possible.

slava77 commented 5 years ago

Based on the inputs so far, I've been thinking of still using option "2" (editing only the reco/processing step). One possible option here is to modify the RAW2DIGI step, or make a new one, say RAWS2RAW, and have it in the standard processing for everything. This way we just do the collection renaming in the same process, without splitting it into multiple processes.

The implementation will be a modified version of RawDataCollectorByLabel. For a standard rawDataCollector FED collection name (a configuration sketch follows below):

• it will have its instance rawDataCollector
• in the configuration it will have a list of alternative collections to pick, all required to skip the current process name
• by implementation only the first available collection can be used, and if more than one is available in the first or any later event, there will be an exception
• non-standard RAW files/streams with multiple FED collections will have to be configured in processing to not have the RAWS2RAW step
• as an optimization option, if the input has rawDataCollector, this producer doesn't write anything to the event, so that the downstream picks up this collection from the inputs
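
A hedged configuration sketch of such a step (the plugin name RawDataSelectorByLabel is hypothetical, and the parameter layout is only modelled on the existing RawDataCollectorByLabel):

    import FWCore.ParameterSet.Config as cms

    # writes the standard label; picks the first of the listed alternative
    # collections found in the event (the proposal above would have it throw
    # if more than one is present)
    rawDataCollector = cms.EDProducer("RawDataSelectorByLabel",
        RawCollectionList = cms.VInputTag(
            cms.InputTag("rawDataRepacker"),
            cms.InputTag("rawDataRepackerReducedFormat"),
        )
    )

    RAWS2RAW = cms.Sequence(rawDataCollector)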

davidlange6 commented 5 years ago

On Sep 26, 2018, at 2:41 PM, Slava Krutelyov notifications@github.com wrote:

Based on the inputs so far, I've been thinking of still using options "2" (editing only the reco/processing step). One possible option here is to modify the RAW2DIGI step, or make a new one, say RAWS2RAW and have it in the standard processing for everything. This way we just do the collection renaming in the same process, not splitting it to multiple processes.

The implementation will be a modified version of RawDataCollectorByLabel. For a standard rawDataCollector FED collection name

• it will have its instance rawDataCollector
• in the configuration it will have a list of alternative collections to pick, all required to skip the current process name
• by implementation only the first available collection can be used and if more than one is available in the first or any later event, there will be an exception
• non-standard RAW files/streams with multiple FED collections will have to be configured in processing to not have the RAWS2RAW step
• as an optimization option, if the input has rawDataCollector, this producer doesn't write anything to the event so that the downstream picks up this collection from the inputs

Any idea about the performance overhead of this? (I guess another way to exploit the delete early mechanism..)


slava77 commented 5 years ago

Any idea about the performance overhead of this? (I guess another way to exploit the delete early mechanism..)

• If the optimization option is in place (the standard collection is not copied), the overhead is minimal.
• In all other cases the overhead of making a copy is perhaps unavoidable, and the memory use can improve by using early deletion.

A solution with a smaller overhead can include switching to producing only a "pointer" to the right FED collection. This will need a one-time modification to the algorithms consuming the FED collection.

davidlange6 commented 5 years ago

On Sep 26, 2018, at 3:53 PM, Slava Krutelyov notifications@github.com wrote:

Any idea about the performance overhead of this? (I guess another way to exploit the delete early mechanism..)

• If the optimization option is in place (the standard collection is not copied), the overhead is minimal.
• In all other cases the overhead of making a copy is perhaps unavoidable, and the memory use can improve by using early deletion.

A solution with a smaller overhead can include switching to producing only a "pointer" to the right FED collection. This will need a one-time modification to the algorithms consuming the FED collection.

One of the L1 pull requests merged today had the same problem; there I imagine the overhead is low enough to just ignore. Thinking out loud, putting an InputTag into the event that identifies the "right" FED collection would be less ugly than a pointer-like interface. [But the only good thing about these solutions is that, since they are not in the repack step, they are less likely to cause data loss if buggy.]


icali commented 5 years ago

It is true that a module running all the time, as part of the standard RAW2DIGI step acting as a RAW-to-RAW converter, would make the different collection names of the datasets transparent; it could be extended without issues, and no one would need to care anymore about HI or pp data.

Just a naive question: why can this not simply be implemented via the flag mechanism? We now have --data and --repacked. Earlier in this thread it was proposed to adjust the --repacked flag to handle 2 input collections. Wouldn't it be enough to adjust/update the --data flag so that it understands the three collection names?

As an announcement: we will have a slot in the joint meeting tomorrow to discuss this issue. If we can get a sufficient critical mass of attendees, it would be very useful.

franzoni commented 5 years ago

Greetings,

What is proposed here has the BIG benefit that we don't need to discriminate between 3 different names "from memory" when setting up cmsDriver to process data: i.e., no need to remember in the future (and we will fail to remember) that the 2018 HIN data have a PD-dependent cmsDriver configuration (cmsDriver does not detect the primary dataset name of its input).

@slava77 , when do we actually collect RAW data w/ this feature:

Non-standard RAW files/streams with multiple FED collections will have to be configured in processing to not have the RAWS2RAW step

, if ever?

davidlange6 commented 5 years ago

On Sep 27, 2018, at 7:02 PM, Ivan Amos Cali notifications@github.com wrote:

It is true that a module running all the time, as part of the standard RAW2DIGI step acting as a RAW-to-RAW converter, would make the different collection names of the datasets transparent; it could be extended without issues, and no one would need to care anymore about HI or pp data.

Well, for those using cmsDriver-driven configurations at least; other configs will need adjusting.

Just a naive question: why can this not simply be implemented via the flag mechanism? We now have --data and --repacked. Earlier in this thread it was proposed to adjust the --repacked flag to handle 2 input collections. Wouldn't it be enough to adjust/update the --data flag so that it understands the three collection names?

That is to say, have 3 different data-processing scenario configs and pick the right one in the Tier0 configuration. That has generally been disfavored given its additional complexity.

As an announcement: we will have a slot in the joint meeting tomorrow to discuss this issue. If we can get a sufficient critical mass of attendees, it would be very useful.


davidlange6 commented 5 years ago

On Sep 28, 2018, at 12:39 PM, Giovanni Franzoni notifications@github.com wrote:

Greetings,

What is proposed here has the BIG benefit that we don't need to discriminate between 3 different names "from memory" when setting up cmsDriver to process data: i.e., no need to remember in the future (and we will fail to remember) that the 2018 HIN data have a PD-dependent cmsDriver configuration (cmsDriver does not detect the primary dataset name of its input).

@slava77 , when do we actually collect RAW data w/ this feature:

Non-standard RAW files/streams with multiple FED collections will have to be configured in processing to not have the RAWS2RAW step

, if ever?

@slava77 is correctly anticipating future (even if currently thought of as crazy) scenarios.


slava77 commented 5 years ago

On 9/28/18 3:59 AM, David Lange wrote:

On Sep 28, 2018, at 12:39 PM, Giovanni Franzoni notifications@github.com wrote:

Greetings,

What is proposed here has the BIG benefit that we don't need to discriminate between 3 different names "from memory" when setting up cmsDriver to process data: i.e., no need to remember in the future (and we will fail to remember) that the 2018 HIN data have a PD-dependent cmsDriver configuration (cmsDriver does not detect the primary dataset name of its input).

@slava77 , when do we actually collect RAW data w/ this feature:

Non-standard RAW files/streams with multiple FED collections will have to be configured in processing to not have the RAWS2RAW step

, if ever?

@slava77 is correctly anticipating future (even if currently thought as crazy) scenarios..

I was not thinking of just the future. I think that we have data like this already, although I don't have an example at hand. I would like to be proven wrong though.

fabiocos commented 5 years ago

@slava77 @icali @mandrenguyen I wonder whether it would make sense to replace the initial test input data for workflows 140.56 and 140.57 with the first collision data, now that they are arriving.

mandrenguyen commented 5 years ago

@fabiocos Good call. I believe the following datasets would be a good choice: /HIHardProbes/HIRun2018A-v1/RAW and /HIMinimumBiasReducedFormat0/HIRun2018A-v1/RAW. It looks like for run 326383 all detectors were on for all LS. I believe the policy is to remove the RAW from disk shortly after prompt reco is done. Should we copy the relevant files to the CERN T2?

mandrenguyen commented 5 years ago

On closer inspection, the tracker was off in 326383 after lumi 243, i.e. for the last 20 LS of the run. So we should either add a lumi mask or consider another run. Another possibility is 326479, which looks like a short run of 23 LS where all detectors were on the whole time.

prebello commented 5 years ago

Hi @mandrenguyen, just to confirm: may we enable the 140.56 and 140.57 wfs to use /HIHardProbes/HIRun2018A-v1/RAW and /HIMinimumBiasReducedFormat0/HIRun2018A-v1/RAW, respectively, with run 326479 and LS [1,23]?
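
Concretely, such an input could be declared along these lines (a sketch assuming the usual InputInfo conventions of relval_steps.py; the step name and event count are illustrative):

    # hypothetical input step for 140.56; 140.57 would do the same with the
    # /HIMinimumBiasReducedFormat0/HIRun2018A-v1/RAW dataset
    steps['RunHI2018'] = {
        'INPUT': InputInfo(
            dataSet='/HIHardProbes/HIRun2018A-v1/RAW',
            label='RunHI2018',
            run=[326479],
            ls={326479: [[1, 23]]},   # all detectors on for LS 1-23
            events=100000,
            location='STD')
    }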

mandrenguyen commented 5 years ago

Hi @prebello Yes !

slava77 commented 5 years ago

It looks like this issue can be closed.