Closed malbouis closed 1 year ago
A new Issue was created by @malbouis .
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
The issue is also appearing in other PDs: ReservedDoubleMuonLowMass, Muon[0,1], DisplacedJet and ZeroBias
I'm trying to take a look with a heap profiler (before that difficult to assign)
assign reconstruction, dqm
(before that difficult to assign)
Well, the cause is likely either in RECO or in DQM, so maybe useful to assign early anyway.
New categories assigned: dqm,reconstruction
@micsucmed,@rvenditti,@mandrenguyen,@emanueleusai,@syuvivida,@clacaputo,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks
Hi @cms-sw/reconstruction-l2 and @cms-sw/dqm-l2 , I know there are a lot of issues going on in this ramp-up period but did you have the chance to look into this issue? It should definitely be investigated as it showed up in the most recent replay we did and we are about to launch a new one to test CMSSW_13_0_5.
Here is a plot extracted from the log file showing the RSS as a function of the timestamp of the printout. Towards the end of the job there is a rapid increase of ~4 GB, which then decreases.
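For reference, the RSS-vs-time points can be pulled out of the log with a short script. This is only a sketch: the two-line MemoryCheck format is inferred from the excerpts quoted later in this thread, and the log filename is hypothetical.

```python
import re
from datetime import datetime

# Two-line SimpleMemoryCheck printout, e.g.
#   %MSG-w MemoryCheck: <Module>:<label> 01-May-2023 19:12:42 CEST Run: ... Event: ...
#   MemoryCheck: module <Module>:<label> VSIZE <v> <dv> RSS <rss> <drss>
HDR = re.compile(r"%MSG-w MemoryCheck: \S+ (\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2})")
VAL = re.compile(r"MemoryCheck: module \S+ VSIZE \S+ \S+ RSS (\S+) \S+")

def rss_points(lines):
    """Yield (timestamp, rss_mib) pairs suitable for plotting."""
    ts = None
    for line in lines:
        if (m := HDR.search(line)):
            ts = datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S")
        elif ts is not None and (m := VAL.search(line)):
            yield ts, float(m.group(1))
            ts = None

# e.g.: xs, ys = zip(*rss_points(open("cmsRun1-stdout.log")))
```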
While doing the plot I noticed the job processed 73 events on 8 threads, yet it took the job about an hour and 15 minutes to process the data. The end-of-job time report shows
Time Summary:
- Min event: 107.015
- Max event: 2101.15
- Avg event: 461.379
- Total loop: 4550.8
- Total init: 169.891
- Total job: 4739.79
- EventSetup Lock: 0
- EventSetup Get: 0
Event Throughput: 0.0160412 ev/s
CPU Summary:
- Total loop: 33277.5
- Total init: 155.67
- Total extra: 0
- Total children: 324.451
- Total job: 33446.1
i.e. the average time to process an event was ~7.5 minutes, with the maximum being 35 minutes! (even the minimum was almost 2 minutes)
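As a quick sanity check, the headline numbers above are mutually consistent (values copied from the Time Summary; this is just arithmetic, not CMSSW output):

```python
# Values from the Time Summary / throughput lines above.
avg_event_s = 461.379
max_event_s = 2101.15
total_loop_s = 4550.8
n_events = 73

avg_min = avg_event_s / 60            # ~7.7 min per event on average
max_min = max_event_s / 60            # ~35 min for the worst event
throughput = n_events / total_loop_s  # ~0.016 ev/s, matching the reported Event Throughput

print(f"{avg_min:.1f} min avg, {max_min:.0f} min max, {throughput:.6f} ev/s")
```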
Are these events particularly heavy? Or is there some runaway module?
Are these events particularly heavy?
On this line of thought, I see the log has 401 occurrences of
%MSG-e TooManyPairs: HitPairEDProducer:pixelPairStepHitDoublets 27-Apr-2023 20:25:56 CEST Run: 366498 Event: 277351
number of pairs exceed maximum, no pairs produced
%MSG
from pixelPairElectronHitDoublets, pixelPairStepHitDoublets, stripPairElectronHitDoublets.
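If someone wants to reproduce the count, here is a sketch for tallying these messages per producing module (the log filename is hypothetical):

```python
import re
from collections import Counter

# Tally "%MSG-e TooManyPairs" messages by the module label after the colon,
# e.g. "HitPairEDProducer:pixelPairStepHitDoublets" -> "pixelPairStepHitDoublets".
PAT = re.compile(r"%MSG-e TooManyPairs: \w+:(\w+)")

def too_many_pairs_counts(lines):
    return Counter(m.group(1) for line in lines if (m := PAT.search(line)))

# e.g.: print(too_many_pairs_counts(open("cmsRun1-stdout.log")))
```

Summed over the three labels this should give the 401 occurrences quoted above.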
I believe the pixel-pair step comes into play when there's an inactive region of the pixel tracker. Could there be a larger than usual pixel dead area in these events? Maybe @mmusich or @slava77 have some insight? There are also plenty of warnings of the following type from the pixel-pair step track propagation:
%MSG-w BasicTrajectoryState: CkfTrackCandidateMaker:pixelPairStepTrackCandidates 01-May-2023 16:18:56 CEST Run: 366498 Event: 615981
local error not pos-def
Could there be a larger than usual pixel dead area in these events?
I am not aware of particularly large new dead regions, but I haven't yet checked in detail. By the way, I have seen plenty of these when running checks for the low pT electron issue, but I am wondering if this is a red herring. Do I understand correctly that these high memory jobs occurred in a replay with 13_0_4, while they didn't occur in real Prompt in 13_0_3? Should we not focus on what changed in between?
I ran the job on one thread. Here are the top-10 reported memory increases
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:stripPairElectronSeeds 01-May-2023 19:12:42 CEST Run: 366498 Event: 210235
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:stripPairElectronSeeds VSIZE 14910 0 RSS 6711.33 1700.12
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:stripPairElectronSeeds 01-May-2023 15:54:33 CEST Run: 366498 Event: 1043611
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:stripPairElectronSeeds VSIZE 12841.9 896 RSS 5974.74 1105.32
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds 01-May-2023 15:37:08 CEST Run: 366498 Event: 277351
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds VSIZE 9113.46 768 RSS 4795.14 962.957
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds 01-May-2023 15:59:42 CEST Run: 366498 Event: 219521
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds VSIZE 12842 0 RSS 4873.21 940.566
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds 01-May-2023 15:42:02 CEST Run: 366498 Event: 268247
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds VSIZE 9129.86 0 RSS 4791.96 896.84
%MSG-w MemoryCheck: CAHitQuadrupletEDProducer:detachedQuadStepHitQuadruplets 01-May-2023 19:08:34 CEST Run: 366498 Event: 210235
MemoryCheck: module CAHitQuadrupletEDProducer:detachedQuadStepHitQuadruplets VSIZE 14910 0 RSS 4494.42 803.926
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds 01-May-2023 15:54:18 CEST Run: 366498 Event: 1043611
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds VSIZE 11945.9 0 RSS 4863.84 763.105
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:stripPairElectronSeeds 01-May-2023 16:16:21 CEST Run: 366498 Event: 649939
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:stripPairElectronSeeds VSIZE 12842 0 RSS 5590.77 474.164
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds 01-May-2023 19:12:21 CEST Run: 366498 Event: 210235
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds VSIZE 14910 0 RSS 4963.2 468.773
%MSG-w MemoryCheck: CAHitQuadrupletEDProducer:detachedQuadStepHitQuadruplets 01-May-2023 15:52:07 CEST Run: 366498 Event: 1043611
MemoryCheck: module CAHitQuadrupletEDProducer:detachedQuadStepHitQuadruplets VSIZE 11945.9 1536 RSS 4100.73 331.551
The last number of the printout is the RSS increase by the module.
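A sketch of how such a top-10 can be extracted, sorting the MemoryCheck lines by that last number (format as quoted above):

```python
import re

# "MemoryCheck: module <type>:<label> VSIZE <v> <dv> RSS <rss> <drss>"
VAL = re.compile(r"MemoryCheck: module (\S+) VSIZE \S+ \S+ RSS \S+ (\S+)")

def top_rss_increases(lines, n=10):
    """Largest per-call RSS increases (MiB) as (delta, module), biggest first."""
    deltas = [(float(m.group(2)), m.group(1))
              for line in lines if (m := VAL.search(line))]
    return sorted(deltas, reverse=True)[:n]
```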
FYI, we are observing this crash in the replay that is currently running for CMSSW_13_0_5. There are a few paused jobs and I hear from @germanfgv that they are crashing due to this issue.
Would it be feasible to have a 13_0_3 replay, or at minimum get the corresponding PSet, on these data? (the PSet from 13_0_4 does not work in 13_0_3)
One thread is certainly easier to interpret. I've been looking for modules with long run times and large RSS delta, and came up with some of the same suspects,
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds 27-Apr-2023 21:30:02 CEST Run: 366498 Event: 1181675
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:pixelPairElectronSeeds VSIZE 30694.3 0 RSS 13348.7 93.0898
%MSG-w MemoryCheck: SeedCreatorFromRegionConsecutiveHitsEDProducer:stripPairElectronSeeds 27-Apr-2023 21:30:24 CEST Run: 366498 Event: 1181675
MemoryCheck: module SeedCreatorFromRegionConsecutiveHitsEDProducer:stripPairElectronSeeds VSIZE 31078.3 0 RSS 16184.9 697.008
Would it be feasible to have a 13_0_3 replay, or at minimum get the corresponding PSet, on these data? (the PSet from 13_0_4 does not work in 13_0_3)
Out of curiosity, can someone try to run this job with era Run3 instead of Run3_2023?
Out of curiosity can someone try to run this job with era Run3 instead of Run3_2023?
Answering myself: I tried to reproduce the configuration leading to the issue (in CMSSW_13_0_4) with
python3 Configuration/DataProcessing/test/RunPromptReco.py --scenario ppEra_Run3_2023 --reco --dqmio --dqmSeq=@common+@ecal+@egamma+@L1TEgamma --aod --global-tag 130X_dataRun3_Prompt_Candidate_2023_03_09_09_47_16 --lfn /store/backfill/1/data/Tier0_REPLAY_2023/EGamma1/RAW/v27184538/000/366/498/00000/694408a4-44b2-4a22-8fa5-7c68890bf99b.root --alcareco EcalUncalZElectron+EcalUncalWElectron+HcalCalIterativePhiSym+HcalCalIsoTrkProducerFilter+EcalESAlign --PhysicsSkims=@EGamma0
and compared that with what I obtain with the old (ppEra_Run3) setting:
python3 Configuration/DataProcessing/test/RunPromptReco.py --scenario ppEra_Run3 --reco --dqmio --dqmSeq=@common+@ecal+@egamma+@L1TEgamma --aod --global-tag 130X_dataRun3_Prompt_Candidate_2023_03_09_09_47_16 --lfn /store/backfill/1/data/Tier0_REPLAY_2023/EGamma1/RAW/v27184538/000/366/498/00000/694408a4-44b2-4a22-8fa5-7c68890bf99b.root --alcareco EcalUncalZElectron+EcalUncalWElectron+HcalCalIterativePhiSym+HcalCalIsoTrkProducerFilter+EcalESAlign --PhysicsSkims=@EGamma0
Running on 5 events of run 366498, this is the RSS profile I get:
I'll try 13_0_5_patch1 with scenario ppEra_Run3 on some of the affected datasets.
Would it be feasible to have a 13_0_3 replay, or at minimum get the corresponding PSet, on these data? (the PSet from 13_0_4 does not work in 13_0_3)
running
python3 Configuration/DataProcessing/test/RunPromptReco.py --scenario ppEra_Run3 --reco --dqmio --dqmSeq=@common+@ecal+@egamma+@L1TEgamma --aod --global-tag 130X_dataRun3_Prompt_Candidate_2023_03_09_09_47_16 --lfn /store/backfill/1/data/Tier0_REPLAY_2023/EGamma1/RAW/v27184538/000/366/498/00000/694408a4-44b2-4a22-8fa5-7c68890bf99b.root --alcareco EcalUncalZElectron+EcalUncalWElectron+HcalCalIterativePhiSym+HcalCalIsoTrkProducerFilter+EcalESAlign --PhysicsSkims=@EGamma
in a 13_0_3, the picture is not dramatically different:
Are we running with concurrent lumis now compared to 12_4 for 2022, or did the change happen earlier? I can think of a downside: heavy event crowding is now more likely (such events take a long time to process and also come with memory use peaks; lumi partitioning prevented more of them from being processed at the same time). @makortel @Dr15Jones
I've been running for a couple of hours on lxplus8 using the tarball from the OP, with the SimpleMemoryChecker enabled. I'm more than 300 events in and I don't see the total memory much exceeding 14 GB, while the original log shows RSS exceeding GB using the same tool around event # 73. Has anyone been able to reproduce an RSS approaching the limit of 16 GB?
Are we running with concurrent lumis now compared to 12_4 for 2022? or did the change happen earlier.
Concurrent lumis were enabled already before 12_4.
The scenario doesn't seem to make any difference. Trying scenario ppEra_Run3 with CMSSW_13_0_5_patch1, we see the same high memory issues. You can find the tarball for this latest occurrence of the problem here:
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Replay13_0_5_patch1/job_25
You can see PerformanceMonitor tried to kill the job starting at 15:37:11 when it reached a PSS of
2023-05-02 15:37:11,149:INFO:PerformanceMonitor:PSS: 17238627; RSS: 17046588; PCPU: 767; PMEM: 8.6
2023-05-02 15:37:11,150:ERROR:PerformanceMonitor:Error in CMSSW step cmsRun1
Number of Cores: 8
Job has exceeded maxPSS: 16000 MB
Job has PSS: 17238 MB
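For the record, the two PSS numbers in the log are consistent if the PerformanceMonitor line reports kB (an assumption on my part, but the arithmetic works out):

```python
pss_kb = 17238627      # from the PerformanceMonitor log line above
max_pss_mb = 16000     # the configured maxPSS limit

pss_mb = pss_kb // 1000
print(pss_mb)                 # 17238, matching "Job has PSS: 17238 MB"
print(pss_mb > max_pss_mb)    # True: the job exceeded maxPSS
```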
The scenario doesn't seem to make any difference. Trying scenario ppEra_Run3 with CMSSW_13_0_5_patch1, we see the same high memory issues. You can find the tarball for this latest occurrence of the problem here: /afs/cern.ch/user/c/cmst0/public/PausedJobs/Replay13_0_5_patch1/job_25
You can see PerformanceMonitor tried to kill the job starting at 15:37:11 when it reached a PSS of
2023-05-02 15:37:11,149:INFO:PerformanceMonitor:PSS: 17238627; RSS: 17046588; PCPU: 767; PMEM: 8.6
2023-05-02 15:37:11,150:ERROR:PerformanceMonitor:Error in CMSSW step cmsRun1
Number of Cores: 8
Job has exceeded maxPSS: 16000 MB
Job has PSS: 17238 MB
is it possible to increase the memory just to know how much the job will need? (I'm not proposing to make it a default)
Actually, the job finished. It seems the SIGUSR2 signal that the wrapper uses to kill the job did not work, because the job log goes on for several minutes after that and ends with exit code 0, as you can see in
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Replay13_0_5_patch1/job_25/job/WMTaskSpace/cmsRun1/cmsRun1-stdout.log
I'll try to increase the limit anyway, if anything, simply to check how the wrapper memory measurements compare to the internal measurements.
I have a question related to this. In the MemoryCheck messages I see the following:
%MSG-w MemoryCheck: JetAnalyzer:jetDQMAnalyzerAk4PFUncleaned 02-May-2023 15:37:08 CEST Run: 366498 Event: 61264126
MemoryCheck: module JetAnalyzer:jetDQMAnalyzerAk4PFUncleaned VSIZE 31993.2 0 RSS 16330 0.246094
Are those RSS values in kB or in kiB? I'm assuming/hoping they are in kB.
A SIGUSR2 signal is caught by the framework and is used to stop the job early. With such a signal, the job will still exit with a value of 0 since it did shut down cleanly. So the job 'finished' but probably didn't process all the events in the input.
Are those RSS values in kB or in kiB? I'm assuming/hoping they are in kB.
The RSS (and VSIZE) are in MiB.
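With that convention, the MemoryCheck reading at 15:37:08 lines up with the wrapper's reading a few seconds later (assuming the PerformanceMonitor figure is in kB, which I have not verified):

```python
# MemoryCheck RSS at 15:37:08 (MiB, per the answer above) vs the
# PerformanceMonitor RSS at 15:37:11 (assumed kB).
memcheck_rss_gb = 16330 * 1024**2 / 1e9    # ~17.1 GB
perfmon_rss_gb = 17046588 * 1000 / 1e9     # ~17.0 GB

print(f"{memcheck_rss_gb:.1f} vs {perfmon_rss_gb:.1f} GB")
```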
A SIGUSR2 signal is caught by the framework and is used to stop the job early. With such a signal, the job will still exit with a value of 0 since it did shut down cleanly. So the job 'finished' but probably didn't process all the events in the input.
Ohh ok. Thanks for the clarification. I increased the limit to 20GB. I'll share the output when I get it.
I'm running a small replay with the current production configuration (CMSSW_13_0_3), only with dataset JetMET0, so we can compare it with this:
The scenario doesn't seem to make any difference. Trying scenario ppEra_Run3 with CMSSW_13_0_5_patch1, we see the same high memory issues. You can find the tarball for this latest occurrence of the problem here: /afs/cern.ch/user/c/cmst0/public/PausedJobs/Replay13_0_5_patch1/job_25
You can see PerformanceMonitor tried to kill the job starting at 15:37:11 when it reached a PSS of
2023-05-02 15:37:11,149:INFO:PerformanceMonitor:PSS: 17238627; RSS: 17046588; PCPU: 767; PMEM: 8.6
2023-05-02 15:37:11,150:ERROR:PerformanceMonitor:Error in CMSSW step cmsRun1
Number of Cores: 8
Job has exceeded maxPSS: 16000 MB
Job has PSS: 17238 MB
Has anyone been able to reproduce an RSS approaching the limit of 16 GB?
I ran a test of 73 events (those that were processed in the original job) on an slc7 machine; it reached 14.5 GB.
Longest running modules (> 5 sec) were
TimeReport 63.196841 63.196841 63.196841 lowPtTripletStepHitTriplets
TimeReport 57.752829 57.752829 57.752829 highPtTripletStepHitTriplets
TimeReport 49.104601 49.104601 49.104601 detachedQuadStepHitQuadruplets
TimeReport 35.766882 35.766882 35.766882 detachedTripletStepHitTriplets
TimeReport 27.754519 27.754519 27.754519 lowPtQuadStepHitQuadruplets
TimeReport 21.177794 21.177794 21.177794 initialStepHitQuadrupletsPreSplitting
TimeReport 21.139621 21.139621 21.139621 initialStepHitQuadruplets
TimeReport 15.396089 15.396089 15.396089 pixelPairElectronSeeds
TimeReport 7.218820 7.218820 7.218820 pixelPairStepTrackCandidates
TimeReport 6.141264 6.141264 6.141264 stripPairElectronSeeds
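For reproducibility, a sketch of how this list was filtered (the column meaning of the TimeReport lines is my assumption; the > 5 s threshold is from above):

```python
import re

# "TimeReport <t1> <t2> <t3> <module label>" -- first column taken as the
# per-event time (assumed; all three columns are equal in a 73-event job
# where each module ran once per event).
TIME = re.compile(r"TimeReport\s+([\d.]+)\s+[\d.]+\s+[\d.]+\s+(\S+)")

def slow_modules(lines, threshold_s=5.0):
    """Modules whose time exceeds threshold_s, slowest first."""
    found = [(float(m.group(1)), m.group(2))
             for line in lines if (m := TIME.search(line))]
    return sorted((t for t in found if t[0] > threshold_s), reverse=True)
```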
@makortel That seems consistent with what I found on lxplus8, but that's well below the max RSS in the original log file, and well within the limit that was set for the T0. Odd, no?
@makortel That seems consistent with what I found on lxplus8, but that's well below the max RSS in the original log file, and well within the limit that was set for the T0. Odd, no?
Memory use peaks are stochastic in multithreaded running; I doubt a single run would show this conclusively.
So I skipped the job_25 forward 158 events and started processing there. On the 15th event I hit an RSS of 15253.8. It is probable that there is some hysteresis in the job (e.g. ROOT IO buffers) so that could be showing the issue. I was running CMSSW_13_0_4 on an el8 machine using 8 threads.
@Dr15Jones noticed that despite pixelPairElectronHitDoublets reporting "no pairs produced", the consuming module pixelPairElectronSeeds sees a SeedingHitSet with a substantial number of elements. Further investigation revealed that the printout came from
https://github.com/cms-sw/cmssw/blob/85b455d63c5685b15564a5e0804565583e8b05ee/RecoTracker/TkHitPairs/src/HitPairGeneratorFromLayerPair.cc#L85-L88
and resulted in no pairs being produced for this specific layer pair, while the sum of pairs over all layer pairs in
https://github.com/cms-sw/cmssw/blob/85b455d63c5685b15564a5e0804565583e8b05ee/RecoTracker/TkHitPairs/plugins/HitPairEDProducer.cc#L108-L122
does not exceed the maxElementsTotal.
I wonder if it makes sense for HitPairEDProducer to produce hit pairs only for some layer pairs (or regions), or would it make more sense for the module to "abort" and produce empty products immediately when some layer pair (for some region) results in more than maxElements hit pairs? @cms-sw/tracking-pog-l2 @cms-sw/egamma-pog-l2
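Schematically (my reading of the two code links above, not the actual CMSSW implementation), the current capping behavior looks like this:

```python
def build_doublets(per_layer_pair, max_elements, max_elements_total):
    """Sketch of the capping logic discussed above (not the real code).

    Current behavior: a layer pair exceeding max_elements contributes
    nothing (the "TooManyPairs" case), but the other layer pairs still
    produce; only if the surviving total exceeds max_elements_total is
    the whole product emptied. The question above is whether a per-pair
    overflow should instead empty the whole product immediately.
    """
    out = []
    for doublets in per_layer_pair:
        if len(doublets) > max_elements:
            continue   # "number of pairs exceed maximum, no pairs produced"
        out.extend(doublets)
    if len(out) > max_elements_total:
        return []      # global cap: abort with an empty product
    return out
```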
would it make more sense for the module to "abort" and produce empty products immediately when some layer pair (for some region) results in more than maxElements hit pairs
this might lead to efficiency loss, no? does this change help with fixing the memory issue?
I'm running a small replay with the current production configuration (CMSSW_13_0_3), only with dataset JetMET0, so we can compare it with this:
Running the small replay with the same configuration as is in production, we got the memory issues again. This is particularly strange given that this run was processed with that configuration in production without any errors. For example, here you can find logs for one of the production jobs:
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Replay13_0_5_patch1/ProductionRun
This is running through the same lumis as the job in:
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Replay13_0_5_patch1/job_25
But the production job has a peak RSS of 8942.01 MB, while the 13_0_5_patch1 replay version was over 16000 MB when it was killed. The 13_0_3 replay version of this job is still running.
I don't understand why this can be happening. I have asked ORM to pick a different run to test, as suggested by @mmusich yesterday.
would it make more sense for the module to "abort" and produce empty products immediately when some layer pair (for some region) results in more than maxElements hit pairs
this might lead to efficiency loss, no? does this change help with fixing the memory issue?
Regarding the efficiency loss, I suppose that depends on whether we are exceeding the limit in real collision events or in beam background events.
beam background events.
It would be nice to confirm that 366498 indeed has beam background. Unfortunately it seems the offline DQM is not equipped to pick that up. I guess an offline analysis would be in order.
Some additional info:
In our profiling we have a ttbar single-thread wf (11834.21) with --era Run3 and a 2022 GT.
I see no significant increase in RSS for reco, mini or nano steps between 13_0_0 and 13_0_4.
It's not proof that the increase is not tied to the release, but it narrows the phase space.
EDIT: I think I misread @germanfgv's comment... Do we see excessive memory in 13_0_3 replays?
would it make more sense for the module to "abort" and produce empty products immediately when some layer pair (for some region) results in more than maxElements hit pairs
this might lead to efficiency loss, no? does this change help with fixing the memory issue?
Very likely, but IIUC these limits exist to protect the data processing infrastructure from excessive resource usage. I'd be tempted to argue that already losing all hit doublets from 5 layer pairs (like in 366498:1:210235) leads to such efficiency loss that the event might not be useful for physics, and if that is the case, could we just avoid processing bigger parts of the event?
In a sense I'd say the 366498:1:210235 (probably along others) is close to being unprocessable with the current reconstruction. Adjusting the "maximum limit" behavior could be a quick way to work around the problem.
already losing all hit doublets from 5 layer pairs (like in 366498:1:210235) leads to such efficiency loss in a way that the event might not be useful for physics
Okay, I understand now. In that case, if it is decided to put this change in cmssw, then from the egamma side we will keep an eye on release validation (especially electron-track-related quantities) just to make sure that there is no or only a minimal effect.
I wonder if it makes sense for HitPairEDProducer to produce hit pairs only for some layer pairs (or regions), or would it make more sense for the module to "abort" and produce empty products immediately when some layer pair (for some region) results in more than maxElements hit pairs?
I took the liberty of opening an RFC PR (exceptionally in 13_0_X branch directly) along this line in https://github.com/cms-sw/cmssw/pull/41514. In a quick test
Before proceeding further (with the PR to master) I'd like to hear at least from @cms-sw/tracking-pog-l2 if this approach would be viable.
@makortel is early deletion working aggressively enough? Can it be that the module scheduling is spreading calls too thin, so that products that could be deleted actually stay in memory much longer?
is early deletion working aggressively enough?
The data products in question are not on the early-delete list. I've been running a test to see if adding them makes a difference. Results are not fully in, but preliminary indications are that it is insufficient.
is early deletion working aggressively enough?
The data products in question are not on the early-delete list. I've been running a test to see if adding them makes a difference. Results are not fully in, but preliminary indications are that it is insufficient.
I guess I was looking in the wrong place; in the MC setup we have process.options.canDeleteEarly include RegionsSeedingHitSets_pixelPairElectronHitDoublets__RECO, and similar for many more *Doublets.
@slava77 It doesn't appear to be the *Doublets which are causing the problem, but instead what reads the Doublets.
@slava77 It doesn't appear to be the *Doublets which are causing the problem, but instead what reads the Doublets.
Ah, I misunderstood then; because the proposed solution was to reduce the size of the *Doublets.
Which modules reading Doublets are a problem? Is it something specific (pixelPairElectronSeeds) or in general?
Ah, I misunderstood then; because the proposed solution was to reduce the size of the *Doublets.
Really to make the size of *Doublets 0, in which case the code reading the *Doublets should do only very little work.
We launched a new replay yesterday, using post scrubbing runs and CMSSW_13_0_5_patch1. The replay is almost over and we don't see any memory usage errors. Not only that but the jobs are finishing much faster. So now it feels like we just wasted a lot of people's time.
I still don't understand how this particular run 366498, successfully processed in production, all of a sudden cannot be reconstructed in replays with the same configuration. What about scrubbing changes the way we reconstruct the data?
successfully processed on production,
this, I also don't understand :(
What about scrubbing changes the way we reconstruct the data?
well, after scrubbing we'll have much less beam induced backgrounds, which - in turn - will lower the creation of large amounts of spurious tracking seeds (though this hasn't been confirmed yet, as far as I can tell, see https://github.com/cms-sw/cmssw/issues/41457#issuecomment-1532587959)
So I tried to add the data products made by the modules consuming the most memory to the 'delete early' list:
process.options.canDeleteEarly.append('TrajectorySeeds_pixelPairElectronSeeds__RECO')
process.options.canDeleteEarly.append('TrajectorySeeds_stripPairElectronSeeds__RECO')
However, whatever minor improvement this made was dwarfed by the variability of memory usage seen when running the same multi-threaded job multiple times.
During a replay with the new CMSSW release 13_0_4, we observed a crash due to too much memory consumption in Prompt Reco for the datasets EGamma1, ParkingDoubleElectronLowMass, and JetMET0.
The tarball regarding this crash can be found here: /afs/cern.ch/user/c/cmst0/public/PausedJobs/Replay13_0_4/Memory/job_1467
For more details, please refer to this cmsTalk post.