LSSTDESC / DESC_DC2_imSim_Workflow

BSD 3-Clause "New" or "Revised" License

Run 2.2i Production Logs #41

Open villarrealas opened 4 years ago

villarrealas commented 4 years ago

This is a log of Run 2.2i Production using Parsl.

y1-wfd (NERSC):

At this point we switched to Parsl production using the login node, which allows us to run multiple 1024-node blocks as a workaround for the above-mentioned Slurm issue.

Switched back to 1024-node sequential jobs in order to finish small amounts of work while the above issues are addressed. Finalizing y1-wfd with the exception of the visits listed below, which will be folded into y2-wfd processing for efficiency.

y2-wfd (NERSC): Including the remaining work from y1-wfd, this totals 17706 task submissions to complete.

villarrealas commented 4 years ago

Jim has identified that the previous memory crashes are consistent with low-sky-brightness images, which require large amounts of memory when reaching postage stamps. Suggested fixes are randomizing the order in which objects are drawn or developing new memory-aware threading.
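For reference, a minimal sketch of the first suggested fix (randomizing draw order). This is not the imSim code: `randomized_draw_order`, the toy `catalog`, and the seed value are all hypothetical, and the idea that shuffling spreads the memory-heavy objects more evenly over the run is an assumption about why the fix would help.

```python
import random


def randomized_draw_order(objects, seed):
    """Return a new list with the objects in a seeded pseudo-random order.

    Hypothetical helper: the real object list and drawing loop live in
    imSim and are not shown in this log.
    """
    rng = random.Random(seed)   # private RNG, so global random state is untouched
    shuffled = list(objects)    # copy, so the caller's ordering is preserved
    rng.shuffle(shuffled)
    return shuffled


# Toy stand-in for an object catalog: (object_id, flux) pairs, brightest first.
catalog = [(i, 1000.0 / (i + 1)) for i in range(10)]

# Seeding (here with an arbitrary integer) keeps the order reproducible on reruns.
order = randomized_draw_order(catalog, seed=8044)
```

Seeding per visit would keep a restarted task drawing objects in the same order, which may matter if checkpointing depends on object order.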

villarrealas commented 4 years ago

{'00008044', '00242078', '00231490', '00187589', '00202496', '00189282', '00211270', '00245346', '00212711', '00250936', '00184656', '00206040', '00000231', '00230247', '00254941', '00217622', '00188992', '00219909', '00207748', '00000311', '00260466', '00254314', '00204649', '00191377', '00183031', '00174602', '00213619', '00262006', '00237293', '00185672', '00202470', '00037609', '00159478', '00193050', '00235880', '00192348', '00002335', '00227731', '00254932', '00221578', '00169807', '00182971', '00225545', '00262543', '00208764', '00208722', '00214546', '00260435', '00191417', '00002208', '00193859', '00183768', '00204449', '00257716', '00194856', '00249495', '00204470', '00189313', '00057202', '00193881', '00032685', '00003664', '00235069', '00193088', '00237857', '00214395', '00254991', '00254960', '00252947', '00169906', '00159523', '00238035', '00233526', '00204661', '00242600', '00246253', '00227753', '00225469', '00205465', '00179998', '00195536', '00185656', '00216819', '00192968', '00227029', '00195587', '00161027', '00250882', '00048971', '00262561', '00174535', '00204707', '00231466', '00252987', '00257721', '00237190', '00250906', '00190339', '00219169', '00233531', '00202445', '00040282', '00206034', '00202498', '00174600', '00200907', '00211180', '00193087', '00211950', '00185639', '00248977', '00013319', '00202616', '00007372', '00216805', '00206345', '00254978', '00243761', '00169799', '00204555', '00187595', '00190240', '00252985', '00191426', '00237908', '00237180', '00213548', '00211474', '00169890', '00247302', '00214388', '00220002', '00216789', '00211974', '00252986', '00256314', '00260527', '00012484', '00060888', '00192912', '00230750', '00206085', '00243723', '00185681', '00195583', '00187559', '00254975', '00167885', '00206038', '00212843', '00199505', '00192915', '00005880', '00191139', '00211201', '00040322', '00012465', '00221421', '00227920', '00249450', '00254920', '00204475', '00242603', '00228099', '00001472', '00189285', 
'00208602', '00003662', '00193783', '00216815', '00257708', '00190490', '00227725', '00181972', '00192984', '00214469', '00238057', '00259016', '00193148', '00240846', '00254984', '00185663', '00053402', '00169894', '00211973', '00005875', '00242057', '00231465', '00233534', '00211356', '00202594', '00032652', '00261360', '00252984', '00235024', '00230635', '00216830', '00227918', '00262598', '00260448', '00053398', '00235030', '00057157', '00233519', '00262542', '00227724', '00262548', '00181900', '00233516', '00192895', '00209017', '00212029', '00004579', '00242072', '00243024', '00211534', '00229239', '00214442', '00227885', '00000256', '00254993', '00226987', '00254339', '00260446', '00202515', '00256375', '00016868', '00261993', '00032681', '00169754', '00262575', '00228070', '00208974', '00236480', '00240851', '00012450', '00003656', '00000264', '00219181', '00252971', '00068022', '00252937', '00213618', '00260486', '00250870', '00192896', '00233551', '00237289', '00242465', '00250866', '00219997', '00253003', '00260443', '00204397', '00249465', '00199649', '00221425', '00219932', '00177424', '00233599', '00193891', '00174547', '00000301', '00201144', '00013333', '00204406', '00227946', '00259075', '00204593', '00219958', '00250878', '00169849', '00244073', '00210472', '00204705', '00202456', '00213053', '00250875', '00040428', '00012476', '00253680', '00243780', '00211179', '00231496', '00214360', '00249535', '00252996', '00256381', '00233566', '00195607', '00256318', '00205458', '00159479', '00246646', '00233521', '00052544', '00187806', '00182114', '00230733', '00187503', '00193849', '00214470', '00002341', '00233569', '00185784', '00211956', '00202583', '00214553', '00192950', '00194907', '00260515', '00250896', '00252981', '00243764', '00225482', '00243732', '00002205', '00060902', '00204483', '00202483', '00202516', '00206030', '00209829', '00192983', '00216752', '00227798', '00211978', '00250497', '00212044', '00209081', '00190323', '00227881', 
'00202686', '00177482', '00250892', '00241996', '00195589', '00204399', '00033714', '00160357', '00203564', '00219978', '00225541', '00225494', '00212699', '00204597', '00195614', '00193048', '00214440', '00046278', '00252995', '00225502', '00000242', '00196568', '00005863', '00191339', '00194864', '00185645', '00260470', '00185758', '00225474', '00002346', '00204712', '00169891', '00202475', '00013331', '00179970', '00213661', '00229256', '00007992', '00262075', '00180002', '00238623', '00032678', '00057159', '00240814', '00231431', '00201728', '00195561', '00202478', '00211550', '00243028', '00206067', '00187602', '00225528', '00003651', '00250362', '00209011', '00238070', '00233518', '00197394', '00236483', '00040351', '00238627', '00242591', '00242604', '00185603', '00254931', '00229245', '00195563', '00202620', '00174582', '00235740', '00231419', '00182915', '00211980', '00180073', '00242051', '00200736', '00003114', '00258352', '00016912', '00259072', '00083949', '00185616', '00231497', '00250954', '00012483', '00233596', '00242506', '00180090', '00217724', '00210735', '00012458', '00209013', '00225467', '00211962', '00212226', '00249533', '00200751', '00227892', '00233580', '00227760', '00242008', '00250874', '00230831', '00002184', '00194107', '00183771', '00217579', '00214400', '00231506', '00213557', '00169816', '00235054', '00209820', '00183060', '00203610', '00064520', '00200705', '00241995', '00183812', '00212046', '00214414', '00249458', '00202495', '00212070', '00190307', '00225534', '00229234', '00185637', '00214338', '00174536', '00204387', '00227727', '00221576', '00229257', '00259073', '00007977', '00064487', '00192922', '00231442', '00185614', '00053411', '00012482', '00240853', '00204457', '00191381', '00227726', '00192940', '00262564', '00244441', '00214403', '00194908', '00185813', '00216768', '00195568', '00060891', '00225472', '00206028', '00190261', '00204390', '00249460', '00249483', '00212005', '00243029', '00013330', '00040349', 
'00216749', '00211966', '00193163', '00195615', '00201759', '00259076', '00243089', '00185819', '00254992', '00254918', '00242040', '00229252', '00007979', '00242598', '00051618', '00216788', '00233564', '00193038', '00233523', '00159489', '00183907', '00214439', '00260475', '00209858', '00196611', '00209082', '00179299', '00225544', '00193170', '00212844', '00187553', '00262538', '00185679', '00211999', '00250895', '00191167', '00185598', '00227715', '00212074', '00256319', '00242065', '00210527', '00258396', '00243736', '00193077', '00212710', '00252979', '00191280', '00243033', '00008002', '00193900', '00197356', '00060919', '00200824', '00256345', '00227709', '00202532', '00064498', '00256349', '00216829', '00204394', '00204473', '00167873', '00196438', '00008807', '00237260', '00227768', '00249464', '00204454', '00174587', '00249456', '00245200', '00213659', '00250904', '00202518', '00191179', '00207368', '00247444', '00169827', '00257763', '00243887', '00231488', '00239384', '00192308', '00229261', '00228872', '00257762', '00040357', '00212253', '00227772', '00204563', '00194846', '00231483', '00206055', '00173691', '00013345', '00200708', '00253025', '00189408', '00256959', '00185643', '00262585', '00221417', '00219185', '00236472', '00214317', '00243032', '00250868', '00211549', '00184888', '00209030', '00219233', '00243807', '00231424', '00247401', '00046322', '00259069', '00195542', '00233573', '00237295', '00233562', '00032684', '00252946', '00207799', '00262007', '00256995', '00179996', '00213528', '00233554', '00242006', '00016922', '00227923', '00000320', '00239379', '00206167', '00190186', '00227769', '00192932', '00197424', '00185770', '00208770', '00033725', '00250916', '00214516', '00040277', '00204388', '00233597', '00002345', '00183914', '00245345', '00230641', '00219977', '00216791', '00200152', '00192918', '00217618', '00219976', '00236490', '00235020', '00180134', '00064516', '00209007', '00190487', '00229285', '00185625', '00211992', 
'00212019', '00212076', '00202479', '00242054', '00252950', '00227791', '00244071', '00005763', '00229255', '00206070', '00206133', '00185668', '00225524'}

These are the remaining Y1 visits.
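A rough sketch of how a leftover set like this could be folded into the y2-wfd work list. The three Y1 IDs are taken from the set above; the `y2_visits` entries are placeholders, not real Run 2.2i visits, and the actual workflow may build its task list differently.

```python
# First three entries of the remaining-Y1 set posted above.
remaining_y1 = {'00008044', '00242078', '00231490'}

# Placeholder y2-wfd visit IDs (hypothetical, for illustration only).
y2_visits = {'00300001', '00300002'}

# Set union removes any duplicates; because the IDs are zero-padded to a
# fixed width, sorting the strings also sorts them numerically.
combined = sorted(remaining_y1 | y2_visits)

# Integer visit numbers, if a downstream tool expects them.
visit_numbers = [int(v) for v in combined]
```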

villarrealas commented 4 years ago

As a new post just for clarity:

y2-wfd (NERSC):

26714933 - job submitted per NERSC suggestion with 2000 nodes for 16 hours. Ran to completion.

26740532 - 2000-node job for 16 hours. Ran to completion.
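As a quick tally of what the two jobs above requested, the node-hours work out as follows. This uses only the node counts and time limits quoted in the log; it assumes each job was charged for its full 16-hour request, which the log does not confirm.

```python
# Job IDs and requested allocations as reported in the log entry above.
jobs = {
    "26714933": {"nodes": 2000, "hours": 16},
    "26740532": {"nodes": 2000, "hours": 16},
}

# Requested node-hours per job (an upper bound if a job finished early).
node_hours = {job_id: j["nodes"] * j["hours"] for job_id, j in jobs.items()}
total_node_hours = sum(node_hours.values())

print(node_hours)        # 32000 node-hours for each job
print(total_node_hours)  # 64000 node-hours requested in total
```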

villarrealas commented 4 years ago

Unfortunately, I dropped this a little over the break. Notes worth making: