malininae opened this issue 1 year ago
A small update: I decided to be creative and shortened my data preprocessor to:
```yaml
preproc_txx: &prep_txx
  custom_order: True
  regrid:
    target_grid: 0.5x0.5
    lon_offset: 0.25
    lat_offset: 0.25
    scheme: linear
  extract_shape:
    shapefile: british_columbia.shp
    method: contains
    crop: True
  area_statistics:
    operator: mean
  rolling_window_statistics:
    coordinate: time
    operator: mean
    window_length: 3
  annual_statistics:
    operator: max
```
and using the data over the 20-50 year chunks I was actually interested in, instead of 'overflowing' the data to 100+ years just to subtract the anomaly. To subtract the anomaly I added an extra variable group, `anomaly`, which is basically the mean over the reference period, and I simply subtract that mean in the diagnostic. That approach worked like a charm: I processed the data within 3 hours on the same computers with the same requested resources. For a different recipe, I used the non-working preprocessor, but only on 73 years of data, and it works just fine. The problem seems to be the length of the time series: somewhere between 80 and 120 years the data becomes too much to handle.
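The subtraction step in the diagnostic boils down to removing the reference-period mean from the preprocessed series. A minimal sketch (not the actual diagnostic code; `subtract_reference_mean` is a hypothetical helper name):

```python
import numpy as np

def subtract_reference_mean(series, reference):
    """Turn a time series into anomalies relative to a reference period.

    `series` holds the preprocessed values (e.g. annual maxima) and
    `reference` holds the values of the extra 'anomaly' variable group,
    i.e. the reference period whose mean is subtracted.
    """
    return np.asarray(series, dtype=float) - np.mean(reference)
```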
I was thinking: if one subtracts anomalies with a reference period far away from the period of interest, maybe one could re-evaluate the `anomalies` preprocessor, so one doesn't need to process 120 years when one only needs 40 of them anyway (20 at the beginning and 20 at the end of the whole period)? Because otherwise one chokes poor computers for nothing.
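One way to express that idea in a recipe is to request only the chunks that are actually needed as separate variable groups. A hedged sketch, in which the group names and `timerange` values are made up for illustration:

```yaml
# Hypothetical variable groups: only the reference period and the two
# 20-year windows of interest are processed, instead of the full 120 years.
tasmax_ref:
  preprocessor: prep_txx
  timerange: '1981/2000'   # reference period for the anomaly
tasmax_start:
  preprocessor: prep_txx
  timerange: '2001/2020'   # first 20-year chunk of interest
tasmax_end:
  preprocessor: prep_txx
  timerange: '2081/2100'   # last 20-year chunk of interest
```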
@malininae one quick question before I find more time to look deeper into this - why don't you `extract_shape` before regridding? That would shrink the data on the xy plane and the finely-gridded regridding will have an easier time
> why don't you extract_shape before regridding? That would shrink the data on the xy plane and the finely-gridded regridding will have an easier time

Unfortunately, that wouldn't help, because the target grid constructed by the `regrid` preprocessor always covers the globe completely: it has no way of knowing which bit of it you want.
> why don't you extract_shape before regridding? That would shrink the data on the xy plane and the finely-gridded regridding will have an easier time

> Unfortunately, that wouldn't help because the target grid constructed by the regrid preprocessor always covers the globe completely because it has no way of knowing which bit of it you want.
Plus, since I'm doing extremes, the edges are super important for me.
Following the discussion during the November 2023 monthly meeting, here's the monster recipe I was talking about. I'm analyzing high-resolution HighResMIP sub-daily data. I have access to a monster computer with up to 445 GB of memory per process, but only 3 hours per job. What I ended up doing is running each of the variable groups in a separate recipe, since it takes 2:26 hours and an astonishing 248 GB to run a single variable group. You can see an example of a log file for one of the groups here. I tried combining the outputs using the `--resume_from` option, but since the recipes were slightly different, I got an error. I ended up just manually modifying my `settings.yml`.
To answer some questions upfront: the wind derivation is done lazily, I double-checked when I created the function. The only option I see for easing the wind derivation is to create separate variables `uas_sq = uas**2` and `vas_sq = vas**2`. Here is the link for the pull request with the branch with the wind derivation function.
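For reference, the derivation itself is just the vector magnitude of the two wind components. A minimal numpy sketch of the formula (`wind_speed` is a hypothetical name; in the actual preprocessor function the inputs would be lazy dask arrays, so the same expression builds a task graph instead of computing immediately):

```python
import numpy as np

def wind_speed(uas, vas):
    # Near-surface wind speed from the eastward (uas) and northward (vas)
    # components. Splitting out uas_sq / vas_sq, as suggested above, would
    # precompute the two squares as separate variables.
    uas_sq = uas ** 2
    vas_sq = vas ** 2
    return np.sqrt(uas_sq + vas_sq)
```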
Thanks for sharing, I'll have a look! Could you also upload the shapefile here so I can run the recipe?
> Thanks for sharing, I'll have a look! Could you also upload the shapefile here so I can run the recipe?
Oops, sorry! I couldn't attach it here, so I put it into my Google Drive and set the permissions so that everyone with the link can view it; let me know if you can't access it.
> What I ended up doing, I ran each of the variable groups in a separate recipe, since it takes a 2:26 hours and astonishing 248Gb to run a variable group. You can see an example of a log file for one of the groups here.
I would recommend using the Dask Distributed scheduler to run this recipe and setting `max_parallel_tasks: 1` in `config-user.yml`. That will give you more control over how much memory Dask uses and will likely also run faster. Could you give that a try and let me know how it goes?
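For clarity, the suggested change is a single line in the user configuration (excerpt only; the rest of the file is unchanged):

```yaml
# config-user.yml (excerpt): run one ESMValTool task at a time and let the
# Dask Distributed scheduler provide the parallelism instead.
max_parallel_tasks: 1
```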
A bit of background to that advice: I ran the variable group `wind_fut` from the attached log file on Levante and the job completed in about 40 minutes. Here are the settings that I used:
SLURM job script:

```bash
#!/bin/bash -l
#SBATCH --partition=interactive
#SBATCH --time=08:00:00
#SBATCH --mem=32G

set -eo pipefail
unset PYTHONPATH

. /work/bd0854/b381141/mambaforge/etc/profile.d/conda.sh
conda activate esmvaltool

esmvaltool run recipe_extremes_wind_3h.yml
```
and the following `~/.esmvaltool/dask.yml` file:
```yaml
cluster:
  type: dask_jobqueue.SLURMCluster
  queue: compute
  cores: 128
  memory: 256GiB
  processes: 32
  interface: ib0
  local_directory: /scratch/b/b381141/dask-tmp
  n_workers: 32
  walltime: '8:00:00'
```
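As a quick sanity check on those numbers (a worked-out aside, not part of the original comment): with 32 worker processes sharing 128 cores and 256 GiB, each Dask worker gets 4 threads and 8 GiB of memory.

```python
# Resources per Dask worker implied by the dask.yml above.
cores = 128
memory_gib = 256
processes = 32

threads_per_worker = cores // processes
memory_per_worker_gib = memory_gib / processes

print(threads_per_worker, memory_per_worker_gib)  # 4 threads, 8.0 GiB each
```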
Considering that the input data is about 700 GB, that means we are processing about 300 MB/second. I profiled the run with py-spy and it looks like half of the time is spent on data crunching and the other half on loading the iris cubes from file and some other things. Thus, using more workers in the Dask cluster may make it faster, provided that the disk you are loading the data from can go faster. There is probably some room to improve the non-data-crunching parts too, but that will require changes in iris. The profiling result is available here; you can load the file into speedscope.app if you're interested. The 'save' function is where the computation on the Dask cluster happens.
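The throughput figure follows directly from the run above (roughly 700 GB in about 40 minutes); a quick back-of-the-envelope check:

```python
# Back-of-the-envelope throughput for the ~40-minute wind_fut run.
input_gb = 700
runtime_s = 40 * 60

throughput_mb_per_s = input_gb * 1000 / runtime_s
print(round(throughput_mb_per_s))  # roughly 292 MB/s, i.e. ~300 MB/s
```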
Hi all,
In the July 2022 meeting, I raised an issue that I can't process the data with a lot of preprocessors (see https://github.com/ESMValGroup/Community/discussions/33#discussioncomment-3121419). When I tried it back then, it all seemed to work just fine; however, I finally ran into a problem.
I was running this recipe (it looks ugly since it's still being developed, so please no judgement), and it works OK for the groups 'all', 'obs_abs' and 'obs_ano', but it can't process 'nat' and 'ssp245'. I thought, OK, I'll split the recipe, process 'nat' and 'ssp245' separately in their own recipes, and recombine them for the diagnostic, but no, I don't seem to be able to process them either. The computers I am working on allow only 6-hour jobs, but there aren't many limitations on the number of checked-out processors. For this job, I checked out 27 CPUs and allocated 180 GB of memory, and I don't allow more than one process per CPU. (I also tried 20, and it didn't work.) My hunch is that the finer-resolution models are not processed and just keep hanging.
A small note: I think the issue here is my preprocessor `rolling_window_statistics`, because, I think, the `iris` function it uses realizes the data. I processed everything quite well without `rolling_window_statistics` on 20 CPUs. I'm not sure what's the best way of handling that here. For now, I will process the data as I did before, but someone might be interested in it. main_log_debug_tx3x.txt
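For context, the rolling step in question is just a windowed mean along time. A minimal numpy illustration (a hypothetical helper, not the iris/ESMValTool implementation, which operates on cubes and stays lazy where possible):

```python
import numpy as np

def rolling_mean_time(values, window_length=3):
    # Rolling mean along the time axis, mirroring what
    # rolling_window_statistics with operator: mean computes.
    view = np.lib.stride_tricks.sliding_window_view(values, window_length)
    return view.mean(axis=-1)
```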
Here are the computer specifications, if that matters: