FlexTRKR / PyFLEXTRKR


Issue in running the tracking algorithm for CM1 output #75

Open ealucy opened 9 months ago

ealucy commented 9 months ago

My goal is to track vorticity features within tropical cyclone model output from CM1. Currently, when running the code, I'm getting an error suggesting that the algorithm isn't finding the files in the directory where I've housed them:

(aug23_env) el381212@turing:/nfs/tcdynasty/lucy$ python run_generic_tracking.py config.yml
2023-12-05 20:01:29,557 - pyflextrkr.idfeature_driver - INFO - Identifying features from raw data
2023-12-05 20:01:30,181 - pyflextrkr.idfeature_driver - INFO - Total number of files to process: 0
2023-12-05 20:01:30,184 - pyflextrkr.idfeature_driver - INFO - Done with features from raw data.
2023-12-05 20:01:30,184 - pyflextrkr.tracksingle_driver - INFO - Tracking sequential pairs of idfeature files
2023-12-05 20:01:30,185 - pyflextrkr.tracksingle_driver - INFO - Total number of files to process: 0
2023-12-05 20:01:30,185 - pyflextrkr.tracksingle_driver - INFO - Done with tracking sequential pairs of idfeature files
2023-12-05 20:01:30,185 - pyflextrkr.gettracks - INFO - Tracking features sequentially from single track files
2023-12-05 20:01:30,186 - pyflextrkr.gettracks - INFO - Total number of files to process: 0
Traceback (most recent call last):
  File "/nfs/tcdynasty/lucy/run_generic_tracking.py", line 61, in <module>
    tracknumbers_filename = gettracknumbers(config)
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfs/knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/pyflextrkr/gettracks.py", line 74, in gettracknumbers
    logger.debug(f"files[0]: {files[0]}")
IndexError: list index out of range
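Every stage in the log reports 0 files before gettracks finally indexes an empty list. A minimal sketch of the kind of parse-and-filter step that produces this symptom (function and variable names are illustrative assumptions, not PyFLEXTRKR's actual code):

```python
from datetime import datetime

# Hypothetical reconstruction of the file-filtering step: strip the base name,
# parse the remaining timestamp, and keep only files inside [start, end].
def filter_files(filenames, basename, fmt, start, end):
    kept = []
    for name in filenames:
        stamp = name[len(basename):].removesuffix(".nc")
        t = datetime.strptime(stamp, fmt)
        if start <= t <= end:
            kept.append(name)
    return kept

names = ["cm1out_20000101_00000%d.nc" % s for s in range(4, 9)]
files = filter_files(
    names, "cm1out_", "%Y%m%d_%H%M%S",
    datetime(2000, 1, 1, 0, 0, 4), datetime(2000, 1, 1, 0, 0, 8),
)
# When the filter matches nothing, files == [] and any files[0] access
# raises the IndexError seen in the traceback above.
```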

I'll also attach my config file:

---
# ERA5 vorticity anomaly tracking configuration file

# Identify features to track
run_idfeature: True
# Track single consecutive feature files
run_tracksingle: True
# Run tracking for all files
run_gettracks: True
# Calculate feature statistics
run_trackstats: True
# Link merge/split tracks
run_mergesplit: True
# Map tracking to pixel files
run_mapfeature: True

# Start/end date and time
startdate: '20000101_000004'
enddate: '20000101_000008'

# Parallel processing set up
# run_parallel: 1 (local cluster), 2 (Dask MPI)
run_parallel: 1
nprocesses: 32  # Number of processors to use if run_parallel=1

databasename: 'cm1out_'
#databasename: ERA5_SFvortPV_
# Specify date/time string format in the file name
# E.g., radar_20181101.011503.nc --> yyyymodd.hhmmss
# E.g., wrfout_2018-11-01_01:15:00 --> yyyy-mo-dd_hh:mm:ss
time_format: 'yyyymodd_hhmmss'

# Input files directory
clouddata_path: '/nfs/tcdynasty/lucy/cm1/'

# Working directory for the tracking data
root_path: '/nfs/tcdynasty/lucy/cm1_tracking/'
# root_path: '/pscratch/sd/j/jmarquis/ERA5_waccem/Bandpassed/'
# Working sub-directory names
tracking_path_name: 'vtracking'
stats_path_name: 'vortstats'
pixel_path_name: 'vortracking'

# Specify types of feature being tracked
# This adds additional feature-specific statistics to be computed
feature_type: 'generic'

# Specify data structure
datatimeresolution: 1/3600     # hours
pixel_radius: .015625      # km
x_dimname: 'ni'
y_dimname: 'nj'
time_dimname: 'time'
time_coordname: 'time'
x_coordname: 'x'
y_coordname: 'y'
field_varname: 'rel_vort'

# Feature detection parameters
label_method: 'skimage.watershed'
# peak_local_max params:
plm_min_distance: 15   # min_distance - distance buffer between maxima; num grid points
plm_exclude_border: 5   # exclude_border - distance buffer between maxima and the domain sides; num grid points
plm_threshold_abs: 0   # threshold_abs - minimum magnitude of PSI' required to define a maxima
# watershed params:
cont_thresh: 0.00002   # PSI' contour defining outermost of flood-filled object area
compa: 0    #"compactness factor" - (how much you'll let a flood fill spread into a neighbor's domain. Zero or < 100 seemed ok.)

# field_thresh: [1.6, 1000]  # variable thresholds
min_size: .1   # Min area to define a feature (km^2)
R_earth: 6378.0  # Earth radius (km)

# Tracking parameters
timegap: 1/3600         # hour
othresh: 0.3           # overlap percentage threshold
maxnclouds: 100       # Maximum number of features in one snapshot
nmaxlinks: 10          # Maximum number of overlaps that any single feature can be linked to
duration_range: [6, 800]   # A vector [minlength,maxlength] to specify the duration range for the tracks
# Flag to remove short-lived tracks [< min(duration_range)] that are not mergers/splits with other tracks
# 0:keep all tracks; 1:remove short tracks
remove_shorttracks: 1
# Set this flag to 1 to write a dense (2D) trackstats netCDF file
# Note that for datasets with lots of tracks, the memory consumption could be very large
trackstats_dense_netcdf: 1
# Minimum time difference threshold to match track stats with cloudid pixel files
match_pixel_dt_thresh: 60.0  # seconds

# Link merge/split parameters to main tracks
maintrack_area_thresh: .1  # [km^2] Main track area threshold
maintrack_lifetime_thresh: 60/3600  # [hour] Main track duration threshold
split_duration: 30/3600  # [hour] Split tracks <= this length is linked to the main tracks
merge_duration: 30/3600  # [hour] Merge tracks <= this length is linked to the main tracks

# Define tracked feature variable names
feature_varname: 'feature_number'
nfeature_varname: 'nfeatures'
featuresize_varname: 'npix_feature'

# Track statistics output file dimension names
tracks_dimname: 'tracks'
times_dimname: 'times'
fillval: -9999
# Output file base names
finalstats_filebase: 'trackstats_final_'
pixeltracking_filebase: 'vort_tracks_'

# List of variable names to pass from input to tracking output data
pass_varname:
  - 'rel_vort'

All the files are housed in the /nfs/tcdynasty/lucy/cm1/ directory, but it appears they're not being found by the code. Any assistance is much appreciated!
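A quick standalone check can rule out the glob pattern itself (illustrative only; a temporary directory stands in for the real clouddata_path):

```python
import glob
import os
import tempfile

# Sanity check: build the glob that databasename implies and confirm it
# matches the expected files. Five dummy files mimic the CM1 output names.
with tempfile.TemporaryDirectory() as clouddata_path:
    for s in range(4, 9):
        open(os.path.join(clouddata_path, f"cm1out_20000101_00000{s}.nc"), "w").close()
    matched = sorted(glob.glob(os.path.join(clouddata_path, "cm1out_*.nc")))
    print(len(matched))  # 5 -> the glob finds the files, so the filter is suspect
```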
feng045 commented 9 months ago

Based on your config, the code would be searching for input files like this: /nfs/tcdynasty/lucy/cm1/cm1out_yyyymodd_hhmmss.nc

And the files date/time must be within this range: startdate: '20000101_000004' enddate: '20000101_000008'

You should check to make sure that matches your input files.
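The matching step above can be sketched as a small translation from the config's time_format tokens to strptime codes (the token set is an assumption inferred from the examples in the config comments, not PyFLEXTRKR's actual implementation):

```python
from datetime import datetime

# Assumed token mapping, inferred from the config comments
# (e.g., radar_20181101.011503.nc --> yyyymodd.hhmmss). Hypothetical.
TOKENS = {"yyyy": "%Y", "mo": "%m", "dd": "%d", "hh": "%H", "mm": "%M", "ss": "%S"}

def to_strptime(fmt):
    # Dicts preserve insertion order (Python 3.7+), which matters
    # if tokens ever overlap.
    for token, code in TOKENS.items():
        fmt = fmt.replace(token, code)
    return fmt

fmt = to_strptime("yyyymodd_hhmmss")
t = datetime.strptime("20000101_000004", fmt)
print(fmt, t)  # %Y%m%d_%H%M%S 2000-01-01 00:00:04
```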

ealucy commented 9 months ago

Yes, they match. The files are titled like this: 'cm1out_20000101_000004.nc'. Curious!

feng045 commented 9 months ago

I just realized that your startdate and enddate only differ by 4 seconds. The code calculates the date/times from your filenames (hence the specified datetime format 'yyyymodd_hhmmss' in the config), and then only keeps those that fall within your specified startdate and enddate for processing.

What do your file names look like? Can you paste the full list of your file names here?

ealucy commented 9 months ago

Yes, that is correct. This is the file list:

cm1out_20000101_000004.nc
cm1out_20000101_000005.nc
cm1out_20000101_000006.nc
cm1out_20000101_000007.nc
cm1out_20000101_000008.nc

There are only these five files, as the entire dataset is not housed locally. I was hoping to test the tracker on these few to get an idea of how it works before attempting to do so on the entire dataset.

feng045 commented 9 months ago

I think I may know why. The function in PyFLEXTRKR that converts input file names to datetimes does not parse the digits down to seconds precision. See the code at this line.
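The effect of dropping the seconds can be illustrated directly (a hypothetical demonstration, not the library's code): if every parsed file time is truncated to minute precision, all five files collapse to 00:00:00, which falls outside the 4-second window.

```python
from datetime import datetime

# Parse the five CM1 file timestamps at full precision, then simulate
# truncation to minute precision (the suspected behavior).
stamps = ["20000101_00000%d" % s for s in range(4, 9)]
full = [datetime.strptime(s, "%Y%m%d_%H%M%S") for s in stamps]
truncated = [t.replace(second=0) for t in full]

start = datetime(2000, 1, 1, 0, 0, 4)
end = datetime(2000, 1, 1, 0, 0, 8)

print(sum(start <= t <= end for t in full))       # 5: all files kept
print(sum(start <= t <= end for t in truncated))  # 0: empty file list -> IndexError
```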

You can try a larger datetime window that includes all the files you have, e.g.:

startdate: '20000101_000000'
enddate: '20000101_001000'