ESPRI-Mod / synda

ESGF Downloader (this is a deprecated repository, the tool has now moved to https://github.com/ESGF/esgf-download)
https://espri-mod.github.io/synda/

Synda time slice in selection file breaks whenever a file has a bad name #201

Open AtefBN opened 2 years ago

AtefBN commented 2 years ago

Scenario: a selection file with a specific time slice. When synda runs the sdtimefilter check to validate the time range, it relies on a simple truncation of the filename string and then immediately applies `split('-')` to the result. From sdtimefilter.py:

```python
def timeslice_in_allowed_time_range(file_timeslice,allowed_time_ranges):
    for allowed_time_range in allowed_time_ranges:
        (start,stop)=split_timeslice(file_timeslice)
        (allowed_range_start,allowed_range_stop)=split_timeslice(allowed_time_range)

        if (allowed_range_start<=start) and (allowed_range_stop>=stop):
            return True
        else:
            continue

    return False

def split_timeslice(timeslice):
    (start,end)=timeslice.split('-')

    # For now, supported granularity for timeslice is YYYYMM (year and month).
    # So we remove here finer grained timestamp info if any (i.e. day,hour,etc..).
    # Finer grained timestamp info exists for example in 3hr file (i.e. frequency=3hr).
    #
    start=start[0:6]
    end=end[0:6]

    return (start,end)
```

When a file is published with a badly formed name, the truncation does not produce a string in the expected format, which causes synda to crash. In this instance the filename had "gn3RaXbM42915" instead of a time range.

```
Error occured at 2022-06-22 15:26:26.001725

Traceback (most recent call last):
  File "/home/synda/miniconda3/envs/synda35/bin/synda", line 33, in <module>
    sys.exit(load_entry_point('synda==3.35', 'console_scripts', 'synda')())
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/main.py", line 196, in run
    status = sdtiaction.actionsargs.subcommand
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdtiaction.py", line 421, in install
    status, newly_installed_files_count = sdinstall.run(args)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdinstall.py", line 46, in run
    metadata = syndautils.file_full_search(args)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/syndautils.py", line 185, in file_full_search
    metadata = sdsearch.run(stream=stream, dry_run=args.dry_run)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdsearch.py", line 82, in run
    metadata=_get_files(squeries,parallel,post_pipeline_mode,action,playback,record)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdsearch.py", line 132, in _get_files
    metadata=execute_queries(squeries,parallel,post_pipeline_mode,action)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdsearch.py", line 97, in execute_queries
    metadata=sdpipeline.post_pipeline(metadata,post_pipeline_mode)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdpipeline.py", line 99, in post_pipeline
    metadata=sdpipelineprocessing.run_pipeline(metadata,po)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdpipelineprocessing.py", line 63, in run_pipeline
    chunk = f(chunk, *args, **kwargs)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdpipeline.py", line 44, in main_pipeline
    files=sdfilepipeline.run(files=files)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdfilepipeline.py", line 49, in run
    files=sdtimefilter.run(files)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdtimefilter.py", line 34, in run
    if timeslice_in_allowed_time_range(file_timeslice,allowed_time_ranges):
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdtimefilter.py", line 43, in timeslice_in_allowed_time_range
    (start,stop)=split_timeslice(file_timeslice)
  File "/home/synda/miniconda3/envs/synda35/lib/python3.8/site-packages/synda/sdt/sdtimefilter.py", line 55, in split_timeslice
```
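The failure can be reproduced outside synda with a small standalone script. This is only a sketch: it copies the logic of `split_timeslice` quoted earlier and assumes the value reaching it is exactly the string `gn3RaXbM42915` (the value synda actually extracts may differ). The `try_split` helper is hypothetical and exists only for the demonstration. The exception type shown is what plain Python raises for this kind of unpacking; it is not taken from the truncated traceback.

```python
def split_timeslice(timeslice):
    # Same logic as the function quoted from sdtimefilter.py.
    (start, end) = timeslice.split('-')
    return (start[0:6], end[0:6])


def try_split(timeslice):
    """Hypothetical helper: return the split result, or the exception raised."""
    try:
        return split_timeslice(timeslice)
    except ValueError as exc:
        return exc


# A well-formed timeslice is truncated to YYYYMM on both sides.
good = try_split("20150101-21001231")

# Assumed bad input: there is no '-' separator, so split() yields a single
# element and the two-name tuple unpacking raises ValueError.
bad = try_split("gn3RaXbM42915")

print(good)
print(type(bad).__name__, bad)
```

Any string without exactly one `-` fails the same way, because the unpacking expects exactly two parts.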

This was reproduced on synda 3.35 and synda 3.4. The proposed fix is to wrap the `split_timeslice` call in a try/except and return False when it fails, effectively skipping the offending file.
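A minimal sketch of what that fix could look like follows. It is not the patch that was merged into synda, only an illustration of the idea. It assumes the failure surfaces as a `ValueError` from the tuple unpacking, and it guards only the file's own timeslice, so a malformed range in the selection file would still raise.

```python
def split_timeslice(timeslice):
    # Unchanged logic from the quoted sdtimefilter.py snippet.
    (start, end) = timeslice.split('-')
    return (start[0:6], end[0:6])


def timeslice_in_allowed_time_range(file_timeslice, allowed_time_ranges):
    try:
        (start, stop) = split_timeslice(file_timeslice)
    except ValueError:
        # Malformed file timeslice (assumed failure mode): report the file
        # as out of range so the caller skips it instead of crashing.
        return False

    for allowed_time_range in allowed_time_ranges:
        (allowed_start, allowed_stop) = split_timeslice(allowed_time_range)
        if allowed_start <= start and allowed_stop >= stop:
            return True

    return False


# Example: the badly named file is skipped, a valid one is still accepted.
print(timeslice_in_allowed_time_range("gn3RaXbM42915", ["200001-201012"]))
print(timeslice_in_allowed_time_range("20050101-20051231", ["200001-201012"]))
```

Splitting the file timeslice once, before the loop, also avoids repeating the same work for every allowed range. The sketch does not check that the parts are numeric, so a name such as `abc-def` would still pass through and be compared as plain strings; stricter validation would be a separate change.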