check_input_data created 0-size file then aborted with a traceback

billsacks commented 4 years ago

With cime5.8.17: I added some files to the svn inputdata repository from cheyenne, then ran a test suite on izumi. The first test to reach the check_input_data phase failed, leaving behind a 0-size file. TestStatus.log shows the following:

Checking that inputdata is available as part of case submission
Setting resource.RLIMIT_STACK to -1 from (-1, -1)
Loading input file list: 'Buildconf/cpl.input_data_list'
Loading input file list: 'Buildconf/clm.input_data_list'
Loading input file list: 'Buildconf/cism.input_data_list'
Loading input file list: 'Buildconf/rtm.input_data_list'
Loading input file list: 'Buildconf/datm.input_data_list'
Using protocol wget with user anonymous and passwd user@example.edu
Trying to download file: '../inputdata_checksum.dat' to path '/scratch/cluster/sacks/tests_0404-113405iz/ERP_Ld5.f19_g17.I2000Clm50SpRtmFl.izumi_gnu.clm-default.GC.0404-113405iz_gnu/run/inputdata_checksum.dat.raw' using WGET protocol.
SUCCESS

Using protocol ftp with user anonymous and passwd user@example.edu
server address ftp.cgd.ucar.edu root path cesm/inputdata
Trying to download file: '../inputdata_checksum.dat' to path '/scratch/cluster/sacks/tests_0404-113405iz/ERP_Ld5.f19_g17.I2000Clm50SpRtmFl.izumi_gnu.clm-default.GC.0404-113405iz_gnu/run/inputdata_checksum.dat.raw' using FTP protocol.
Using protocol svn with user  and passwd
Checking server ftp://gridanon.cgd.ucar.edu:2811/cesm/inputdata/ with protocol gftp
Setting resource.RLIMIT_STACK to -1 from (-1, -1)
Checking server ftp://ftp.cgd.ucar.edu/cesm/inputdata with protocol wget
Setting resource.RLIMIT_STACK to -1 from (-1, -1)
Using protocol wget with user anonymous and passwd user@example.edu
Loading input file list: 'Buildconf/cpl.input_data_list'
Loading input file list: 'Buildconf/clm.input_data_list'
Trying to download file: 'lnd/clm2/paramdata/clm5_params.c200402.nc' to path '/fs/cgd/csm/inputdata/lnd/clm2/paramdata/clm5_params.c200402.nc' using WGET protocol.
env_batch.xml appears to have changed, regenerating batch scripts
manual edits to these file will be lost!

  Model clm missing file paramfile = '/fs/cgd/csm/inputdata/lnd/clm2/paramdata/clm5_params.c200402.nc'
Client protocol gftp not enabled
Client protocol None not enabled
Client protocol gftp not enabled
  Model clm missing file paramfile = '/fs/cgd/csm/inputdata/lnd/clm2/paramdata/clm5_params.c200402.nc'
Traceback (most recent call last):
  File "./case.submit", line 126, in <module>
    _main_func(__doc__)
  File "./case.submit", line 123, in _main_func
    mail_user=mail_user, mail_type=mail_type, batch_args=batch_args, workflow=workflow)
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 205, in submit
    custom_success_msg_functor=verbatim_success_msg)
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/utils.py", line 1739, in run_and_log_case_status
    rv = func()
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 203, in <lambda>
    batch_args=batch_args, workflow=workflow)
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 133, in _submit
    case.check_case()
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 219, in check_case
    self.check_all_input_data()
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 170, in check_all_input_data
    success = _downloadfromserver(self, input_data_root, data_list_dir)
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 192, in _downloadfromserver
    user=user, passwd=passwd, ic_filepath=ic_filepath)
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 366, in check_input_data
    isdirectory=isdirectory, ic_filepath=ic_filepath)
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 146, in _download_if_in_repo
    success = server.getfile(rel_path, full_path)
  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/Servers/wget.py", line 53, in getfile
    logging.warning("wget failed with output: {} and errput {}\n".format(output.encode('utf-8'), errput.encode('utf-8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 119: ordinal not in range(128)

That was on Saturday. When I removed the 0-size file this morning and reran ./check_input_data --download by hand, the file downloaded successfully.

I'm not sure if this is a bug in the scripts or if it was due to a temporary server glitch - or both - but the traceback suggests to me that maybe the error handling code is not quite right? I'm up and running now so this isn't critical for me, but I'm reporting this in case it indicates a problem that should be fixed.
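For illustration only, here is a minimal Python sketch of one way to avoid leaving a 0-size file behind when a download fails or the logging code raises. It assumes the empty file appears because the destination is created before the transfer completes, which this thread does not confirm. The names `fetch_to_path` and `fetch` are hypothetical and are not part of CIME.

```python
import os
import tempfile


def fetch_to_path(fetch, full_path):
    """Download via a temporary file so a failure leaves nothing at full_path.

    `fetch` is a hypothetical callable: it writes to the path it is given
    and returns True on success. It stands in for the real download step.
    """
    dirname = os.path.dirname(full_path) or "."
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".part")
    os.close(fd)
    try:
        ok = fetch(tmp_path)
        # Only publish the file if the download succeeded and is non-empty.
        if ok and os.path.getsize(tmp_path) > 0:
            os.replace(tmp_path, full_path)
            return True
        return False
    finally:
        # Runs on failure and on exceptions; after a successful rename the
        # temporary path no longer exists, so this is a no-op.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With this shape, an exception raised anywhere in the download step still propagates to the caller, but the target path is either absent or complete.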

jedwards4b commented 4 years ago

@jgfouca I'm always confused by these encoding issues - what's the proper fix for this?

  File "/home/sacks/ctsm_code/ctsm/cime/scripts/Tools/../../scripts/lib/CIME/Servers/wget.py", line 53, in getfile
    logging.warning("wget failed with output: {} and errput {}\n".format(output.encode('utf-8'), errput.encode('utf-8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 119: ordinal not in range(128)

jgfouca commented 4 years ago

Fix coming. We still seem to have a lot of encodes/decodes floating around CIME that are not needed at all.

jgfouca commented 4 years ago

OK, I pushed a fix directly to master (it was a very minor change). Many months ago I made changes that ensured users would never have to encode anything coming back from run_cmd(...), regardless of their Python version. I thought I had grepped through the code and cleaned up all the places that were doing these unnecessary encodings, but either I missed the ones in wget.py or they got reintroduced somehow.
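As a rough illustration of why an encode call can raise a decode error: in Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly with the ASCII codec, and that step fails on any byte above 0x7f. The sketch below reproduces that step explicitly in Python 3. The sample wget text and the helper names `implicit_py2_encode` and `format_failure` are assumptions made for the example, and the final part shows only the assumed shape of the fix (dropping the extra encode), not the actual commit.

```python
# Sample bytes containing UTF-8 curly quotes (U+2018 is 0xe2 0x80 0x98).
# The message text is illustrative, not taken from the failing run.
raw = "Saving to: \u2018clm5_params.c200402.nc\u2019".encode("utf-8")


def implicit_py2_encode(data: bytes) -> bytes:
    """Mimic Python 2's str.encode('utf-8') on a byte string.

    Python 2 decodes the bytes as ASCII first, then encodes the result.
    """
    return data.decode("ascii").encode("utf-8")


try:
    implicit_py2_encode(raw)
    error_message = None
except UnicodeDecodeError as exc:
    # Same error class as in the traceback: the ASCII decode step fails.
    error_message = str(exc)


def format_failure(output: str, errput: str) -> str:
    """Assumed shape of the fix: format already-decoded text, no encode."""
    return "wget failed with output: {} and errput {}\n".format(output, errput)


# Here the text is decoded explicitly so the sketch is self-contained.
message = format_failure("", raw.decode("utf-8"))
```

If the command runner already hands back decoded text, as described above, the message can be formatted directly and no encode or decode is needed at the logging call.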

ESMCI / cime

check_input_data created 0-size file then aborted with a traceback #3478