ecmwf / hda

API to harmonised data access for DIAS/WEkEO
Apache License 2.0

`Checksum value could not be computed due to I/O read error` for ~80% of downloaded images #29

Open arkanoid87 opened 3 weeks ago

arkanoid87 commented 3 weeks ago

What happened?

Most of the images I'm downloading have an invalid checksum according to `gdalinfo -checksum`.

One of the many errors:

ERROR 1: TIFFFillTile:Read error at row 4608, col 4608, tile 318; got 33374 bytes, expected 129895
ERROR 1: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_EOSV.tif, band 1: IReadBlock failed at X offset 10, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 3: Checksum value could not be computed due to I/O read error.
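A minimal sketch of how this check could be scripted from Python, assuming the `gdalinfo` executable is on `PATH`. The name `is_checksum_valid` matches the helper used in the retry loop later in this thread, but its body here is an assumption, not the reporter's code, and `checksum_report_ok` is a hypothetical helper. Because it is not assumed that `gdalinfo` exits with a non-zero status on these read errors, the sketch also scans the printed messages for lines starting with `ERROR` and requires at least one `Checksum=` line.

```python
import subprocess
from pathlib import Path


def checksum_report_ok(returncode: int, output: str) -> bool:
    """Heuristically decide whether `gdalinfo -checksum` output looks clean.

    Assumption: a non-zero exit code, any line starting with "ERROR", or the
    absence of a "Checksum=" line is treated as a corrupt or unreadable file.
    """
    if returncode != 0:
        return False
    lines = output.splitlines()
    if any(line.startswith("ERROR") for line in lines):
        return False
    return any(line.strip().startswith("Checksum=") for line in lines)


def is_checksum_valid(path: Path) -> bool:
    """Run `gdalinfo -checksum` on `path`; needs the GDAL CLI on PATH.

    Raises FileNotFoundError if the gdalinfo executable cannot be found.
    """
    proc = subprocess.run(
        ["gdalinfo", "-checksum", str(path)],
        capture_output=True,
        text=True,
    )
    # GDAL prints its ERROR messages on stderr, so inspect both streams.
    return checksum_report_ok(proc.returncode, proc.stdout + "\n" + proc.stderr)
```

Keeping the parsing in a separate function makes the heuristic easy to check against the error messages quoted above without needing GDAL installed.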

What are the steps to reproduce the bug?

Download the TIFFs returned by this query:

query = {
  "dataset_id": "EO:EEA:DAT:CLMS_HRVPP_VPP",
  "resolution": "10",
  "start": "2018-01-01T01:00:00.000Z",
  "end": "2018-01-01T01:00:00.000Z",
  "bbox": (8.111400390689214, 38.848683828777084, 9.85180313535546, 41.31972080422642),
  "itemsPerPage": 200,
  "startIndex": 0
}

After downloading just 30 images:

$ for tif in *.tif; gdalinfo -checksum "$tif" > "$tif"_info.txt 2>&1; end
$ cat *.txt | grep "IReadBlock failed"
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_AMPL.tif, band 1: IReadBlock failed at X offset 17, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_EOSD.tif, band 1: IReadBlock failed at X offset 18, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_EOSV.tif, band 1: IReadBlock failed at X offset 17, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_LENGTH.tif, band 1: IReadBlock failed at X offset 19, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_LSLOPE.tif, band 1: IReadBlock failed at X offset 19, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_MAXD.tif, band 1: IReadBlock failed at X offset 17, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_MAXV.tif, band 1: IReadBlock failed at X offset 17, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_MINV.tif, band 1: IReadBlock failed at X offset 16, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_RSLOPE.tif, band 1: IReadBlock failed at X offset 12, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_SOSD.tif, band 1: IReadBlock failed at X offset 18, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_SOSV.tif, band 1: IReadBlock failed at X offset 17, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_SPROD.tif, band 1: IReadBlock failed at X offset 14, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_TPROD.tif, band 1: IReadBlock failed at X offset 14, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_AMPL.tif, band 1: IReadBlock failed at X offset 10, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_EOSV.tif, band 1: IReadBlock failed at X offset 10, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_MAXV.tif, band 1: IReadBlock failed at X offset 11, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_MINV.tif, band 1: IReadBlock failed at X offset 12, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_RSLOPE.tif, band 1: IReadBlock failed at X offset 15, Y offset 13: TIFFReadEncodedTile() failed.
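The same tally can be computed from Python. This is a minimal sketch that assumes the per-image reports are named `<image>.tif_info.txt`, as written by the loop above, and treats any report containing `IReadBlock failed` (the pattern the `grep` uses) as a failure; `failed_reports` and `failure_rate` are hypothetical names.

```python
from pathlib import Path

REPORT_SUFFIX = "_info.txt"  # matches "$tif"_info.txt in the loop above


def failed_reports(report_dir: Path) -> list[str]:
    """Return the TIFF names whose gdalinfo report mentions a block read failure."""
    failed = []
    for report in sorted(report_dir.glob("*.tif" + REPORT_SUFFIX)):
        if "IReadBlock failed" in report.read_text(errors="replace"):
            # Strip the suffix to recover the original TIFF file name.
            failed.append(report.name[: -len(REPORT_SUFFIX)])
    return failed


def failure_rate(report_dir: Path) -> float:
    """Fraction of reports that show a failure (0.0 if there are no reports)."""
    total = len(list(report_dir.glob("*.tif" + REPORT_SUFFIX)))
    return len(failed_reports(report_dir)) / total if total else 0.0
```

For example, 67 failing reports out of 168 give a rate of about 0.40.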

Version

hda==2.17

Platform (OS and architecture)

Ubuntu 22.04

Relevant log output

No response

Accompanying data

No response

Organisation

No response

arkanoid87 commented 3 weeks ago

Images with errors after downloading all of `matches[:50]`:

$ cat *.txt | grep "IReadBlock failed"
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_AMPL.tif, band 1: IReadBlock failed at X offset 17, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_EOSD.tif, band 1: IReadBlock failed at X offset 18, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_EOSV.tif, band 1: IReadBlock failed at X offset 17, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_LENGTH.tif, band 1: IReadBlock failed at X offset 19, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_LSLOPE.tif, band 1: IReadBlock failed at X offset 19, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_MAXD.tif, band 1: IReadBlock failed at X offset 17, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_MAXV.tif, band 1: IReadBlock failed at X offset 17, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_MINV.tif, band 1: IReadBlock failed at X offset 16, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_RSLOPE.tif, band 1: IReadBlock failed at X offset 12, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_SOSD.tif, band 1: IReadBlock failed at X offset 18, Y offset 15: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_SOSV.tif, band 1: IReadBlock failed at X offset 17, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_SPROD.tif, band 1: IReadBlock failed at X offset 14, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s1_TPROD.tif, band 1: IReadBlock failed at X offset 14, Y offset 16: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_AMPL.tif, band 1: IReadBlock failed at X offset 10, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_EOSV.tif, band 1: IReadBlock failed at X offset 10, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_MAXV.tif, band 1: IReadBlock failed at X offset 11, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_MINV.tif, band 1: IReadBlock failed at X offset 12, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_RSLOPE.tif, band 1: IReadBlock failed at X offset 15, Y offset 13: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SMJ-010m_V101_s2_SOSV.tif, band 1: IReadBlock failed at X offset 13, Y offset 14: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SNJ-010m_V101_s1_AMPL.tif, band 1: IReadBlock failed at X offset 6, Y offset 11: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SNJ-010m_V101_s1_EOSV.tif, band 1: IReadBlock failed at X offset 8, Y offset 11: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SNJ-010m_V101_s1_LENGTH.tif, band 1: IReadBlock failed at X offset 7, Y offset 10: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SNJ-010m_V101_s1_MAXV.tif, band 1: IReadBlock failed at X offset 5, Y offset 11: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SNJ-010m_V101_s1_MINV.tif, band 1: IReadBlock failed at X offset 1, Y offset 11: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SNJ-010m_V101_s1_RSLOPE.tif, band 1: IReadBlock failed at X offset 8, Y offset 10: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SNJ-010m_V101_s1_SOSV.tif, band 1: IReadBlock failed at X offset 6, Y offset 11: TIFFReadEncodedTile() failed.
ERROR 1: VPP_2018_S2_T32SNJ-010m_V101_s1_TPROD.tif, band 1: IReadBlock failed at X offset 6, Y offset 11: TIFFReadEncodedTile() failed.

I'm using GDAL 3.9.1.

arkanoid87 commented 3 weeks ago

After downloading all 168 images in `matches`, I get errors on 67 TIFFs, so I have to correct the percentage in my title down to ~40%.

arkanoid87 commented 3 weeks ago

If I re-download the same file, the new copy turns out not to have the same error.

So I have to wrap the `download()` call in a "check the checksum, delete the file if it is invalid, re-download it based on its filename" procedure:

# results_gdf, CWD_PATH and is_checksum_valid() are defined elsewhere in my script
MAX_ATTEMPTS = 3

for n, (_, row) in enumerate(results_gdf.iterrows(), start=1):
  count_str = f"{n}/{len(results_gdf)}"
  search_result = row["search_result"]
  file_dir = CWD_PATH.parent / row["dataset"]
  file_path = file_dir / row["location"].split("/")[-1]
  file_dir.mkdir(parents=True, exist_ok=True)
  # every copy, including the first download, is validated before moving on
  for attempt in range(1, MAX_ATTEMPTS + 1):
    if file_path.exists():
      if is_checksum_valid(file_path):
        print(f"{count_str} {file_path.name}: checksum is valid")
        break
      print(f"{count_str} {file_path.name}: checksum is invalid, deleting it")
      file_path.unlink()
    print(f"{count_str} downloading {file_path.name} (attempt {attempt}/{MAX_ATTEMPTS})")
    search_result.download(download_dir=file_dir)
  else:
    # out of attempts: the last download has not been checked yet
    status = "valid" if is_checksum_valid(file_path) else "STILL INVALID"
    print(f"{count_str} {file_path.name}: checksum is {status} after {MAX_ATTEMPTS} downloads")
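The retry procedure above can also be factored into a small reusable helper. This is a sketch, not part of `hda`: `download_verified` is a hypothetical name, and `download` and `is_valid` are caller-supplied callables (for example, a lambda wrapping `search_result.download(download_dir=file_dir)` and an `is_checksum_valid` function). It validates every copy it downloads, including the first and the last one, and raises instead of silently keeping a corrupt file.

```python
from pathlib import Path
from typing import Callable


def download_verified(
    path: Path,
    download: Callable[[], None],
    is_valid: Callable[[Path], bool],
    max_attempts: int = 3,
) -> int:
    """Ensure `path` exists and passes `is_valid`, (re)downloading as needed.

    Returns the number of downloads performed (0 if a valid file was already
    present). Raises RuntimeError if the file is still invalid after
    `max_attempts` downloads.
    """
    if path.exists() and is_valid(path):
        return 0
    for attempt in range(1, max_attempts + 1):
        path.unlink(missing_ok=True)  # drop a corrupt or partial copy, if any
        download()
        if path.exists() and is_valid(path):
            return attempt
    raise RuntimeError(f"{path.name} still invalid after {max_attempts} downloads")
```

Inside the loop, a call could look like `download_verified(file_path, lambda: search_result.download(download_dir=file_dir), is_checksum_valid)`. The lambda is invoked within the same iteration, so Python's late binding of loop variables does not cause problems here.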