PredictiveEcology / reproducible

A set of tools for R that enhance reproducibility beyond package management
https://reproducible.predictiveecology.org/

Unzipping large raster corrupts file #335

Open CeresBarros opened 1 year ago

CeresBarros commented 1 year ago

I've been hitting an issue with loading a very large raster and have recently traced it down to an issue with unzipping.

Require("PredictiveEcology/reproducible@f5b0cf1059534b4dcaa40f1cc238fae992112e8b (HEAD)")
mainDir <- tempdir()
options("reproducible.cacheSaveFormat" = "qs",
        "reproducible.useNewDigestAlgorithm" = 2,
        "reproducible.useCache" = TRUE,
        "reproducible.destinationPath" = normPath(file.path(mainDir, "inputs")),
        "reproducible.inputPaths" = normPath(file.path(mainDir, "data")),
        "reproducible.useGDAL" = FALSE,
        "reproducible.useMemoise" = TRUE,
        "reproducible.useTerra" = TRUE,
        "reproducible.rasterRead" = "terra::rast")

rawBiomassMap <- Cache(prepInputs,
                       url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",
                       targetFile = "CA_forest_total_biomass_2015.tif",
                       archive = "CA_forest_total_biomass_2015_NN.zip",
                       datatype = "INT2U",
                       filename2 = .suffix("rawBiomassMap.tif", "test"),
                       overwrite = TRUE,
                       userTags = c("rawBiomassMap"))

Here's the output of a similar call, in which I was also passing the to = studyArea and method = "bilinear" arguments (I'm pretty sure leaving them out of the call above makes no difference):

Running preProcess
Preparing: CA_forest_total_biomass_2015.tif
Checking local files...
Finished checking local files.
Checking local files...
Finished checking local files.
alsoExtract is unspecified; assuming that all files must be extracted
Extracting all files from archive
Appending checksums to CHECKSUMS.txt. If you see this messagePrepInputs repeatedly,
  you can specify targetFile (and optionally alsoExtract) so it knows
  what to look for.
  |======================================================================================================================================| 100%
...downloading...  Downloading https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip ...
Checking local files...
Files found in CHECKSUMS.txt that match by basename; using these.
  User should specify all files (e.g., targetFile, alsoExtract, archive)
  with subfolders specified.
Finished checking local files.
alsoExtract is unspecified; assuming that all files must be extracted
Extracting all files from archive
  Skipping extractFromArchive: all needed files now present
Appending checksums to CHECKSUMS.txt. If you see this messagePrepInputs repeatedly,
  you can specify targetFile (and optionally alsoExtract) so it knows
  what to look for.
...using copy in getOption('reproducible.inputPaths')...
Copy of file: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015_NN.zip
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.dat.tif.aux.xml
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.dat.tif.xml
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tfw
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif.ovr, was created at: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015_NN.zip
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.dat.tif.aux.xml
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.dat.tif.xml
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tfw
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif.ovr
targetFile located at F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
Loading object into R
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: In prepInputs(url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",     targetFile = "CA_forest_total_biomass_2015.tif", archive = "CA_forest_total_biomass_2015_NN.zip",     to = studyArea, datatype = "INT2U", method = "bilinear",     filename2 = .suffix("rawBiomassMap.tif", paste0("_", SAname)),     overwrite = TRUE, purge = 7): In do.call(theFun, args2): F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: In prepInputs(url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",     targetFile = "CA_forest_total_biomass_2015.tif", archive = "CA_forest_total_biomass_2015_NN.zip",     to = studyArea, datatype = "INT2U", method = "bilinear",     filename2 = .suffix("rawBiomassMap.tif", paste0("_", SAname)),     overwrite = TRUE, purge = 7): In do.call(theFun, args2): F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1) 
> rast("F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif")
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:

When I try a direct rast call on the unzipped .tif I get the same error:

## same error:
rast("F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif")
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1)                        
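
The offset in the second warning (121770457026 bytes, roughly 121.8 GB) is too large for a 32-bit offset, so the file is presumably a BigTIFF whose image directory sits near the end of the file. One way to check whether the extracted file is simply shorter than that is to read the header directly. The sketch below uses base R only; tiff_header_info() is a hypothetical helper (not part of reproducible or terra), and it assumes the failing offset is the first directory offset stored in the header.

```r
# Read a TIFF header and compare the first image directory (IFD) offset
# with the actual size of the file on disk.
tiff_header_info <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con), add = TRUE)
  hdr <- readBin(con, what = "raw", n = 16L)
  if (length(hdr) < 8L) stop("file is too short to hold a TIFF header")

  # Bytes 1-2: "II" = little endian, "MM" = big endian
  little <- identical(hdr[1:2], charToRaw("II"))
  if (!little && !identical(hdr[1:2], charToRaw("MM"))) {
    stop("no TIFF byte-order mark found")
  }

  # Unsigned bytes -> double (exact for values below 2^53)
  to_num <- function(bytes) {
    if (!little) bytes <- rev(bytes)
    sum(as.numeric(bytes) * 256^(seq_along(bytes) - 1))
  }

  version <- to_num(hdr[3:4])  # 42 = classic TIFF, 43 = BigTIFF
  if (version == 43) {
    if (length(hdr) < 16L) stop("BigTIFF header is incomplete")
    ifd_offset <- to_num(hdr[9:16])  # 8-byte offset
  } else if (version == 42) {
    ifd_offset <- to_num(hdr[5:8])   # 4-byte offset
  } else {
    stop("unexpected TIFF version number: ", version)
  }

  size <- file.size(path)
  list(bigtiff = version == 43, ifd_offset = ifd_offset,
       file_size = size, ifd_beyond_eof = ifd_offset >= size)
}

# Usage (path taken from the log above):
# tiff_header_info("F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif")
```

If ifd_beyond_eof is TRUE, the header points past the end of the file, which would be consistent with a truncated extraction or copy rather than with a problem in rast itself.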

CeresBarros commented 1 year ago

So I manually unzipped the file (in the reproducible.inputPaths directory) and then tried to rerun the call. Right after unzipping manually, I loaded the raster directly with rast, and that worked:

> rast("F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif")
class       : SpatRaster 
dimensions  : 156966, 193936, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -2660911, 3157169, -851351.9, 3857628  (xmin, xmax, ymin, ymax)
coord. ref. : Lambert_Conformal_Conic_2SP 
source      : CA_forest_total_biomass_2015.tif 
name        : CA_forest_total_biomass_2015 
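
Because the manually extracted file opens, the archive itself is presumably intact, so the uncompressed sizes it records can serve as a reference for whatever ends up on disk. The sketch below compares the two. check_extracted_sizes() is a hypothetical helper; it assumes the listing from utils::unzip(list = TRUE) reports the true size for an entry this large, which is worth cross-checking against the size of the manually extracted file. The help page for utils::unzip notes that its internal method may truncate files of 4 GB or more; whether prepInputs goes through that code path here is not established in this thread.

```r
# Compare uncompressed sizes recorded in a zip listing with sizes on disk.
# `listing`: data.frame with columns Name and Length, in the shape returned
#            by utils::unzip(zipfile, list = TRUE).
# `dir`:     directory the archive was extracted into.
check_extracted_sizes <- function(listing, dir) {
  expected <- as.numeric(listing$Length)
  on_disk  <- file.size(file.path(dir, listing$Name))  # NA if file is missing
  data.frame(
    file     = listing$Name,
    expected = expected,
    on_disk  = on_disk,
    ok       = !is.na(on_disk) & on_disk == expected,
    stringsAsFactors = FALSE
  )
}

# Usage (paths taken from the logs above):
# zipfile <- "F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015_NN.zip"
# listing <- utils::unzip(zipfile, list = TRUE)
# check_extracted_sizes(listing, "F:/Data/CrossProjectRawData")
# check_extracted_sizes(listing, "F:/NEcosystemModelling/R/SpaDES/inputs")
```

Rows with ok = FALSE would point to files that are missing or whose size differs from what the archive records.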

I then reran the same Cache(prepInputs, ...) call and got the same error, but this time no unzipping was involved. So maybe the problem happens when files are copied or linked between reproducible.inputPaths and reproducible.destinationPath?

Running preProcess
Preparing: CA_forest_total_biomass_2015.tif
Checking local files...
Finished checking local files.
alsoExtract is unspecified; assuming that all files must be extracted
Extracting all files from archive
  Skipping download. All requested files already present
alsoExtract is unspecified; assuming that all files must be extracted
Extracting all files from archive
  Skipping extractFromArchive attempt: no files missing
... copying to getOption('reproducible.inputPaths')...
Copy of file: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif, was created at: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif
targetFile located at F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
Loading object into R
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: In prepInputs(url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",     targetFile = "CA_forest_total_biomass_2015.tif", archive = "CA_forest_total_biomass_2015_NN.zip",     to = studyArea, datatype = "INT2U", method = "bilinear",     filename2 = .suffix("rawBiomassMap.tif", paste0("_", SAname)),     overwrite = TRUE): In do.call(theFun, args2): F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: In prepInputs(url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",     targetFile = "CA_forest_total_biomass_2015.tif", archive = "CA_forest_total_biomass_2015_NN.zip",     to = studyArea, datatype = "INT2U", method = "bilinear",     filename2 = .suffix("rawBiomassMap.tif", paste0("_", SAname)),     overwrite = TRUE): In do.call(theFun, args2): F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1) 
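
To look at the copy/link hypothesis directly, the two copies can be compared on disk. This is a minimal sketch in base R; compare_copies() is a hypothetical helper, and the checksum step is optional because hashing a file of this size can take a long time.

```r
# Compare two copies of a file by size and, optionally, by MD5 checksum.
compare_copies <- function(a, b, checksum = FALSE) {
  out <- list(size_a = file.size(a), size_b = file.size(b))
  # isTRUE() turns an NA comparison (missing file) into FALSE
  out$same_size <- isTRUE(out$size_a == out$size_b)
  if (checksum && out$same_size) {
    out$same_md5 <- unname(tools::md5sum(a)) == unname(tools::md5sum(b))
  }
  out
}

# Usage (paths taken from the log above):
# compare_copies(
#   "F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif",
#   "F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif"
# )
```

Equal sizes and checksums would suggest both paths hold the same bytes, in which case the damage would have to come from something that modifies the file rather than from the copy step alone.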

After the above, I tried again to load the raster in reproducible.inputPaths using rast, and this time it failed. The same happens with the copy in reproducible.destinationPath. This makes me think that the two copies somehow get screwed up by prepInputs/preProcess?

## reproducible.inputPaths copy
rast("F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif")
Error: [rast] cannot open this file as a SpatRaster: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1) 

## reproducible.destinationPath copy
rast("F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif")
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1) 
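
One way to test whether prepInputs/preProcess changes the files is to record their size and modification time before the call and compare afterwards. The sketch below uses base R only; snapshot_files() and changed_files() are hypothetical helpers, not functions from reproducible.

```r
# Record size and modification time for a set of files.
snapshot_files <- function(paths) {
  info <- file.info(paths, extra_cols = FALSE)
  data.frame(path = paths, size = info$size, mtime = info$mtime,
             stringsAsFactors = FALSE)
}

# Return the paths whose size or modification time differs between snapshots,
# including files that appeared or disappeared.
changed_files <- function(before, after) {
  stopifnot(identical(before$path, after$path))
  differs <- function(x, y) {
    is.na(x) != is.na(y) | (!is.na(x) & !is.na(y) & x != y)
  }
  changed <- differs(before$size, after$size) |
    differs(as.numeric(before$mtime), as.numeric(after$mtime))
  after$path[changed]
}

# Usage (paths taken from the report above):
# paths <- c(
#   "F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif",
#   "F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif"
# )
# before <- snapshot_files(paths)   # take this right after a manual unzip
# ... rerun the Cache(prepInputs, ...) call here ...
# changed_files(before, snapshot_files(paths))
```

A path showing up in the result right after a run that started from a readable file would support the idea that the call itself rewrites the file.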

CeresBarros commented 1 year ago

Any news on this front, @eliotmcintire?