MagicForrest / DGVMTools

R package for processing, analysing and visualising ouput from Dynamic Global Vegetation Models (DGVMs)
GNU General Public License v3.0
27 stars 22 forks source link

Landuse file is not opening. #58

Closed halima1993 closed 4 years ago

halima1993 commented 4 years ago

Hey Matthew, I currently have a landuse file, and I try to open like this in R:

"

GUESS.run <- defineSource(id = "landuse",
                          dir = "E:/Landusehurrt", # this would normlly just be a character string containing a path
                          format = GUESS,
                          name = "landuse")

In response R gives this

Warning message:
In x@format@determinePFTs(x, ...) :
  Hmmm, not been able to identify the PFTs in LPJ-GUESS(-SPITFIRE) run landuse because I can't find an appropriate per-PFT file in the run directory. Returning the super-set list.
> r<-getField(source = GUESS.run, 
+          var = "landuse")
Error in lookupQuantity(var, source@format) : 
  Can't find a quantity with id = landuse anywhere in this context, or in Standard.quantities, or the supported.biomes.schemes, so failing.

Then I change the files name in my folder from landuse to cpool and run the same code but renamed it with cpool (note that the data is of landuse) do this:

getField(source = GUESS.run, 
         var = "cpool")

The R studio shows that it is at least running, but this is the error

Error in `[.data.table`(dt, , Lon) : 
  j (the 2nd argument inside [...]) is a single symbol but column name 'Lon' is not found. Perhaps you intended DT[, ..Lon]. This difference to data.frame is deliberate and explained in FAQ 1.1.

How can I open this file through DGVMTools, my other output files are opened with no issues. I am able to import the output file through R itself, in a dataframe format. But I would like to use the DGVMTools functions and open the file by the package. Hope I am clear enough. Let me know your response.

MagicForrest commented 4 years ago

This should be possible. But first, can you post the header of the Hurtt land use file? It needs to have exactly "Lon" and "Lat" defined. That is probably why it didn't open the file when you renamed to "cpool.out".

For the first error. Please try updating to the latest dev branch. It is more flexible about opening files which don't explicitly match predefined quantities.

halima1993 commented 4 years ago

Here is the header

lon lat year URBAN CROPLAND PASTURE FOREST NATURAL

For the update you mean the package "devtools"?

halima1993 commented 4 years ago

Yup I spotted the lon lat, it does not have Lon and Lat, any solution to that?

MagicForrest commented 4 years ago

You can use devtools to upgrade, but I mean install the latest development branch of DGVMTools with:

devtools::install_github("MagicForrest/DGVMTools", ref = "dev", dependencies = TRUE, build_opts = c("--no-resave-data", "--no-manual"), force=T)

For lot/lat, just manually edit the files to have "Lon" and "Lat". Although I think I should improve DGVMTools to make it more flexible in the this regard.

halima1993 commented 4 years ago

Thanks for the quick response. It might sound like a very "beginner question", but how can I manually edit it? The files are in output format. Do you mean after upgrading to latest dev branch, I am able to change the lat lon to Lat Lon?

MagicForrest commented 4 years ago

The .out files are just text files, you can open them in something like Wordpad or Notepad and just change them. This is independent of DGVMTools.

halima1993 commented 4 years ago

I cannot open the .out files in wordpad or notepad as they are 1 plus GB in size and covers the data globally.

So I imported the the output file by 'importing data' in R, and changed the column names. As .out is not a standard file suffix, i save it through

hurrt<-landuse85
names(hurrt)[names(hurrt) == 'lon'] <- 'Lon'
names(hurrt)[names(hurrt) == 'lat'] <- 'Lat'
write.csv(hurrt, file=gzfile("hurrt.csv.gz"))

However it still returns an error.

MagicForrest commented 4 years ago

Exactly what error? Also, you can save the file as "hurtt.out.gz". Then you need to supply 'var = "hurrt"' as the argument to getField().

halima1993 commented 4 years ago

My code:

I save the out file through write.csv(hurrt, file=gzfile("hurrt.out.gz"))

> landuseh<- defineSource(id = "landusehurrt",
+                           dir = "E:/hurrt", # this would normlly just be a character string containing a path
+                           format = GUESS,
+                           name = "landusehurrt")
Warning message:
In x@format@determinePFTs(x, ...) :
  Hmmm, not been able to identify the PFTs in LPJ-GUESS(-SPITFIRE) run landusehurrt because I can't find an appropriate per-PFT file in the run directory. Returning the super-set list.
> getField(source =landuseh,var = "hurrt")
Error in lookupQuantity(var, source@format) : 
  Can't find a quantity with id = hurrt anywhere in this context, or in Standard.quantities, or the supported.biomes.schemes, so failing.
MagicForrest commented 4 years ago

Did you install the dev branch of DGVMTools? That should solve your problem.

halima1993 commented 4 years ago

Yes I downloaded it. It did seem to run, however gave me warning messages

> landu<-getField(source =landuseh,var = "hurrt")
|--------------------------------------------------|
|==================================================|
Warning messages:
1: In min(all.years) : no non-missing arguments to min; returning Inf
2: In max(all.years) : no non-missing arguments to max; returning -Inf

When I print(landu) This is a snippet:

print(landu)

A Field object
Quantity:       hurrt (hurrt): Units=undefined unit, Defined for format: GUESS, CF name=hurrt
Id:
"landusehurrt.hurrt.Spatial_Full.Inf--Inf.Year.none"
Layers: 
V1 year URBAN CROPLAND PASTURE FOREST NATURAL
Dimensions: 
Lon Lat
Spatial Info:
Spatial aggregation = none
Spatial extent id = Full
Spatial extent: 
class      : Extent 
xmin       : -180.25 
xmax       : 179.75 
ymin       : -55.75 
ymax       : 83.25 
Year Info:
Yearly aggregation =  none
First year = Inf
Last year = -Inf
Subannual Info: 
Subannual original = Year
Subannual aggregation = none
Subannual resolution = Year
Data: 
V1    Lon Lat year URBAN CROPLAND PASTURE FOREST NATURAL

When I try to select years, this error pops up

yearssel1<-selectYears(x = landu, first = 1850, last = 2100)
Error in FUN(X[[i]], ...) : 
only defined on a data frame with all numeric variables
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In selectYears(x = landu, first = 1850, last = 2100) :
  Year 1850 requested, but it is not in the data!
2: In selectYears(x = landu, first = 1850, last = 2100) :
  Year 1851 requested, but it is not in the data!
3: In selectYears(x = landu, first = 1850, last = 2100) :
  Year 1852 requested, but it is not in the data!
4: In selectYears(x = landu, first = 1850, last = 2100) :
  Year 1853 requested, but it is not in the data!
5: In selectYears(x = landu, first = 1850, last = 2100) :
  Year 1854 requested, but it is not in the data!
6: In selectYears(x = landu, first = 1850, last = 2100) :
  Year 1855 requested, but it is not in the data!
7: In selectYears(x = landu, first = 1850, last = 2100) :
  Year 1856 requested, but it is not in the data!
MagicForrest commented 4 years ago

Ah, you also need to change "year" to "Year", then you won't get all those errors/warnings. Right now DGVMTools is interpreting your "year" column as a data Layer and not the Year. Everything should be fine once you change that.

But I will make the package deal with this issue flexibly in the future. As for the other issues you raised, I will look into them tomorrow or next week....

halima1993 commented 4 years ago

I have changed year to Year, when I apply a spatial extent this errors arises:

> landu<-getField(source =landuseh,verbose=T,var = "hurrtlu", spatial.extent = mountains, spatial.extent.id = "SEA")
Seeking ModelField with id = landusehurrt.hurrtlu.Spatial_SEA.all_years
'read.full' argument set to TRUE, so reading full data file to create the field now.
Found and opening file E:/hurrt/hurrtlu.out
|--------------------------------------------------|
|==================================================|
Read table. It has header:
[1] "V1"       "Lon"      "Lat"      "Year"     "URBAN"    "CROPLAND" "PASTURE" 
[8] "FOREST"   "NATURAL" 
It has shape:
[1] 14856941        9
Correcting lons and lats with offset.
Offsets applied. Head of full .out file (after offsets):
   V1 Lon Lat Year URBAN CROPLAND PASTURE FOREST NATURAL
1:  1  87  46 1850     0        0   0     0   0
2:  2  87  46 1851     0        0   0      0   0
3:  3  87  46 1852     0        0   0      0   0
4:  4  87  46 1853     0        0   0    0   0
5:  5  87  46 1854     0        0   0    0   0
6:  6  87  46 1855     0        0   0     0   0
No year selection being applied
No year selection being applied
Error in getField(source = landuseh, verbose = T, var = "hurrtlu", spatial.extent = mountains,  : 
  getField() has produced an empty data.table, so subsequent code will undoubtedly fail.  Please check your input data and the years and spatial.extent that you have requested.
In addition: Warning messages:
1: In min(all.years) : no non-missing arguments to min; returning Inf
2: In max(all.years) : no non-missing arguments to max; returning -Inf

Any solution for that? (Note that i kept the all readings of crop, pasture, forest etc to zero by myself just to serve as an example for the code above)

halima1993 commented 4 years ago

I have just few questions I want to clarify Matthew. The land use file has 'undefined units', the previous files which I loaded like (cpool,cflux) have units defined in them like KgC m-2 yr-1. How can I be sure that which might be the units of this file in DGVMTools.

Secondly , if I want to calculate the region totals, should I do aggregateYears(x, method="mean") or aggregateYears(x, method="sum")?. In order to compute a single value of the regions total, I go for the PromotetoRaster function . Then should I go by the cellStats(raster,"mean")?

Thank you for you help!

MagicForrest commented 4 years ago

Hi Halima,

I don't know why selecting the mountains isn't working. We got this working before, right? Can you read the whole file, and then select the mountains gridcells with selectGridcells() afterwards.

For your next questions:

DGVMTools knows the units of some standard LPJ-GUESS output variables (like cflux and cpool) - these are defined in Quantity objects in the pcakage. However, "land use" is not a standard LPJ-GUESS output variables. I recently added functionality to the package to read files that don't correspond to standard LPJ-GUESS output variables, but it just uses defaults like "undefined units" because it doesn't have a Quantity object to define them. Buuuut... I also wrote a function to define new Quantity object and add them to a Source or a Format, so you can define units etc sensibly! Look at the the defineQuantity() function in the package.

As for regional totals, can you define what you want a bit more precisely? Do want the km^2 of each land cover type? Or something else? But you can use aggregateSpatial() to aggregate in space. You can also use addArea() to add a Layer of gridcell areas to the Field, and then you can multiply the land use fractions by this area (using layerOp()). Then you can use aggregateSpatail(..., method = "sum") so get the total areas covered by each land cover class.

halima1993 commented 4 years ago

Matthew, I think the errors ae coming due to the upgrade by : devtools::install_github("MagicForrest/DGVMTools", ref = "dev", dependencies = TRUE, build_opts = c("--no-resave-data", "--no-manual"), force=T)

The old codes that ran perfectly well before are also showing errors now:

Seeking ModelField with id = mpi_esm_lr_26_c3_1_1.mgpp.Spatial_SEA.2000-2010
'read.full' argument set to TRUE, so reading full data file to create the field now.
Found and opening file E:/mpi_esm_lr_26_c3_1_1/mgpp.out
|--------------------------------------------------|
|==================================================|
Read table. It has header:
 [1] "Lon"  "Lat"  "Year" "Jan"  "Feb"  "Mar"  "Apr"  "May"  "Jun"  "Jul"  "Aug"  "Sep"  "Oct" 
[14] "Nov"  "Dec" 
It has shape:
[1] 14856941       15
Correcting lons and lats with offset.
Offsets applied. Head of full .out file (after offsets):
    Lon Lat Year Jan Feb Mar Apr May Jun   Jul   Aug   Sep Oct Nov Dec
1: -180  66 2000   0   0   0   0   0   0 0.000 0.000 0.000   0   0   0
2: -180  66 2001   0   0   0   0   0   0 0.000 0.000 0.000   0   0   0
3: -180  66 2002   0   0   0   0   0   0 0.000 0.000 0.000   0   0   0
4: -180  66 2003   0   0   0   0   0   0 0.000 0.000 0.000   0   0   0
5: -180  66 2004   0   0   0   0   0   0 0.000 0.000 0.000   0   0   0
6: -180  66 2005   0   0   0   0   0   0 0.000 0.000 0.000   0   0   0
Selecting years from 2000 to 2010
Selecting years from 2000 to 2010
Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables
In addition: There were 26 warnings (use warnings() to see them)
The warnings are 
> warnings()
Warning messages:
1: In min(all.years) : no non-missing arguments to min; returning Inf
2: In max(all.years) : no non-missing arguments to max; returning -Inf
3: In selectYears(dt, first = first.year, last = last.year) :
  Year 2000 requested, but it is not in the data!
4: In selectYears(dt, first = first.year, last = last.year) :
  Year 2001 requested, but it is not in the data!
5: In selectYears(dt, first = first.year, last = last.year) :
  Year 2002 requested, but it is not in the data!
6: In selectYears(dt, first = first.year, last = last.year) :
  Year 2003 requested, but it is not in the data!
7: In selectYears(dt, first = first.year, last = last.year) :
  Year 2004 requested, but it is not in the data!

This code ran well before and never produced an error.

MagicForrest commented 4 years ago

Oh dear! Sounds like I need to fix that! Can you please send me both the mgpp.out file and the mountains shapefile and I'l try to solve it...

halima1993 commented 4 years ago

I apologise, I am restricted to share the data, if you can kindly solve it without the .out file? Is there any way?

MagicForrest commented 4 years ago

Yeah, probably. Can you share the mountain file?

MagicForrest commented 4 years ago

Any chance you can just send the file? I need to register for that site and ideally I would rather not...

halima1993 commented 4 years ago

Greetings Matthew! Hope you are well. Wanted to know if the error problem is resolved. Kindly let me know. Thank you!

MagicForrest commented 4 years ago

I am working on it, some time today I think.

MagicForrest commented 4 years ago

Hi Halima. With the latest dev version I can't reproduce the error. Can you send me the full command/script that gave you the error?

halima1993 commented 4 years ago

Kindly check your email.

MagicForrest commented 4 years ago

Hi Halima. Add "cover.threshold = 0.01" to your call to getField(). Does that solve it?

I will think about how to handle this in the package in a nice way.

halima1993 commented 4 years ago

Hey Matthew, by adding the cover.threshold worked. Thanks a bunch!

In your previous comments you said: As for regional totals, can you define what you want a bit more precisely?

I want to find the the total VegC for my shapefile (not total for landcover classes, just for shapefile), shall I go with aggregate<-aggregateYears(x, method="sum") (in order to be more easily related to carbon budgets)

MagicForrest commented 4 years ago

Well, if you are aggregating over space the use aggregateSpatial() not aggregateYears()!

And yes, for the total VegC you want to add the gridcell area to the Field with adArea(), multiply the VegC later with the area using the layerOp() function and then sum over the gridcells using aggregateSpatial(..., method = "sum)

I am going to close this issue now because it is solved and has gone way off topic. Feel free to open a new issue on another topic!