Closed aasdelat closed 2 weeks ago
It seems as if your dimensions and the values of your array are flipped. Could you copy paste the output of printing the YAXArray in the REPL here?
Could you provide an example file that I could look at?
Ok. The file is too large, so I send it by we transfer: https://we.tl/t-AC7DBQp7vE
Here is the output of my datset, "ds":
julia> ds
YAXArray Dataset
Shared Axes:
↓ values Sampled{Int64} 1:338880 ForwardOrdered Regular Points
Variables:
latitude,
msl
↓ Ti Sampled{DateTime} [1989-01-01T00:00:00, …, 1989-01-31T23:00:00] ForwardOrdered Irregular Points
longitude,
Properties: Dict{String, Any}("edition" => "1", "source" => "/home/antonio/Documentos/datos/ERA5/ERA5_N320_1h_sl_sfc_NH15S_1978_2023_msl/ERA5_N320_1h_sl_sfc_NH15S_1989-01-01_1989-01-31_msl.grib", "centreDescription" => "European Centre for Medium-Range Weather Forecasts", "centre" => "ecmf", "subCentre" => "0", "Conventions" => "CF-1.7")
As a side note, I have noted that it does not print the name of the last variable ("longitude") in a new line, but at the end of the previous line (I have introduced a carriage return in the text after pasting it here). Perhaps it is an issue for DimensionalData.
I had a look at the data and it seems that you have converted the dataset from GRIB to netcdf? How did you do that and might the problem be related to this conversion?
Have you tried opening the original GRIB data with Rasters and the GRIBDatasets package? If that works you could then convert the Raster to a YAXArray to continue working with YAXArrays.
Yes, they are converted from Grib. I can open the gribs with GRIBDatasets with no problem. In fact, I converted the grib files to netcdf by opeing the gribs in julia with NCDatasets and saving them with NCDatasets. The structure of the data is correct, perhaps it is not the most common one, but it is correct, and YAXArrays reads and interprets it correctly. The only thing, is that I need a means for selecting a variable (msl) by the values of another variable (latitude and/or longitude) with which it shares one dimension (values). I tested Raster, but it does not correctly interpret the data structure.
I have to correct myself: Raters correctly interprets the structure. I tested it a long time ago and it seem not to do it, but now it does!. So, I will give it a try, and convert the raster to YAXArrays, if necessary.
I have the answer to my question. In order to select the variable msl depending on the values of latitude (which is not a dimension, but another variable that shares the dimension "values" with the variable msl), I have to do (supposing I want to select data at longitudes below 40º):
ds.msl[values=Where(v->ds.longitude[v] < 40)]
But it seems to last forever ... but it finally ends with the error:
julia> ds.msl[values=Where(v->ds.longitude[v] < 40)]
ERROR: ArgumentError: Unable to determine chunksize of non-range views.
Stacktrace:
[1] eachchunk_view(::DiskArrays.Chunked{…}, vv::SubArray{…})
@ DiskArrays ~/.julia/packages/DiskArrays/MpOpv/src/subarray.jl:29
[2] eachchunk
@ ~/.julia/packages/DiskArrays/MpOpv/src/subarray.jl:25 [inlined]
[3] YAXArray
@ ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:136 [inlined]
[4] rebuild(A::YAXArray{…}, data::DiskArrays.SubDiskArray{…}, dims::Tuple{…}, refdims::Tuple{}, name::DimensionalData.NoName, metadata::Dict{…})
@ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:200
[5] rebuild
@ ~/.julia/packages/DimensionalData/BoJag/src/array/array.jl:85 [inlined]
[6] rebuildsliced
@ ~/.julia/packages/DimensionalData/BoJag/src/array/array.jl:100 [inlined]
[7] rebuildsliced
@ ~/.julia/packages/DimensionalData/BoJag/src/array/array.jl:99 [inlined]
[8] view
@ ~/.julia/packages/DimensionalData/BoJag/src/array/indexing.jl:125 [inlined]
[9] _dim_view
@ ~/.julia/packages/DimensionalData/BoJag/src/array/indexing.jl:110 [inlined]
[10] #view#110
@ ~/.julia/packages/DimensionalData/BoJag/src/array/indexing.jl:81 [inlined]
[11] getindex(::YAXArray{Float64, 2, YAXArrayBase.NetCDFVariable{…}, Tuple{…}, Dict{…}}; kwargs::@Kwargs{values::Where{…}})
@ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:487
[12] top-level scope
@ REPL[18]:1
Some type information was truncated. Use `show(err)` to see complete types.
However, if I read the data with Rasters, it returns the result instantly and correctly.
I guess that you opened the data in Rasters with lazy=false which is the default.
And with YAXArrays we hit a very inefficient access pattern which we should fix in YAXArrays. Thanks for the report.
You could use YAXArrays.readcubedata(arr)
to first load the whole data into memory to have a faster access pattern.
Wow! It runs now instantaneously. But, due to my particular data structure, I have to read the variables msl and longitude in different variables:
ds_m = readcubedata(ds.msl)
ds_l = readcubedata(ds.longitude)
# And the selection:
ds_m[values=Where(v->ds_l[v] < 40)]
because, if not:
julia> readcubedata(ds)
ERROR: MethodError: no method matching ndims(::Dataset)
Closest candidates are:
ndims(::Type{Union{}}, Any...)
@ Base abstractarray.jl:276
ndims(::Type{<:Ref})
@ Base refpointer.jl:96
ndims(::Type{<:Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle{N}, Nothing}}) where N<:Integer
@ Base broadcast.jl:278
...
Stacktrace:
[1] dimnames(x::Dataset)
@ YAXArrayBase ~/.julia/packages/YAXArrayBase/R6Frw/src/axisarrays/axisinterface.jl:46
[2] caxes(x::Dataset)
@ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:209
[3] readcubedata(x::Dataset)
@ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:230
[4] top-level scope
@ REPL[26]:1
readcubedata
was designed to be used on single arrays you have a dataset so it does not fit well but I think we should be able to make it work on Datasets as well.
But, is there a way to load a dataset into memory, other then readcubedata
, or only readcubedata
can load into memory?
I will summarize the answer here: The selection of one variable based on the values of another variable with which it shares a dimension, can be done via:
# Read data structure
ds = open_dataset(input_file_name)
# The structure of my data is:
julia> ds
YAXArray Dataset
Shared Axes:
↓ values Sampled{Int64} 1:338880 ForwardOrdered Regular Points
Variables:
latitude,
msl
↓ Ti Sampled{DateTime} [1989-01-01T00:00:00, …, 1989-01-31T23:00:00] ForwardOrdered Irregular Points
longitude,
Properties: Dict{String, Any}("edition" => "1", "source" => "/home/antonio/Documentos/datos/ERA5/ERA5_N320_1h_sl_sfc_NH15S_1978_2023_msl/ERA5_N320_1h_sl_sfc_NH15S_1989-01-01_1989-01-31_msl.grib", "centreDescription" => "European Centre for Medium-Range Weather Forecasts", "centre" => "ecmf", "subCentre" => "0", "Conventions" => "CF-1.7")
# Read data into memory
ds_m = readcubedata(ds.msl)
ds_l = readcubedata(ds.longitude)
# And the selection:
ds_m[values=Where(v->ds_l[v] < 40)]
Where msl
is the main variable in which values I am interested, and longitude
is the variable with which I have to make the selection.
It is necessary to read both variables into memory by means of readcubedata
. If they are loaded just by open_dataset
, you will get an error and this will not work.
On the other hand, Cube
cannot be used in place of open_dataset
in this case, because of the structure of my data (see the difference between a dataset and a cube).
Thanks for the write up. Do you want to open a pull request to add this info to the docs? This would fit well into the how to section.
Ok, but I am learning github right now. So I am going to take a while...
That is no problem, feel free to ask questions if you feel stuck in the process of opening a PR.
Ok, I am doing it.
Pull request done! I have proposed a large rearrange of the help, so I think it has to be studied.
Hi, @felixcremer , I have opened a pull request. Could you tell me if I have to do anything else or If I have made the pull request correctly?
what PR was it? should we close this?
The pull request is https://github.com/JuliaDataCubes/YAXArrays.jl/pull/414 and it has been already accepted and closed. So, yes, this issue can be closed too.
Hello: I have to select data within a geographical window given as longitudes east and west and latitudes north and south. But the spatial coordinate of my data is "values", and I have the variables "msl" (mean sea level pressure), "longitude" and "latitude". I think that the name of the spatial dimension, "values", is a bit confusing, because you can think that they are values of a meteorological variable, but they are not. "values" is a spatial dimension taking values from 1 to the total number of grid points. The dimensions of the msl variable are: "values" and "Ti". The longitude and latitude variables, both have only one dimension: "values". So, each datum in msl can be assigned a unique longitude and latitude via the common dimension: "values". So, how can accomplish this?