PoisotLab / SimpleSDMLayers.jl

Simple layers for species distribution modeling and bioclimatic data
https://docs.ecojulia.org/SimpleSDMLayers.jl/stable/
MIT License

DataFrames cannot be written to CSV #52

Closed tpoisot closed 3 years ago

tpoisot commented 3 years ago

CSV.write refuses to write nothing values to a file. I think it would be acceptable to remove all rows with nothing in values, right?

gabrieldansereau commented 3 years ago

Are you talking about the DataFrame() overload?

It's really just returning all the values from the layer grid in a DataFrame, so the nothing values are the same as in the grid, yes. You can remove them with filter if you want, then export with CSV.write.

using SimpleSDMLayers, DataFrames, CSV

temperature = worldclim(1)
temperature_df = DataFrame(temperature)
# Drop the rows whose value is nothing, then export
filter!(x -> !isnothing(x.values), temperature_df)
CSV.write("test1.csv", temperature_df)

gabrieldansereau commented 3 years ago

Do you mean we should instead modify the DataFrame() overload so it doesn't return the nothing values?

I like the behaviour as it is. To me it's more intuitive like this, with the overload returning a DataFrame with the values for all grid cells, which we can then filter or not. It's similar to the raster package in R.

tpoisot commented 3 years ago

I agree with the general idea. The only point of friction I can see is that missing values in DataFrames should be missing, not nothing. That being said, you have used the package more than I have, so if the behavior makes sense to you, let's keep it. The ascii read/write methods in #54 are also going to offer another way to export data.

gabrieldansereau commented 3 years ago

Reopening this.

After working with the DataFrames overload for a while, I agree it would be simpler to use missing, not nothing. missing has better support in the DataFrames functions, and I find converting from nothing to missing unintuitive and a bit of a pain (see below), especially when the goal is to remove missing values.

Since #101 & v0.7.0 already bring a breaking release, I'll change this at the same time so that DataFrame(layer) returns missing for values which are nothing in the layer.

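using SimpleSDMLayers
using DataFrames

layer = SimpleSDMPredictor(WorldClim, BioClim, 1)

df = DataFrame([layer, layer])
# Columns must accept missing before nothing can be replaced
allowmissing!(df)
for col in [:x1, :x2]
    replace!(df[!, col], nothing => missing)
end
# Returns a new DataFrame; use dropmissing! to modify df in place
dropmissing(df, [:x1, :x2])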
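
For reference, the conversion step on its own needs nothing beyond Base Julia. The sketch below uses a small hand-written vector (cells is a hypothetical stand-in for a layer's grid values, not data from the package) and the non-mutating replace, which returns a copy whose element type allows missing, so no separate allowmissing! call is needed:

```julia
# Hypothetical stand-in for a layer's grid values; `nothing` marks empty cells.
cells = Union{Nothing, Float64}[1.5, nothing, 3.0, nothing]

# Non-mutating replace: returns a new vector with every `nothing` swapped for `missing`.
converted = replace(cells, nothing => missing)

# Once the values are `missing`, the usual tools apply, e.g. skipping them.
present = collect(skipmissing(converted))
```

A vector converted this way could then be used as a DataFrame column, where dropmissing works on it directly.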