NIEHS / beethoven

BEETHOVEN is: Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality
https://niehs.github.io/beethoven/

DEBUSSY package to extend existing data format to include (space and) time metadata #278

Closed eva0marques closed 8 months ago

eva0marques commented 8 months ago

Steps:

eva0marques commented 8 months ago

Should we create new R object classes? Instead of using functions, we could implement this library in an Object-Oriented manner:

kyle-messier commented 8 months ago

@eva0marques Yes, I'll clone to github once we get the Enterprise acct set up.

I don't know enough about creating new R object classes to know whether that would be a good approach to take. I'll try to read up on that. Let me know if you have any good resources, or if you want to articulate potential pros/cons. Thanks!

eva0marques commented 8 months ago

I feel the same, so I am currently reading online courses about it.

eva0marques commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology Now that I understand how object-oriented R programming works, I find it very inspiring. We can definitely create a new class that inherits from pre-existing objects, and create associated methods. It will solve the problem we talked about with @mitchellmanware: we can create a new class st_data_table with additional functionality for the space and time dimensions, but if the user wants to use it for basic data (or spatial-only or temporal-only data), it will just use the basic methods already available for data.table (or for the spatial or temporal data.table, respectively). Alternatively, if we want to make it stricter, we can create a different class for each data type; that will work too.

I am motivated to do that, and I think it is not very complex.

sigmafelix commented 8 months ago

I'm interested in transitioning to object-oriented programming practices in R.

I found some references for spatiotemporal data class transformation:

kyle-messier commented 8 months ago

@eva0marques Sounds like a good idea to me! And thanks @sigmafelix for your interest too. I have one question or concern. See this section in the targets package bookdown documentation. Its first sentence says that using targets (pipelines) requires a function-oriented style. I think in a sense we are already function-oriented, since we are making smaller packages. We should be able to use tar_target for the new object-oriented classes. What do y'all think?

mitchellmanware commented 8 months ago

I do not know much about this, so @eva0marques, can you share the online materials you read?

sigmafelix commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology I think the functions implemented in debussy are mostly primitive, so we will need to encapsulate many of them in a high-level function to use them in our targets pipeline.

eva0marques commented 8 months ago

Thank you all for your feedback. Here is one example of a tutorial on object-oriented programming: https://adv-r.hadley.nz/s3.html (I had some difficulty understanding the beginning, so I switched to another course, but it is in French: https://stt4230.rbind.io/programmation/oop_r/). @sigmafelix I will take a look at the packages you shared. @Spatiotemporal-Exposures-and-Toxicology I think we can still keep our function-oriented style while calling on object-oriented programming when we know that we will reuse a specific data format many times, just to define methods (like print, plot, etc.) adapted to these objects. It will simply make us more efficient.

kyle-messier commented 8 months ago

@eva0marques @sigmafelix @mitchellmanware I agree with all the points here about OOP and functional style - we can maintain a high-level functional style for targets while developing OOP objects for special ST data. @eva0marques The bookdown you referenced goes to S3 - I believe the recommendation is to use S4 if you do object-oriented programming in R - S3 is easier to use, but has fewer of the beneficial properties of OOP. 🚀 🚀 🚀

eva0marques commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology From what I've read, S3 is much more widely used, and S4 should only be used if necessary. Even though S3 is very permissive, there are good-practice conventions to follow to ensure proper use of the object (they are explained in the tutorial: we need to write a constructor, a validator and a helper each time we create a new class).
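
The constructor / validator / helper convention mentioned here could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name st_table, the column names lon, lat and time, and the choice of a base data.frame (rather than data.table) as the parent class are all hypothetical and were not settled in this thread.

```r
# Hypothetical S3 class "st_table" extending a base data.frame.
# Class name and column names (lon, lat, time) are illustrative assumptions.

# Constructor: low-level, only checks the base type and sets the class.
new_st_table <- function(x = data.frame()) {
  stopifnot(is.data.frame(x))
  class(x) <- c("st_table", class(x))
  x
}

# Validator: checks that the space and time columns exist with usable types.
validate_st_table <- function(x) {
  required <- c("lon", "lat", "time")
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  if (!is.numeric(x$lon) || !is.numeric(x$lat)) {
    stop("`lon` and `lat` must be numeric", call. = FALSE)
  }
  if (!inherits(x$time, c("Date", "POSIXct"))) {
    stop("`time` must be a Date or POSIXct", call. = FALSE)
  }
  x
}

# Helper: user-facing function that builds and validates the object.
st_table <- function(lon, lat, time, ...) {
  x <- data.frame(lon = lon, lat = lat, time = time, ...)
  validate_st_table(new_st_table(x))
}

# A method specific to the new class: print a short header,
# then delegate to the parent (data.frame) print method.
print.st_table <- function(x, ...) {
  cat("<st_table> ", nrow(x), " rows, time range: ",
      format(min(x$time)), " to ", format(max(x$time)), "\n", sep = "")
  NextMethod()
  invisible(x)
}

obs <- st_table(
  lon  = c(-78.9, -79.0),
  lat  = c(35.9, 36.0),
  time = as.Date(c("2022-01-01", "2022-01-02")),
  pm25 = c(8.1, 9.4)
)
print(obs)    # uses print.st_table
summary(obs)  # no st_table method defined, so the data.frame method is used
```

Because the object keeps "data.frame" in its class vector, any generic without an st_table method falls back to the existing data.frame method, which is the inheritance behaviour discussed earlier in the thread.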

kyle-messier commented 8 months ago

@eva0marques Ok, that's good to know. It sounds like we need to figure out whether the additional functionality and formal methods of S4 are worth the effort over the simpler S3. Some quick searching also makes it sound like S3 can be computationally faster than S4, which is obviously a good thing for our case.

eva0marques commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology Yes, and I've just realized that terra has a lot of S4 methods, so I will try to learn more about that.

eva0marques commented 8 months ago

For those interested: I think I'm gonna investigate extending data.table (S3 class inheritance). Additional ref: https://www.r-bloggers.com/2023/04/extending-data-frames/

eva0marques commented 8 months ago

spacetime library: similarities with our project. No conversions to terra space-time data classes.

The data class space-time data.table actually already exists: sftime.

eva0marques commented 8 months ago

Reconsideration of DEBUSSY objectives: I think DEBUSSY is actually not relevant, because everything has already been done in the sf, sftime, terra, stars and cubble packages. The spacetime package also has interesting ST data classes, but it relies on the deprecated sp library (does that mean we are not supposed to use it? I found no information on this).

Space time classes:

kyle-messier commented 8 months ago

@eva0marques Do the above packages and their respective classes have methods for conversion? I think that will dictate whether we need a package or just supplemental functions.

eva0marques commented 8 months ago

Yes, that was also my concern, but I think so. It could possibly be useful to have a list of SpatVector / SpatRaster objects properly indexed by time (and to convert it to stars objects). Possibly not from terra to stars (and vice versa), though.

kyle-messier commented 8 months ago

We have decided to use the packages already available in R to deal with data.table and ST data. I am creating a new task for changing the beethoven functionality that previously used sdst functions.