Closed rykelly closed 9 years ago
My thought: not only do the files take up space, but writing them consumes both time and resources (bandwidth). Where possible, modelers and model2netcdf.* authors can consider configuring model runs to write only what will be needed, and even adding an option to write directly to netCDF.
Are there any plans for Brown Dog to play a role in the model2netcdf conversions?
I agree that everything would be more efficient if models just wrote out their output in the PEcAn standard, but there's no way to enforce that and realistically most teams won't do it (case in point, despite using ED2 and SIPNET for years, we've never rewritten their outputs to be in netCDF).
What I told Ryan when we discussed this is that the simplest approach is to build deletion of the original model output into the job.sh script. That said, I think that behavior should be up to the user who implements each model package, not a required behavior. For a model like SIPNET, where the output is simple and similar in content to the netCDF, there's really no information loss in deleting the original. That's not true for ED2, where a ton of site-, patch-, and cohort-level information is lost if you delete the hdf5 files and retain only the PEcAn netCDF.
Right now there are no plans to use Brown Dog in model2netcdf since it's not a required function anymore. That said, there's nothing keeping individual modeling teams from using Brown Dog if they want to, though it would just exacerbate the bandwidth problem.
Ankur R Desai, Associate Professor, University of Wisconsin - Madison, Atmospheric and Oceanic Sciences, http://flux.aos.wisc.edu, desai@aos.wisc.edu, O: +1-608-520-0305 / M: +1-608-218-4208
(Sent by email in reply to David LeBauer's comment of Jul 1, 2015, 4:41 PM.)
I am going to implement this for SIPNET, for now by modifying model2netcdf.SIPNET(). Will default to keeping output, but have an option for deleting sipnet.out after conversion to netcdf is done.
To do this, I'm going to add an argument remove.raw.outputs to run.write.configs(), which will get copied into the job.sh as an argument to model2netcdf.SIPNET(). So in turn, I'm going to have to add the same argument to all models' write.config.*() functions, just so they don't throw an error when receiving it. At the moment it won't do anything for those other models though.
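For discussion, here is a minimal sketch of the keep-by-default behaviour described above. Everything in it is hypothetical: `convert_then_cleanup()`, `fake_converter()`, and the `converted.txt` file name are made up for illustration, and the stand-in converter only copies a text file where a real `model2netcdf.*` function would write netCDF. The one point it tries to show is that the raw file is deleted only after the converted output is confirmed to exist.

```r
# Hypothetical helper, not part of PEcAn: run a converter, then optionally
# delete the raw model output once the converted file is confirmed on disk.
convert_then_cleanup <- function(outdir, converter, raw.file = "sipnet.out",
                                 remove.raw.outputs = FALSE) {
  raw.path <- file.path(outdir, raw.file)
  if (!file.exists(raw.path)) {
    stop("raw output not found: ", raw.path)
  }
  # The converter is assumed to return the path(s) of the file(s) it wrote.
  converted <- converter(raw.path, outdir)
  ok <- length(converted) > 0 && all(file.exists(converted))
  if (!ok) {
    stop("conversion did not produce the expected file(s); keeping ", raw.path)
  }
  # Default is to keep the raw output; delete only when explicitly asked.
  if (isTRUE(remove.raw.outputs)) {
    file.remove(raw.path)
  }
  invisible(converted)
}

# Stand-in converter for illustration only: copies the raw text to a new file.
fake_converter <- function(raw.path, outdir) {
  out <- file.path(outdir, "converted.txt")
  file.copy(raw.path, out, overwrite = TRUE)
  out
}

demo.dir <- tempfile("run")
dir.create(demo.dir)
writeLines("year day gpp", file.path(demo.dir, "sipnet.out"))
convert_then_cleanup(demo.dir, fake_converter, remove.raw.outputs = TRUE)
```

With `remove.raw.outputs = FALSE` (the default here) the same call leaves `sipnet.out` in place.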
I just want to check that this sounds OK to everyone before moving forward. This seems convoluted to me, but I don't see a simpler solution.
Can you add it to the model section of pecan.xml, so we can pass that as an argument to write.configs? Maybe call it
Will this have an associated tag in settings, similar to database$bety$write?
Or maybe there should be something more general, to flag runs for archiving vs testing?
Yeah, was thinking to make this a setting in the .xml. Thanks, @robkooper, for the reminder to modify the template.
@dlebauer I also like the idea of a more general flag for testing, if there are multiple settings that make sense for testing vs. production.
I agree with passing this through settings, not a new argument.
Not sure that this choice is the same as testing vs archive -- we're still archiving the model runs here, just not the raw output. I think an argument could be made for making this the default for sipnet, especially if we could go through the sipnet.out to make sure no output variables are dropped in conversion.
On second thought, @robkooper is there any reason to assign a default of FALSE, rather than just
if(!is.null(settings$model$delete.raw) && settings$model$delete.raw) {
...
}
?
Just so it is clear to somebody reading the pecan.xml file, and they don't have to dig through the code to find out what the default is.
OK, makes sense. So just have read.settings() assign the default value of FALSE?
yup, just put it somewhere where the model is being parsed and checked.
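A rough sketch of what assigning that default could look like, assuming settings is a plain nested list and that a value read from the XML may arrive as a string such as "TRUE". The function name `set_delete_raw_default()` is hypothetical, and this is not the actual read.settings() code.

```r
# Hypothetical helper, not the real read.settings(): make sure
# settings$model$delete.raw is always a single TRUE/FALSE value.
set_delete_raw_default <- function(settings) {
  value <- settings$model[["delete.raw"]]
  if (is.null(value)) {
    # Tag absent from pecan.xml: keep raw output by default.
    settings$model[["delete.raw"]] <- FALSE
  } else {
    # Values parsed from XML are assumed to be strings such as "TRUE".
    parsed <- as.logical(value)
    if (length(parsed) != 1 || is.na(parsed)) {
      stop("model$delete.raw must be TRUE or FALSE, got: ",
           paste(value, collapse = ", "))
    }
    settings$model[["delete.raw"]] <- parsed
  }
  settings
}

s1 <- set_delete_raw_default(list(model = list(type = "SIPNET")))
s2 <- set_delete_raw_default(list(model = list(type = "SIPNET",
                                               delete.raw = "TRUE")))
```

Once the default is filled in at parse time, the downstream check no longer needs the `is.null()` guard and can reduce to `if (settings$model$delete.raw) { ... }`.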
Per an offline discussion with @mdietze and @ankurdesai, it would be good to have a space-saving option to delete the raw model outputs after they've been converted to netCDF, for example when running thousands of iterations of data assimilation. It seems each model2netcdf.MODEL function would need to take care of this in its own way, either by deleting the output files itself when finished, or by returning a list of those files so that a generic external function could do it.
I can work on this but thought I'd solicit feedback first.
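To make the second option concrete, here is a minimal sketch of a generic cleanup function, assuming each model2netcdf.MODEL function returns the raw files it consumed as a character vector of paths. The name `remove_raw_outputs()`, its arguments, and the `.h5` file names in the demo are hypothetical.

```r
# Hypothetical generic cleanup, not part of PEcAn: delete the raw files a
# converter reports, but only when deletion has been requested.
remove_raw_outputs <- function(raw.files, delete.raw = FALSE) {
  if (!isTRUE(delete.raw) || length(raw.files) == 0) {
    return(invisible(character(0)))
  }
  # Skip anything already gone so file.remove() does not warn.
  present <- raw.files[file.exists(raw.files)]
  removed <- present[file.remove(present)]
  # Return the paths that were actually deleted.
  invisible(removed)
}

run.dir <- tempfile("run")
dir.create(run.dir)
raw <- file.path(run.dir, c("a.h5", "b.h5"))
file.create(raw)

kept <- remove_raw_outputs(raw, delete.raw = FALSE)  # nothing deleted
gone <- remove_raw_outputs(raw, delete.raw = TRUE)   # both files deleted
```

The trade-off is that the deletion logic lives in one place, but every converter has to report its raw files accurately.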