Alternatively, we could modify independence.run_tests to return this dataset?
Btw, null_correlation_bounds has the wrong dimensions in that example. It's an easy fix: I just need to specify dim='k' in the quantiles at the end of independence._get_null_correlation_bounds.
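The dim='k' fix matters because xarray's DataArray.quantile reduces over every dimension when dim is left as None. Here is a minimal sketch with a made-up null distribution. The k sample dimension follows the comment above; the lat/lon dimensions, array sizes and random values are illustrative assumptions:

```python
import numpy as np
import xarray as xr

# Made-up null distribution: 1000 samples ("k") of a correlation
# at each point of a 2 x 3 grid.
rng = np.random.default_rng(0)
null_correlations = xr.DataArray(
    rng.uniform(-0.3, 0.3, size=(1000, 2, 3)),
    dims=["k", "lat", "lon"],
)

# Without dim, quantile reduces over *all* dimensions,
# so the spatial structure is lost.
bounds_all = null_correlations.quantile([0.025, 0.975])

# With dim="k", only the sample dimension is reduced and the bounds
# keep their lat/lon dimensions.
bounds_k = null_correlations.quantile([0.025, 0.975], dim="k")

print(bounds_all.dims)
print(bounds_k.dims)
```

With several quantiles requested, xarray puts the new quantile dimension first, so the bounds come out as (quantile, lat, lon).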
That looks like a nice solution!
I think we do want to modify independence.run_tests to return the dataset.
So in _main() in independence.py I think we want:
```python
ds_independence = run_tests(
    da_fcst,
    init_dim=args.init_dim,
    lead_dim=args.lead_dim,
    ensemble_dim=args.ensemble_dim,
)
```
and then, instead of calling create_plot, we want _main() to end the way similarity.py does, with whatever code is needed to write the dataset to file:
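```python
# Record the input file's history in the output metadata
infile_logs = {args.fcst_file: ds_fcst.attrs["history"]}
ds_independence.attrs["history"] = fileio.get_new_log(infile_logs=infile_logs)

# Optionally rechunk before writing
if args.output_chunks:
    ds_independence = ds_independence.chunk(args.output_chunks)

# Write to Zarr or netCDF depending on the output file name
if "zarr" in args.outfile:
    fileio.to_zarr(ds_independence, args.outfile)
else:
    ds_independence.to_netcdf(args.outfile)
```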
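One small thing to consider in the write step: "zarr" in args.outfile is a substring test on the whole path, so a netCDF file saved under a directory such as zarr_runs/ would be sent to the Zarr writer. A sketch of a suffix-based check, using a hypothetical helper output_format that keeps the original fallback to netCDF:

```python
from pathlib import Path


def output_format(outfile: str) -> str:
    """Return "zarr" or "netcdf" for an output path (hypothetical helper).

    Only the final file extension is inspected, so a directory name that
    happens to contain "zarr" does not change the format. Anything that is
    not .zarr falls back to netCDF, as in the original if/else.
    """
    if Path(outfile).suffix.lower() == ".zarr":
        return "zarr"
    return "netcdf"


print(output_format("independence.zarr"))
print(output_format("/g/data/zarr_runs/independence.nc"))
```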
That means people can either use the independence.py command line program to get a netCDF (or Zarr) file with the independence test numbers in it, or they can open a notebook and call independence.run_tests() to get an xarray Dataset with the test numbers in it.
How does that sound?
Yup, that sounds good. What do you think would be the best default command line behaviour in terms of plotting vs saving the dataset (or both)?
- a --save_dataset or --data_output argument (with args.outfile referring to the plot file name), or
- args.outfile works for either the plot or the data file, and the extension is changed as needed (e.g., '.png' to '.nc' and vice versa)?
Hmm. Not 100% sure what the best approach would be.
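For the second option, deriving the data file name from the plot file name (or the reverse) only needs a suffix swap. A sketch using a hypothetical helper, companion_file, built on pathlib:

```python
from pathlib import Path


def companion_file(outfile: str, suffix: str) -> str:
    """Return outfile with its last extension replaced by suffix.

    Hypothetical helper: e.g. a plot path ending in .png becomes the
    matching .nc data path, and vice versa.
    """
    return str(Path(outfile).with_suffix(suffix))


print(companion_file("independence.png", ".nc"))
print(companion_file("independence.nc", ".png"))
```

PurePath.with_suffix replaces only the last extension, so "independence.v2.png" becomes "independence.v2.nc".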
Maybe it's easiest if it just produces a data file in all cases, and if someone wants to plot the data they can read in that data file, import independence and then use independence.create_plot()?
Actually, we probably need an independence.point_plot() function (i.e. the existing create_plot() function) and a new independence.spatial_plot() function that plots a spatial map showing the first lead time that passes the independence test.
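The field that spatial_plot would map could be computed along these lines. This sketch rests on assumptions that the thread doesn't fix. The mean correlations are a NumPy array shaped (lead, lat, lon). The null bounds are (lat, lon) arrays that are constant across lead. A lead "passes" when its mean correlation lies inside the bounds. The name first_independent_lead is hypothetical, and the plotting itself is left out:

```python
import numpy as np


def first_independent_lead(mean_corr, lower, upper, leads):
    """Return, per grid point, the first lead whose mean correlation is in bounds.

    Hypothetical layout: mean_corr is (lead, lat, lon); lower and upper are
    (lat, lon) null correlation bounds. Grid points where no lead passes
    get NaN.
    """
    passes = (mean_corr >= lower) & (mean_corr <= upper)  # (lead, lat, lon)
    first_idx = passes.argmax(axis=0)  # index of first True (0 if none)
    any_pass = passes.any(axis=0)
    leads = np.asarray(leads, dtype=float)
    return np.where(any_pass, leads[first_idx], np.nan)


# Toy example: 3 leads on a 1 x 2 grid.
mean_corr = np.array([[[0.6, 0.5]], [[0.4, 0.5]], [[0.1, 0.5]]])
lower = np.full((1, 2), -0.2)
upper = np.full((1, 2), 0.2)
result = first_independent_lead(mean_corr, lower, upper, leads=[1, 2, 3])
print(result)
```

argmax returns the first passing lead even if a later lead fails again. The same passes array could also serve as the mask for dependent lead times at each grid point.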
The independence test outputs (mean_correlations and null_correlation_bounds) are usually saved as a scatter plot, but it would be useful to store these results too (e.g., to mask dependent lead times at each grid point). Here is an example of how we could convert the dictionaries to a single dataset:
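Here is a minimal sketch of that conversion. It assumes both dictionaries are keyed by initialisation month and hold xarray DataArrays. The results_to_dataset name, the month dimension name and the toy values are hypothetical, not the original example:

```python
import pandas as pd
import xarray as xr


def results_to_dataset(mean_correlations, null_correlation_bounds):
    """Combine per-month result dictionaries into one Dataset (sketch).

    Assumes both dicts share the same keys (initialisation months) and that
    each value is a DataArray. The new "month" dimension name is hypothetical.
    """
    months = sorted(mean_correlations)
    month_index = pd.Index(months, name="month")
    # Stack the per-month arrays along a new "month" dimension
    mean_corr = xr.concat([mean_correlations[m] for m in months], dim=month_index)
    bounds = xr.concat([null_correlation_bounds[m] for m in months], dim=month_index)
    return xr.Dataset(
        {"mean_correlations": mean_corr, "null_correlation_bounds": bounds}
    )


# Toy inputs: two initialisation months, two leads, two quantiles
mean_correlations = {
    11: xr.DataArray([0.4, 0.1], dims="lead", coords={"lead": [1, 2]}),
    5: xr.DataArray([0.3, 0.05], dims="lead", coords={"lead": [1, 2]}),
}
null_correlation_bounds = {
    11: xr.DataArray([-0.2, 0.2], dims="quantile", coords={"quantile": [0.025, 0.975]}),
    5: xr.DataArray([-0.25, 0.25], dims="quantile", coords={"quantile": [0.025, 0.975]}),
}
ds = results_to_dataset(mean_correlations, null_correlation_bounds)
print(ds)
```

The resulting Dataset can then go through the same history/chunk/write steps as in _main().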