NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License

Configurably truncated representation when FieldList is very long #772

Open sadielbartholomew opened 3 months ago

sadielbartholomew commented 3 months ago

In str and repr calls on a FieldList, we print a one-line view of every field in the list. This is generally useful and sensible, but when the FieldList is long the output becomes spammy and makes it hard to scroll back to previous terminal or interactive Python calls and output. For example, working with WRF data I get >10,000 field representations spat out as a listing.

A nice feature would be to truncate this listing with an ellipsis when the list is over N items long, for some sensible N. (Another marker would work, but an ellipsis seems standard and sensible, e.g. it is what numpy uses to truncate arrays.) For example:

[<CF Field: ncvar%ACGRDFLX(ncdim%Time(1), ncdim%south_north(179), ncdim%west_east(139)) J m-2>,
 <CF Field: ncvar%ACGRDFLX(ncdim%Time(1), ncdim%south_north(179), ncdim%west_east(139)) J m-2>,
 <CF Field: ncvar%ACGRDFLX(ncdim%Time(1), ncdim%south_north(179), ncdim%west_east(139)) J m-2>,
...
 <CF Field: ncvar%ZS(ncdim%Time(1), ncdim%soil_layers_stag(4)) m>,
 <CF Field: ncvar%ZS(ncdim%Time(1), ncdim%soil_layers_stag(4)) m>,
 <CF Field: ncvar%ZS(ncdim%Time(1), ncdim%soil_layers_stag(4)) m>]

We could perhaps also add a note of the full length at the start of the representation, to indicate how many fields have been subsumed into the ellipsis, e.g. a first line such as this, or similar, where N is the length:

CF FieldList (N):
[ ... ]
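To make the proposal concrete, here is a minimal sketch of the truncation logic for a plain Python sequence. The function name truncated_listing and its threshold, edge_items and label parameters are hypothetical names chosen for illustration; none of this is existing cf-python API.

```python
def truncated_listing(items, threshold=10, edge_items=3, label="CF FieldList"):
    """Return a one-item-per-line listing, truncated with an ellipsis.

    Hypothetical helper for illustration only, not part of cf-python.
    `threshold` is the length above which the listing is summarised and
    `edge_items` is how many items are kept at each end.
    """
    if edge_items < 1:
        raise ValueError("edge_items must be at least 1")

    n = len(items)
    header = f"{label} ({n}):"

    # Show everything if the list is short, or if the head and tail
    # would overlap and so nothing would actually be hidden.
    if n <= threshold or n <= 2 * edge_items:
        body = ",\n ".join(repr(x) for x in items)
        return f"{header}\n[{body}]"

    # Only build the reprs that will be displayed.
    head = ",\n ".join(repr(x) for x in items[:edge_items])
    tail = ",\n ".join(repr(x) for x in items[-edge_items:])
    return f"{header}\n[{head},\n...\n {tail}]"


print(truncated_listing(list(range(100))))
```

Building only the reprs that are shown matters for very long lists, since the hidden items never need to be formatted at all.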

Configurability

NumPy and others let the user configure the truncation threshold (the point at which the representation gets summarised rather than shown in full), e.g. via numpy.set_printoptions with its threshold argument, so ideally we would also let the user configure this for our FieldList (and maybe for other truncation too, such as data array views?). I suggest adding another setting under cf.configuration() called print_threshold or similar, taking an integer as per the numpy threshold parameter.
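As a rough sketch of how such a setting might behave, the snippet below uses a stand-alone stand-in for the configuration function. The names configuration, print_threshold and should_truncate are assumptions for illustration, the default of 1000 simply mirrors NumPy's default threshold, and nothing here reflects how cf.configuration() is actually implemented.

```python
# Hypothetical module-level settings store; the default of 1000 mirrors
# the default `threshold` used by numpy.set_printoptions.
_settings = {"print_threshold": 1000}


def configuration(print_threshold=None):
    """Optionally update print_threshold; return the settings as they
    were before the call (an assumed convention for this sketch)."""
    old = dict(_settings)
    if print_threshold is not None:
        # Reject bools explicitly, since bool is a subclass of int.
        if (
            isinstance(print_threshold, bool)
            or not isinstance(print_threshold, int)
            or print_threshold < 0
        ):
            raise ValueError("print_threshold must be a non-negative integer")
        _settings["print_threshold"] = print_threshold
    return old


def should_truncate(n_items):
    """True if a listing of n_items should be summarised."""
    return n_items > _settings["print_threshold"]


old = configuration(print_threshold=5)  # lower the threshold
print(should_truncate(6))               # a 6-item list is now summarised
configuration(**old)                    # restore the previous setting
```

Returning the previous settings makes it easy to change the threshold temporarily and then restore it, as in the last three lines.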