gmaze commented 4 years ago

For standard users of the software, they should not need to clone the repository to be able to run it, but rather simply install it using pip or conda. This implies 2 things:

a proper setup.py and packaging
a serious refactoring of the code to manage options and configuration parameters outside of the software code (to be honest, I'd like this to be my 1st contribution to the code)

This is also to let you known that when this will be ready to publish euroargodev python softwares on pypi, I created a euroargodev account at https://test.pypi.org/user/euroargodev/ and https://www.pypi.org/user/euroargodev/ (don't be fulled by the picture, this is retrieved automatically from my email but will change soon, so that it is the EA logo).

edsmall-bodc commented 4 years ago

The packaging should be relatively straight forward.

The refactor may be slightly more tricky, though your current codes to fetch argo data will be a big help.

Currently the user has to define many things for the OWC toolbox to run:

For the update salinity mapping, they need to define:

Which data set they want to use from each WMO box (argo/CTD/bottle data) [though we have discussed how the WMO boxes could be phased out!]
The float name they want to analyse
The maximum number of historical casts they want to use during analysis
Whether or not they would like to use potential vorticity
Whether or not they would like to use special constraints around the subantarctic front
The latitude and longitude scales used for decorrelation (4 values)
The cross isobath scales (2 values)
The age scales (2 values)
The exclusion depth
The age exclusion parameter

Many of these parameters have a default value, but I know operators change them regularly

That brings variables for salinity mapping to 15, which is somewhat tricky to manage in a usable way. Either we will need to group some of these variables together (into classes or dictionaries?), OR we can split the update salinity mapping routine up into separate routines that can be called?

For calculate piecewise fit (which I have not pushed to master yet, as I'm still working on some bugs) users need the output from the salinity mapping routine, plus:

Number of breaks in the fitting line
(Optional) where to place these breaks
(Optional) the boundaries for potential temperature (two values)
(Optional) The boundaries for pressure (two values)

You can see an example of the set up here: https://github.com/ArgoDMQC/matlab_owc/blob/master/matlab_codes/set_calseries.m

Teddyzander commented 4 years ago

Could we write a class that has setters and getters for these values, and then we can just pass in the class to an update_salinity_mapping routine? Or just have the salinity mapping routine be a method in the class, of course.

Class could contain an object containing all of the argo_profiles fetched from the argo_fetcher routine, variables for the update salinity mapping route, and the update salinity mapping method..?

gmaze commented 4 years ago

Discussion moves toward another topic here, I suggest to continue here euroargodev/User-Acceptance-Test-Python-version-of-the-OWC-tool#7

gmaze commented 4 years ago

Dear all, In order to package the software correctly, we need to make a decision with regard to third party libraries used by py_owc, especially for plotting functionalities, to be included. This means that when a new user will install py_owc, it will have to install other stuff as well. In same cases, users already have these (eg numpy), but in others, this may not be the case (eg geopandas). The lesser "requirements" we have, the easier it is to install and debug for specific user working environments.

Solution 1

py_owc does the mathematics of calibration, users handle visualization.

In order to run and compute the calibration, we don't need to plot anything. py_owc still provide plotting features, but does not make it a requirement for installation. So, third party libraries are limited to math stuff. pros:

light weight
fast and simple
near 100% success of install for the vast majority of users

cons:

users have to manage additional software installation to use all py_owc features.

Solution 2

py_owc does the mathematics of calibration and shows users the results for decision making.

We consider that plotting is a key feature or step of the analysis. Third party libraries for plotting are required for the software to run. pros:

complete experience for users, all included

cons:

more complex software management and sustainability
more issues during initial install step for users

Please provide comments or choose which solution you promote 1 vs 2

apswong commented 4 years ago

The plots are essential. I don't understand the difference between (1) and (2). One way or another the user will have to produce the plots. Does (1) make the plots optional?

gmaze commented 4 years ago

The plots are essential. I don't understand the difference between (1) and (2).

When we want to distribute a python software, we need to explicitly define a list of third party libraries, supposedly required for the software to run. Solution 1 does not include plotting 3rd party lib., Solution 2 does.

One way or another the user will have to produce the plots.

Solution 1 makes sure that users will be able to do the OWC maths, we don't care how user will to plots

Solution 2 makes sure that users will be able to do the OWC maths and use py_owc plotting functions (even if users won't use it in practice).

Does (1) make the plots optional?

No, in both cases, py_owc includes plotting functions.

This is like a radio in a car, it's an optional feature. We are the car manufacturer, we need decide if our own design of the radio will be optional or not. If the radio is "optional", the buyer can decide to have it or not. But in this later case, buyer can still install its own radio system. In any case, we manufacturer builds the car and design a radio to come with it.

We need to decide if plotting capabilities comes by default (solution 2) or if it's an optional feature (solution 1).

apswong commented 4 years ago

OK, I see - this is more of a software packaging dilemma. (1) is the more modular approach, which may be better for software management.

gmaze commented 4 years ago

While investigating #3 , I have discovered that pip does not use underscores in package names (they are discouraged in PEP8).

So, even if we work with py_owc internally, once published, users will have to install it with py-owc:

pip install py-owc

I dislike this idea and would prefer to work everywhere with the same name.

So, new proposition, what about a simple software named:

pyowc

I think we still do the link with matlab_owc, and yet a fully embrace the python way.

gmaze commented 4 years ago

ok, let's go everywhere (but not repo name) with pyowc

edsmall-bodc commented 4 years ago

@apswong @gmaze I understand the conflict here - the plotting section is not needed to create the mapped salinity, nor is it needed to calibrate the salinity of floats.

There are two crucial points to make though:

The plots are needed for our DMQC operators to make well informed decisions about whether or not to accept the calibration. Without the plots, no decision can be reached, and therefore the outputs are somewhat useless.
One of the things we have pushed for with this is to try and create a CONSISTENT and UNIFIED approach to the decision making process, and that includes documentation. Part of this plan is that plot outputs from DMQC operators should be consistent with each other in an attempt to ensure a more consistent analysis. Therefore, we should not be implying to the operators that it is up to them to how things are displayed. We also want our operators to share their possible improvements (via github issues/pull requests), and that includes the plotting routines. I worry that, if we make the plotting routine seem optional, when they make meaningful changes it won't be shared with the rest of the community.

I realise that this could make pyowc a little more clunky to install, and more challenging to maintain, but it seems to me that, if we want this consistency, then it really does need to be included.

Interested to hear what @matdon17 and @kamwal have to say here too.

gmaze commented 4 years ago

re-opening this issue because we didn't decided solution 1 vs 2

ps: yes @edsmall-bodc you're right, your point 2 is very good

matdon17 commented 4 years ago

What @edsmall-bodc has said reflects my position. I do see there being the potential for pyowc being a module in a wider ecosystem of Argo QC tools though, and could be seen as a module of that wider landscape. Although quite what the shape that might take is one for the future.

The only thing which has just occurred to me is whether we treat the trajectory plot alone as a module - as that is the one that seems to bring the most headaches with setup and dependencies (?). I am not convinced in either direction though.

edsmall-bodc commented 4 years ago

I like this idea as well - the trajectory plot is probably the least useful plot in terms of decision making (though, if I am wrong someone is free to correct me on this) and is probably the plot that causes the biggest headache. We could make the plot optional in installation, but provide some basic instructions/suggestions on how to plot trajectory.

There is also the fact that the trajectory plot is probably the plot most likely to be able to be used by other software that might have nothing to do with salinity calibration, so it would make sense to have a stand alone trajectory plotting routine that every argo tool could use..? Or we add it as a feature in argopy? If you are fetching argo data, it is likely you might want to plot the locations, perhaps?

gmaze commented 4 years ago

Plotting the trajectory is definitely a generic plot We also have one in argopy that produces a plot like this:

see here: https://argopy.readthedocs.io/en/latest/visualisation.html#trajectories-from-an-index

this is clearly a very simple plot and yet that needs a lot of tuning to suit all possible floats. I would keep it here as it is at this time. We'll need to talk later about common visualization tools

edsmall-bodc commented 4 years ago

Something I think we should consider adding at some point is something that fetches topology data from somewhere like GEBCO for the area of interest. This would mean the plot would contain very accurate and detailed bathymetry and coastlines, which is probably important in complex regions, near sandbars and islands etc.

The topology data that I am currently using isn't very precise purely because storing all that data locally would take up a huge amount of space.

gmaze commented 4 years ago

Yes, I also had this in argopy :https://github.com/euroargodev/argopy/blob/6273ddef6c22c1676aec56d9689612dbfb8ab902/argopy/utilities.py#L451 for the ETOPO bathy but this is not reliable

euroargodev / argodmqc_owc

Publishing on Pypi #23

Solution 1

Solution 2

pyowc