Closed dhruvbalwada closed 1 year ago
As a follow-up, a second thing we noticed is a difference in response between the ftp gdac and a local gdac (which was created this morning using the rsync instructions available in the documentation).
This can be checked by comparing the output of the following four calls:
ftp_usgodae_gdac.float([29029]).load().to_xarray()
returns an xarray dataset with no salinity, because 29029 has no salinity sensor.
ftp_usgodae_gdac.float([29030]).load().to_xarray()
returns an xarray dataset with salinity.
ftp_usgodae_gdac.float([29029, 29030]).load().to_xarray()
returns an xarray dataset with no salinity, possibly because it returns only the variables common to both floats.
local_gdac.float([29029, 29030]).load().to_xarray()
returns the error ValueError: 'PROFILE_PSAL_QC' is not present in all datasets.
You probably cannot reproduce this error without setting up your own local gdac. It is interesting that the output of the last two calls differs, which suggests that the data fetcher behaves differently for a local gdac and for the remote ftp. Regardless, the xarray dataset returned from the ftp (when we don't get an error) is not acceptable: we would prefer to get salinity from the floats for which salinity data exists, and no data from floats that have no salinity sensor (or a variable populated with missing values and an appropriate QC flag).
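The preferred behaviour described above (keep every variable, and pad floats that lack a sensor with missing values plus a QC flag) can be sketched with plain Python dictionaries standing in for per-float datasets. This is only an illustration, not argopy code: the function name `merge_filled`, the sample values, and the use of QC flag "9" to mean "missing value" are assumptions made for the example.

```python
import math

# Assumption for this sketch: "9" is used as the QC flag for a missing value.
MISSING_QC = "9"


def merge_filled(chunks):
    """Concatenate chunks, padding variables that a chunk lacks.

    Each chunk is a dict mapping a variable name to a list of samples.
    Missing data variables are padded with NaN, missing *_QC variables
    with MISSING_QC.
    """
    names = sorted(set().union(*chunks))  # union of all variable names
    merged = {name: [] for name in names}
    for chunk in chunks:
        n_samples = len(next(iter(chunk.values())))
        for name in names:
            if name in chunk:
                merged[name].extend(chunk[name])
            elif name.endswith("_QC"):
                merged[name].extend([MISSING_QC] * n_samples)
            else:
                merged[name].extend([math.nan] * n_samples)
    return merged


# Made-up sample values: the first float has no salinity, the second does.
float_29029 = {"TEMP": [20.1, 19.8], "TEMP_QC": ["1", "1"]}
float_29030 = {
    "TEMP": [21.0, 20.5],
    "TEMP_QC": ["1", "1"],
    "PSAL": [35.1, 35.2],
    "PSAL_QC": ["1", "1"],
}

merged = merge_filled([float_29029, float_29030])
print(merged["PSAL"])     # NaN for the first float, real values for the second
print(merged["PSAL_QC"])  # missing-value flag for the first float
```

With this policy, salinity from 29030 survives the merge and the samples from 29029 are clearly marked as missing rather than silently dropped.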
Updating to the latest version seems to have solved this. Closing for now.
Hi @dhruvbalwada, indeed, depending on the data source, one or more chunks of data eventually have to be merged by argopy. By default, up to now, the internal argopy policy has been to merge "down", i.e. to drop variables that are not available in all chunks. This is a simple way to avoid building datasets that are larger than necessary.
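As a rough illustration of what merging "down" means, here is a minimal sketch that uses plain Python dictionaries as stand-ins for data chunks. It is not argopy's actual implementation; the function name `merge_down` and the sample values are made up for the example.

```python
def merge_down(chunks):
    """Concatenate chunks, keeping only variables present in every chunk.

    Each chunk is a dict mapping a variable name to a list of samples.
    """
    common = set(chunks[0])
    for chunk in chunks[1:]:
        common &= set(chunk)  # intersect variable names across chunks
    return {
        name: [value for chunk in chunks for value in chunk[name]]
        for name in sorted(common)
    }


# Made-up sample values: only the second chunk carries salinity.
chunk_29029 = {"PRES": [5.0, 10.0], "TEMP": [20.1, 19.8]}
chunk_29030 = {"PRES": [5.0, 10.0], "TEMP": [21.0, 20.5], "PSAL": [35.1, 35.2]}

merged_down = merge_down([chunk_29029, chunk_29030])
print(sorted(merged_down))  # PSAL is dropped because one chunk lacks it
```

The result contains only PRES and TEMP, which matches the behaviour reported above when 29029 and 29030 are fetched together from the ftp gdac.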
> Regardless, the returned xarray dataset from the ftp (when we don't get error) is not acceptable as we would prefer to get salinity from floats that the salinity data exists and get back no data from floats that have no salinity sensor
This is not the preference for everyone in all situations, especially when dealing with BGC variables.
That being said, with the latest 0.14rc2 release, argopy lets users modify this behaviour for the BGC dataset when using the erddap data source. We plan on making this choice available for all data sources and datasets in the very near future.
@andrewfagerheim and I have run into a problem: not all floats have the same core data variables. In particular, some floats do not have conductivity sensors. One example is float 29029, but there are many more. This causes a complete failure if you try to load floats over a region that contains any of these floats (from any data source apart from erddap).
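One possible way to sidestep the failure, sketched here under assumptions, is to group floats by the set of variables they provide and fetch each group separately, so that no merge ever mixes floats with different variables. The helper `group_by_variables` and the stand-in loader `fake_load_one` are hypothetical; in practice the loader might wrap a single-float call such as `local_gdac.float([wmo]).load().to_xarray()` and return its variable names.

```python
from collections import defaultdict


def group_by_variables(wmos, load_one):
    """Group float WMO numbers by the set of variables each float provides.

    `load_one` is any callable that takes a WMO number and returns an
    iterable of variable names for that float.
    """
    groups = defaultdict(list)
    for wmo in wmos:
        groups[frozenset(load_one(wmo))].append(wmo)
    return dict(groups)


def fake_load_one(wmo):
    """Stand-in loader with made-up variable lists for two floats."""
    catalog = {
        29029: ["PRES", "TEMP"],          # no conductivity sensor
        29030: ["PRES", "TEMP", "PSAL"],  # has salinity
    }
    return catalog[wmo]


groups = group_by_variables([29029, 29030], fake_load_one)
for variables, wmos in groups.items():
    print(sorted(variables), wmos)
```

Each group can then be loaded and analysed on its own. The cost is one extra pass over the floats to discover their variables, which may be slow for a global selection.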
This is a reopening of issue #228, now that we have had time to take a deeper dive into the problem. Originally, @gmaze told us that he was not able to reproduce our problem because he was using erddap, and we now partially understand why (as explained below). erddap is a great way to access small parts of the data, but it is not very useful for a global analysis, as it runs into problems when the datasets get too large (e.g. #287). Some of these problems can be mitigated through the tricks explained here, but ideally one would use a local gdac for global or large data analyses, as no other method can beat the speed of that option.
We would really appreciate it if some help can be provided on this matter from the devs, who understand Argo data access much better than us.
MCVE Code Sample
Expected Output
The expected output is that all the dsfloat* datasets show the same data, which looks something like
Problem Description
However, the behavior is the following:
DataNotFound: 'Empty dataset, no data to transform !'
DataNotFound: "['https://argovis.colorado.edu/catalog/platforms/29029']"
However, I was not able to load any data with argovis (I could not even run the argovis example here).
Versions
Argopy version 0.1.12
More details.
which returns a metadata item saying "Webb Research, no conductivity" under the column profiler. In contrast, for 29030 the same metadata item says "Webb Research, Seabird sensor".