Closed jr3cermak closed 6 months ago
CI updates in progress via #20.
Tests within docker are ok and match independent testing.
==== 27 passed, 8 deselected, 1 xpassed, 3034 warnings in 267.18s (0:04:27) ====
@jr3cermak Can you rebase? CI changes are merged.
Can we completely replace convertDBD.sh (and the slocum binaries) with functionality from dbdreader at this point?
It would be great to skip the ascii conversion stage and promote the important bits into the 2nd stage (netcdf) code. We would then have access to data frames directly instead of having to read them from intermediate files.
I am not sure dbdreader can completely replace all of what the slocum binaries can do. For straight ascii conversion, yes. For our data processing pipeline, we have moved completely away from the slocum binaries. The slocum binaries' ascii conversion does not provide enough floating point precision to decode the embedded echograms, which is why dbdreader is necessary. After these updates (other PRs will follow), we can take a deeper dive into the ascii and convertDBD.sh code and see what can be done.
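To make the "direct data frames" idea concrete, here is a minimal sketch, assuming dbdreader's `get(name)` call that returns `(time, value)` arrays; the `dbd_to_frame` helper and the stub class are hypothetical, and sensor names like `m_depth` are only illustrative:

```python
import numpy as np
import pandas as pd

def dbd_to_frame(dbd, parameters):
    """Pull each parameter from a dbdreader-style object (``get(name)``
    returns (time, value) arrays) and join on timestamp, skipping the
    intermediate ascii files entirely."""
    columns = []
    for name in parameters:
        t, v = dbd.get(name)
        columns.append(pd.Series(v, index=pd.Index(t, name="time"), name=name))
    # Sensors can report at different rates, so align on the time index.
    return pd.concat(columns, axis=1).sort_index()

class _FakeDBD:
    """Stand-in for a dbdreader file object so the sketch runs without glider data."""
    def get(self, name):
        t = np.array([0.0, 1.0, 2.0])
        v = np.array([10.0, 11.0, 12.0])
        return t, v

df = dbd_to_frame(_FakeDBD(), ["m_depth", "sci_water_temp"])
```

The full float precision of the decoded values survives into the frame, which is the property the embedded echograms need.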
How about an intermediary format other than the ascii that currently exists (like 1->N parquet files)? I would like to keep a table-like serialization for other types of processing, distribution, analysis, etc. The netCDF format was really meant as the format to submit to the IOOS Glider DAC and is always going to be lossy.
Switching to parquet should be fine. Any shift will require an initial lift to get started.
I think I get it now. You are also not enthralled with netcdf as the unifying backend. I had hopes after seeing activity at xpublish. So, the goal is to do away with convertDBD.sh and the ascii part, but make the converter more useful so it serves a better purpose than just throwing the ascii part away. Right now the general process is DBD -> convertDBD.sh/dbdreader -> ascii -> netcdf -> ERDDAP -> data portal. The goal is DBD -> parquet/dbdreader -> parquet storage -> netcdf. The data portal would at some point begin to pull from parquet storage. Is there an xpublish-type framework that would sit on top of parquet the way xpublish is envisioned for xarray/zarr?
Sounds like we are on the same page!
*.*bd files
-> dbdreader
-> parquet files
-> profile netCDF files (via pocean) for glider dac
-> backend analysis/viz (static plots, etc.)
-> xpublish
-> frontend analysis/viz (dynamic plots, etc.)
xpublish can sit on top of parquet files the same way it can sit upon xarray datasets... with a little plugin magic. I have a proof of concept that uses duckdb on top of parquet files served through xpublish, and it is pretty nice for a quick API. That is the direction I'll be heading.
Splendid. I will let you know when this PR is finished and ready to go. Then we can look into pyarrow a bit more.
This PR is good to go anytime. gutils/tests/test_slocum.py::TestEchoMetricsTwo::test_echogram has code that will read the echogram and put the profile into a dataframe: numpy, pandas and xarray. It is an initial starting point for migrating away from convertDBD.sh. The teledyne.py module has morphed a few times as it has encountered various sources of code. It can probably use an overhaul when we get closer to tackling the convertDBD.sh script and friends later. In an earlier life, it also moved away from the slocum binaries. Class functions still lurk in there even after converting to the dbdreader module. Progress.
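A hedged sketch of that dataframe step (shapes, bin values, and names are invented; the real decoding lives in test_echogram): once the echogram is a 2-D numpy array, the pandas and xarray views are one-liners.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical decoded echogram: rows are depth bins, columns are pings.
echo = np.arange(12, dtype=float).reshape(3, 4)
depth_bins = np.array([5.0, 10.0, 15.0])
ping_time = np.arange(4.0)

# pandas view: depth as the row index, ping time as columns.
df = pd.DataFrame(echo,
                  index=pd.Index(depth_bins, name="depth"),
                  columns=pd.Index(ping_time, name="time"))

# xarray view: same array with labeled coordinates.
da = xr.DataArray(echo,
                  coords={"depth": depth_bins, "time": ping_time},
                  dims=("depth", "time"), name="sv")
```

Either view can then be serialized to parquet (via the frame) or netcdf (via the DataArray) without another decode pass.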
Finally ready to move forward with updates to echometrics processing and the addition of parquet as intermediate storage. The pytests all pass when run manually.
2023-08-28 14:06:24,221 - gutils.slocum - INFO - Converted usf-bass-2016-253-0-4.sbd,usf-bass-2016-253-0-4.tbd to usf_bass_2016_253_0_4_sbd.dat
2023-08-28 14:06:24,273 - gutils.slocum - INFO - Converted usf-bass-2016-253-0-5.sbd,usf-bass-2016-253-0-5.tbd to usf_bass_2016_253_0_5_sbd.dat
2023-08-28 14:06:24,425 - gutils.slocum - INFO - Converted usf-bass-2016-253-0-6.sbd,usf-bass-2016-253-0-6.tbd to usf_bass_2016_253_0_6_sbd.dat
PASSED
gutils/tests/test_watch.py::TestWatchClasses::test_gutils_netcdf_to_erddap_watch PASSED
================================================================================== warnings summary ==================================================================================
gutils/tests/test_nc.py: 3030 warnings
gutils/tests/test_slocum.py: 870 warnings
/home/cermak/miniconda3/envs/gutils_py3_9/lib/python3.9/site-packages/compliance_checker/suite.py:185: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
latest_version = str(max(StrictVersion(v) for v in version_nums))
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
/home/cermak/miniconda3/envs/gutils_py3_9/lib/python3.9/site-packages/pyarrow/pandas_compat.py:354: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if _pandas_api.is_sparse(col):
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================== 38 passed, 1 xpassed, 3905 warnings in 440.84s (0:07:20) ==============================================================
Workflow tests are not working. Looks like the build process needs to know about dbdreader. Utilization of dbdreader will completely replace reliance on the x86 slocum binaries for decoding.
There is an odd dependency failure via conda/Docker:
E ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /opt/conda/lib/python3.9/site-packages/scipy/fft/_pocketfft/pypocketfft.cpython-39-x86_64-linux-gnu.so)
This update has a hook to process glider data into parquet intermediate files. The parquet-enabled processing is a shade faster than the ascii method.
This update references issues #12, #24 and #26.
This update to the PR includes a small fix that was discovered when trying to run the unit tests on other platforms with different OS and module versions. Because of small differences in magic or in the way file operates, the shell must test for either data or ASCII to allow the unit tests to pass.
From our standpoint, this is ready for implementation. Not sure what needs to be done to fix the workflow unit tests. The unit tests pass when run manually on all the platforms we have been testing.
Initiate PR with additional updates forthcoming. This is primarily to check results of upstream CI tests before proceeding with additional work.