OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Convert AZFP from glider with separate ancillary data files #198

Closed nlbeaird closed 1 year ago

nlbeaird commented 3 years ago

Hi, apologies if this is the wrong forum for this question; I haven't used GitHub issues before. This is not really an issue, just a question.

I'm working with a student who is hoping to use echopype to process AZFP data collected by an autonomous underwater glider. We've tried to run the Convert / Process pipeline, but are striking out. I believe the issue is that the AZFP onboard the glider is not reporting ancillary measurements like temperature, pressure, pitch and roll to its raw data files. These data streams exist as part of the glider engineering data, but are not written to the '.01A' files. Thus when we run .raw2nc() we get the following error, I assume because the unpacked_data doesn't contain any temperature field (or because the temperature coefficients are all 0 in the XML file):

[Screenshot of the error message: Screen Shot 2020-09-28 at 7 37 33 PM]

I'm not very familiar with the AZFP, but your echopype effort seems wonderful! I'm wondering if you have a recommendation, or warning, about how hard it would be to try to use the temp, pressure, pitch, roll data collected separately by the vehicle in the echopype processing?

Thanks much, Nick

leewujung commented 3 years ago

@nlbeaird : This is great! We don't actually have a specific forum for this type of discussion, so thanks for raising an issue!

Could you point us to an example file (+ the XML)? This type of diversity in raw data files is exactly what we haven't had for AZFP. A few people have reported similar problems for EK60 and EK80 data also and we added those into the package (to either not save those parameters at all or pad with NaN, depending on the circumstance).

A related note: @emiliom and I were just talking today about adding a plug-in type of thing for adding GPS data into AZFP files or EK files that do not already have them. There are definitely downstream data-processing consequences, since at a minimum the environmental parameters come into play when compensating for sound attenuation. It'll be cool if you're interested in brainstorming this together!

nlbeaird commented 3 years ago

@leewujung : Thanks for responding so quickly! Glad to hear this is of interest.

I've attached a small example raw file with the XML. I hope this is a helpful example.

Yes! We (the student and I) would be happy to help think about this in any way.

"A few people have reported similar problems for EK60 and EK80 data also and we added those into the package (to either not save those parameters at all or pad with NaN, depending on the circumstance)."

This would be great, just as a start for us to get to play with the data after conversion to netCDF. I think from there we would know how to add the environmental parameters for processing. Though it'd be fun to help think about a more general solution for combining GPS/hydrographic data from other sources into the file.

glider_azfp.zip

nlbeaird commented 3 years ago

Just for completeness I wanted to add what I figured out about this issue (and perhaps help me remember later).

Our AZFP mounted on the glider doesn't record temperature (or a number of other things). There appear to be some counts in the position of the temperature data, though they don't vary and are small. The issue I was having is that in echopype/convert/azfp.py the function compute_temp (echopype 0.4.1) produces negative values inside a math.log() call, causing the math error and causing Convert to fail.
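
For illustration, here is a minimal sketch of how a counts-to-temperature conversion can hit this kind of math error. The formula and coefficient names (ka, kb, kc, A, B, C) are assumptions modeled on a generic Steinhart-Hart thermistor conversion; they are not copied from echopype's compute_temp. The guarded variant, which returns NaN instead of raising, is likewise hypothetical.

```python
import math


def thermistor_temp(counts, ka, kb, kc, A, B, C):
    """Hypothetical Steinhart-Hart style conversion from ADC counts to deg C."""
    # Assumed 16-bit ADC with a 2.5 V reference.
    v_in = 2.5 * (counts / 65535)
    # Thermistor resistance; zero or negative when the coefficients are all 0.
    resistance = (ka + kb * v_in) / (kc - v_in)
    # math.log raises ValueError for any input <= 0.
    ln_r = math.log(resistance)
    return 1.0 / (A + B * ln_r + C * ln_r**3) - 273.0


def safe_thermistor_temp(counts, ka, kb, kc, A, B, C):
    """Same conversion, but return NaN when the inputs cannot yield a temperature."""
    try:
        return thermistor_temp(counts, ka, kb, kc, A, B, C)
    except (ValueError, ZeroDivisionError):
        return math.nan
```

With small, constant counts and all-zero coefficients, the unguarded function raises ValueError from math.log, while the guarded one returns NaN so that the rest of a conversion could proceed.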

To get around this I artificially increased a temperature calibration coefficient in the .xml file so that the math error doesn't stop the whole convert process. Not a great solution, but it let me get the .01A files into xarray format using echopype. We've been working on merging data from the glider with the AZFP data in this notebook: https://github.com/nlbeaird/glider_acoustics/blob/master/process_glider_azfp.ipynb . Again, not a general solution, but an example of merging ancillary data with a glider-mounted AZFP.
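
As a rough idea of what such a merge can involve, the sketch below linearly interpolates glider CTD samples onto echosounder ping times. It is not taken from the linked notebook; the function name, the use of plain seconds for timestamps, and the choice to hold the nearest CTD value for pings outside the CTD time range are all assumptions made for illustration.

```python
from bisect import bisect_left


def interp_to_pings(ping_times, ctd_times, ctd_values):
    """Linearly interpolate CTD samples onto ping times.

    All times are seconds on a common clock; ctd_times must be strictly
    increasing. Pings outside the CTD range get the nearest CTD value.
    """
    out = []
    for t in ping_times:
        i = bisect_left(ctd_times, t)
        if i == 0:
            # Ping at or before the first CTD sample.
            out.append(ctd_values[0])
        elif i == len(ctd_times):
            # Ping after the last CTD sample.
            out.append(ctd_values[-1])
        else:
            t0, t1 = ctd_times[i - 1], ctd_times[i]
            v0, v1 = ctd_values[i - 1], ctd_values[i]
            weight = (t - t0) / (t1 - t0)
            out.append(v0 + weight * (v1 - v0))
    return out
```

For example, interp_to_pings([5], [0, 10], [10.0, 12.0]) gives one temperature per ping, here halfway between the two CTD samples.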

Thanks for the work on echopype! It's let us newbies make progress on our project!

emiliom commented 3 years ago

@nlbeaird apologies that we never followed up after your last message 3 months ago. Great to hear about your progress, and thanks especially for sharing your current solution merging the glider and AZFP data! As @leewujung mentioned earlier, the ability to integrate position and CTD data with the echosounder data is something we plan to add. We'll refer to the work you're sharing via https://github.com/nlbeaird/glider_acoustics (especially glider_azfp.py) when we get to it later this year, and will reach out directly.

We also wanted to let you know that we released a major new version of echopype a month ago, 0.5.0 (and a minor update this week). It should make echosounder data processing a bit simpler and more convenient. Check out the documentation. We'll soon have a couple of sample notebooks you can refer to, too.

Looking forward to closer interactions. I see that we share both UW and Rutgers linkages (I was a postdoc at IMCS, before it became DMCS) :smiley:

lgarzio commented 1 year ago

@leewujung I just wanted to check in about the status of this issue since we are still running into this problem with our AZFPs mounted on gliders. Thanks!!

dsmossman commented 1 year ago

Hello! I'm a coworker of @lgarzio and the one now working with glider-based AZFP data. In case it would be helpful to you, I had a look at ASL's method of handling missing temperature and found that the code responsible for loading AZFP data can detect whether the temperature field is full of NaNs and, if that is the case, replace them with a single temperature value. Is this something that could be added to echopype's AZFP processing workflow?

leewujung commented 1 year ago

Pinging @emiliom here since he was working on that platform update component.

@dsmossman : Thanks for the info. Does this mean that your data have all NaNs in the temperature field? In this case we would want the corresponding EchoData data variable to be NaN also, to preserve the raw data. Users can optionally and explicitly replace those values when manipulating data. This is so that the directly converted EchoData objects (via open_raw, before any user modification) would be an authentic representation of what's in the raw files. Making changes like this (changing values of the data variable) explicitly, outside of what open_raw does automatically, is likely very helpful for reproducibility.
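
The explicit, user-side replacement described here could look roughly like the sketch below. It works on a plain list of temperatures rather than on an EchoData object, and the function name and fallback argument are hypothetical; the point is only that the raw values stay untouched and the substitution is a separate, visible step.

```python
import math


def fill_missing_temperature(raw_temps, fallback):
    """Return a new list of temperatures; never modify raw_temps in place.

    If every raw value is NaN, use the single user-supplied fallback value
    for all samples. Otherwise return a copy of the raw values unchanged.
    """
    if all(math.isnan(t) for t in raw_temps):
        return [fallback] * len(raw_temps)
    return list(raw_temps)
```

Keeping the converted values as NaN and applying a step like this afterwards means the chosen fallback temperature is recorded in the user's own processing script rather than hidden inside the conversion.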

dsmossman commented 1 year ago

@leewujung Yes, I just checked with ASL's code and the temperature values in our glider data appear to be NaNs. And that makes sense re: reproducibility. Our glider has a CTD on it, so we are able to use the temperature/salinity/pressure readings from that to calculate the sound speed once the AZFP data are loaded.
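
For readers unfamiliar with this step, the sketch below computes sound speed from temperature, salinity and depth using the Mackenzie (1981) nine-term equation. This formula is a stand-in chosen for illustration; it is not necessarily the one used by echopype or by ASL's code. It takes depth in meters, so a CTD pressure reading would first need converting to depth (treating 1 dbar as roughly 1 m is a common coarse approximation).

```python
def sound_speed_mackenzie(temperature, salinity, depth):
    """Sound speed in seawater (m/s) from the Mackenzie (1981) equation.

    temperature: deg C, salinity: parts per thousand, depth: meters.
    """
    t, s, d = temperature, salinity, depth
    return (
        1448.96
        + 4.591 * t
        - 5.304e-2 * t**2
        + 2.374e-4 * t**3
        + 1.340 * (s - 35.0)
        + 1.630e-2 * d
        + 1.675e-7 * d**2
        - 1.025e-2 * t * (s - 35.0)
        - 7.139e-13 * t * d**3
    )
```

As a sanity check, the equation's published check value is 1550.744 m/s at 25 deg C, salinity 35 and 1000 m depth.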

leewujung commented 1 year ago

@dsmossman: I see, that makes sense. In echopype the calibration function can take temperature/salinity/pressure variables to calculate sound speed and use it for calibration. If you supply a single value of temperature to the calibration function it would work. I think by default the temperature recorded by the AZFP is the temperature within the bottle, for monitoring, so in their example code they just use an external source of temperature. So this is not a replacement in the strict sense. This was the version from some time ago though and I do not know if they have updates on this specific part.

What I'd recommend though is that if you have the full CTD values, adding them to the Environment group, so that they sit together with the rest of the data, is the best approach, just like how we have the update_platform function to add the GPS locations.

emiliom commented 1 year ago

Going back to @nlbeaird's March 2021 comment, it looks like the failure you were experiencing happened during conversion, before calibration. @dsmossman can you report whether failure during conversion is still happening when using a recent version of echopype, ideally the latest (0.6.3, from October), using open_raw?

What @leewujung described above (adding CTD values to the Environment group) applies to the calibration step, once you've successfully converted the raw file to netCDF, zarr or an in-memory representation.

Regarding the addition of external platform location data to a converted file, we added that capability a while back (Sept. 2021) and have incorporated some improvements to it since then.

dsmossman commented 1 year ago

@emiliom Yes, the failure occurs when I try to use open_raw with the latest version of echopype on files that do not have an XML file with an artificially increased temperature coefficient. I am able to add environmental data from the glider and recalculate sound speed without any issues, but only after I "trick" open_raw into letting me open the files.

dsmossman commented 1 year ago

Another update: having looked through some of the testing @nlbeaird did, it seems as though the failure point is the XML file, not the 01A echosounder files. When the AZFP does not record its own temperature/pressure, these values in the XML file are all 0s, leading to a divide-by-zero error in convert_raw. Hence the need for a fake XML file with artificial constants to allow the conversion to continue. Would it be possible to alter convert_raw to accept optional temp/pressure parameters from an outside source?
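
One way to picture this request is the sketch below. It is purely hypothetical and does not reflect echopype's actual API or the change that was eventually merged: the function name, the coeffs dictionary, and the external_temperature parameter are invented for illustration. An externally supplied value wins; otherwise all-zero coefficients are detected up front and NaN is returned instead of attempting the computation.

```python
import math


def resolve_temperature(coeffs, compute, external_temperature=None):
    """Pick a temperature without letting all-zero coefficients raise.

    coeffs: dict of calibration coefficients parsed from the XML (hypothetical).
    compute: callable that derives a temperature from coeffs.
    external_temperature: optional value from an outside source, e.g. a CTD.
    """
    if external_temperature is not None:
        return external_temperature
    if all(value == 0 for value in coeffs.values()):
        # The instrument did not record temperature; keep the gap visible.
        return math.nan
    return compute(coeffs)
```

With this shape, a caller that has glider CTD data passes it in explicitly, and a caller that has nothing still gets a converted file with NaN in place of temperature.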

emiliom commented 1 year ago

In PR #1020 we've addressed the last remaining task in this issue. That PR has been merged into our development branch, dev, and will be part of the next echopype release this month.

I'll close the issue now.