OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
96 stars 72 forks

Added a new parser for parsing BI500 files #1252

Open praneethratna opened 9 months ago

praneethratna commented 9 months ago

Draft PR for BI500 parser code.

CC @leewujung @jmjech

codecov-commenter commented 9 months ago

Codecov Report

Attention: 85 lines in your changes are missing coverage. Please review.

Comparison is base (529fa60) 83.52% compared to head (0cf806b) 46.77%. Report is 11 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| echopype/convert/parse_bi500.py | 20.00% | 84 Missing :warning: |
| echopype/core.py | 50.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main    #1252       +/-   ##
===========================================
- Coverage   83.52%   46.77%   -36.75%
===========================================
  Files          64       63        -1
  Lines        5686     5772       +86
===========================================
- Hits         4749     2700     -2049
- Misses        937     3072     +2135
```

| [Flag](https://app.codecov.io/gh/OSOceanAcoustics/echopype/pull/1252/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OSOceanAcoustics) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/OSOceanAcoustics/echopype/pull/1252/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OSOceanAcoustics) | `46.77% <20.56%> (-36.75%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OSOceanAcoustics#carryforward-flags-in-the-pull-request-comment) to find out more.


jmjech commented 9 months ago

Hi @praneethratna & @leewujung: Thanks for the initial start on reading BI500 data! Great start.

I copied your parse_bi500.py and core.py to my ~/echopype/convert/ and ~/echopype directories and ran my code:

import echopype as ep
from echopype import open_raw
from pathlib import Path

filename = Path('/home/mjech/NOAA_Gdrive/sonarpros/IDL_Programs/testdata/singlefile/N031-S445-S2000008-F011990-T01-D20000913-T103336-Data')
EKmodel = 'BI500' 
ed = open_raw(str(filename), sonar_model=EKmodel)

You'll notice that I use pathlib's Path for my files. I like it, but to get the filename into echopype, I need to cast it as a string. The reason I mention this is that I ran into some errors based on the file name in parse_bi500.py code and core.py. I think they came up because the BI500 files do not have a "true" suffix, which seems necessary for some of the initial file name and file path checking that is done in parse_bi500 and maybe ParseBase.
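
To illustrate the suffix point, here is a minimal sketch using only the standard library. It shows what pathlib reports for a BI500-style name. The `split_bi500_name` helper is hypothetical (it is not part of echopype) and assumes the file type is whatever follows the last hyphen.

```python
import os
from pathlib import Path

# BI500 file names end in a type tag such as "-Data" or "-Ping", not in ".ext".
name = Path("N031-S445-S2000008-F011990-T01-D20000913-T103336-Data")


def split_bi500_name(path):
    """Hypothetical helper: split 'PREFIX-Type' into (prefix, type)."""
    prefix, _, file_type = Path(path).name.rpartition("-")
    return prefix, file_type


suffix = name.suffix          # empty string, because the name contains no "."
as_string = os.fspath(name)   # equivalent to str(name) for a Path object
prefix, file_type = split_bi500_name(name)
```

Any check that relies on `Path.suffix` or a file extension therefore sees an empty string for these files, which is probably why the extension validation tripped.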

Here is what I did to get around these errors:

  1. In core.py, I changed the BI500 sonar model dict entry to "validate_ext": validate_ext('') instead of the None that was used. I use single quotes rather than double quotes, so in your style it would be validate_ext(""). That prevented an error saying that None types were not valid.
  2. In parse_bi500.py I needed to:
     a. insert from pathlib import Path
     b. reorganize the self declarations in the init section to:

    
        self.file_types = FILE_TYPES
        self.timestamp_pattern = FILENAME_DATETIME_BI500
        self.file_type_map = defaultdict(None)
    
        self.parameters = defaultdict(list)
        self.ping_counts = defaultdict(list)
        self.vlog_counts = defaultdict(list)
        self.index_counts = defaultdict(list)
        self.unpacked_data = defaultdict(list)
        self.sonar_type = "BI500"
    
        self.fsmap = self._validate_folder_path(file)
        self.index_file = self._get_index_file(self.fsmap)

You'll notice that I needed to set all the attributes before the two "file" lines (the last two lines), because those "file" functions rely on the attributes already being set.
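
The ordering issue can be reproduced in isolation. This is a generic sketch with hypothetical class and method names, not the PR's actual code, and I have not confirmed it is the exact failure seen here: a helper called from `__init__` reads an attribute that is only assigned on a later line.

```python
from collections import defaultdict


class BrokenParser:
    """Hypothetical parser that calls a helper before its attributes exist."""

    def __init__(self, file):
        self.fsmap = self._validate(file)      # helper runs first...
        self.file_types = ["-Data", "-Ping"]   # ...but this is assigned too late

    def _validate(self, file):
        # Reads self.file_types, which __init__ may not have assigned yet
        return [file + file_type for file_type in self.file_types]


class FixedParser(BrokenParser):
    """Same helper, but attributes are assigned before it is called."""

    def __init__(self, file):
        self.file_types = ["-Data", "-Ping"]
        self.parameters = defaultdict(list)
        self.fsmap = self._validate(file)      # helper can now rely on them


try:
    BrokenParser("prefix")
    broken_error = None
except AttributeError as exc:
    broken_error = exc

fixed = FixedParser("prefix")
```

`BrokenParser` raises `AttributeError`, while `FixedParser` builds its file list without complaint.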

In addition, in the `_validate_folder_path(self, folder_path)` function, I used Path to get the parent folder:

    def _validate_folder_path(self, folder_path):
        """Validate the folder path."""
        folder_path = str(Path(folder_path).parent)
        fsmap = fsspec.get_mapper(folder_path, **self.storage_options)
        try:
            all_files = fsmap.fs.ls(folder_path)
        except NotADirectoryError:
            raise ValueError(
                "Expecting a folder containing at least '-Data' and '-Info' files, "
                f"but got {folder_path}"
            )


It seems I needed to do this because the BI500 files don't have a suffix.

I've gotten over the initial humps, but now I get some other errors that may be better for you to look into. Here are the error messages:
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
~/NOAA_Gdrive/sonarpros/Python_Programs/EK_ES/test_BI500.py in <module>
     15 filename = Path('/home/mjech/NOAA_Gdrive/sonarpros/IDL_Programs/testdata/singlefile/N031-S445-S2000008-F011990-T01-D20000913-T103336-Data')
     16 EKmodel = 'BI500'
---> 17 ed = open_raw(str(filename), sonar_model=EKmodel)
     18 
     19 '''

~/.local/lib/python3.10/site-packages/echopype/utils/prov.py in inner(*args, **kwargs)
    235             @functools.wraps(func)
    236             def inner(*args, **kwargs):
--> 237                 dataobj = func(*args, **kwargs)
    238                 if is_echodata:
    239                     ed = dataobj

~/.local/lib/python3.10/site-packages/echopype/convert/api.py in open_raw(raw_file, sonar_model, xml_path, convert_params, storage_options, use_swap, max_chunk_size)
    421     )
    422     # Actually parse the raw datagrams from source file
--> 423     parser.parse_raw()
    424 
    425     # Direct offload to zarr and rectangularization only available for some sonar models

~/.local/lib/python3.10/site-packages/echopype/convert/parse_bi500.py in parse_raw(self)
    240                 self.unpacked_data["pelagic"].append(unpacked_data[:PELAGIC_COUNT])
    241                 self.unpacked_data["bottom"].append(
--> 242                     unpacked_data[PELAGIC_COUNT : PELAGIC_COUNT + BOTTOM_COUNT]
    243                 )
    244                 for trace_num in range(TRACE_COUNT):

error: unpack requires a buffer of 1324 bytes

These errors seem to be associated with the actual reading and parsing of the data file.
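
For reference, this kind of `struct` error appears whenever the bytes handed to `unpack` are shorter than the format string requires. The sketch below uses a hypothetical record layout chosen only because it is 1324 bytes long; the real format strings are in parse_bi500.py and may differ.

```python
import struct

# Hypothetical layout: 662 little-endian int16 values = 1324 bytes.
RECORD_FORMAT = "<662h"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def unpack_record(buffer, offset):
    """Unpack one record, or return None if the buffer is too short."""
    chunk = buffer[offset : offset + RECORD_SIZE]
    if len(chunk) < RECORD_SIZE:
        return None  # truncated read, e.g. caused by a wrong offset or count
    return struct.unpack(RECORD_FORMAT, chunk)


data = bytes(RECORD_SIZE + 10)  # one full record plus 10 stray bytes

try:
    struct.unpack(RECORD_FORMAT, data[RECORD_SIZE:])  # only 10 bytes remain
    short_error = None
except struct.error as exc:
    short_error = str(exc)

first = unpack_record(data, 0)             # complete record
second = unpack_record(data, RECORD_SIZE)  # too short, so None
```

Comparing `struct.calcsize` of the format against the number of bytes actually read at each offset is a quick way to tell whether the offsets, the counts, or the format string is off.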

If this is unclear, I can send my revised parse_bi500.py and core.py code.

Thanks!
mike

praneethratna commented 9 months ago

Hey @jmjech, thanks for testing the code locally on sample data and letting me know about the errors.

  1. I used None earlier because, as you mentioned, the BI500 files don't have a true suffix, so such a check isn't necessary. I have changed it to validate_ext("") as suggested and it works fine now.
  2. I have also rearranged the lines in __init__ to fix the initialisation errors.
  3. The error after that had two causes. First, the offset and count values were being taken from the -Vlog file instead of the -Ping file while unpacking the -Data file, and the number of pings differs between those two files, as discussed. Second, there was a mistake in the START_FORMAT value, which I have rectified.

You can now pull the latest changes, and everything should work on the parser side. Since we don't have set_groups_bi500.py set up yet, we cannot test the code through open_raw, but the parser can be tested as follows:

>>> from echopype.convert.parse_bi500 import ParseBI500
>>> parser = ParseBI500(str('/home/praneeth/echopype/test_ek500'), None)
>>> parser.parse_raw()

where test_ek500 is a folder containing the -Data, -Ping, -Vlog, -Info, -Work, and -Snap files for the prefix N031-S445-S2000008-F011990-T01-D20000913-T103336. You can access the parsed contents of the -Data file through parser.unpacked_data and those of the other files through parser.parameters.

jmjech commented 8 months ago

Hi @praneethratna. I successfully imported BI500 data and was able to produce a couple of echograms! One thing that still needs to be done is to convert the values read from the data file to dB. You do that by multiplying them by 10*log10(2)/256.
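
A minimal sketch of that conversion, assuming the raw samples are plain integers and that no further offset or calibration term is needed (the function name is hypothetical):

```python
import math

# Scale factor quoted above: 10 * log10(2) / 256 dB per raw count.
RAW_TO_DB = 10 * math.log10(2) / 256


def raw_to_db(raw_values):
    """Convert raw integer samples to dB by applying the scale factor."""
    return [value * RAW_TO_DB for value in raw_values]


db = raw_to_db([0, 256, -2560])
```

With this factor, 256 raw counts correspond to 10*log10(2), roughly 3.01 dB. The same multiplication works element-wise on a NumPy array if the parsed data is stored that way.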