OSOceanAcoustics / echoregions

Interfacing water column sonar data with annotations and labels
https://echoregions.readthedocs.io/
Apache License 2.0

merge multiple .evl files into one dataframe #204

Open jmjech opened 2 months ago

jmjech commented 2 months ago

Hi, I am using Echopype to process Simrad EK data, and I have line files generated in Echoview (.evl). I can read multiple .nc files and combine them, but it appears that I cannot read multiple .evl files and combine them into one pandas DataFrame using the read_evl function.

For example, in Echoview I can generate a bottom line for each EK data file individually. Now I would like to read the EK files with Echopype and the .evl files with echoregions. I can combine the EK files into one xarray-based object. However, it seems that I can only read one .evl file at a time, and since pandas has removed DataFrame.append, I cannot easily build a single DataFrame from the multiple DataFrames created by read_evl.

Example code:

import echopype as ep
import echoregions as er
import pandas as pd

# read in two EK60 files that have been converted to netCDF4
ek_list = ['ek_file_1.nc', 'ek_file_2.nc']
edlist = []
for fn in ek_list:
    edlist.append(ep.open_converted(fn))
ed = ep.combine_echodata(edlist)
# calibrate to volume backscattering strength (Sv)
Sv = ep.calibrate.compute_Sv(ed, waveform_mode="CW", encode_mode="power")

# read in two .evl files
evl_list = ['evl_file_1.evl', 'evl_file_2.evl']
evldf = pd.DataFrame()
# I have tried this without success
for fn in evl_list:
    tmp = er.read_evl(fn)
    evldf.append(tmp.data)   # fails on pandas >= 2.0, where DataFrame.append was removed (and it never modified evldf in place anyway)
# or I could do
evldf_1 = er.read_evl(evl_list[0])
evldf_2 = er.read_evl(evl_list[1])
# but this is not dynamic

The recommended way in pandas to merge data is to accumulate the pieces in a list and then create the DataFrame once. However, read_evl produces a DataFrame, and it is not efficient to convert each DataFrame back to a list, merge the lists, and then create a new DataFrame. After looking at the code (parse_evl), in my opinion the best approach would be for echoregions to do the merging internally and return a single DataFrame.

Unless there is a way that I haven't found?
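A minimal sketch of what such internal merging might look like, assuming each file's lines are already available as a DataFrame (for example, the .data attribute of what er.read_evl returns). The helper concat_line_data and the source_file column are hypothetical and not part of echoregions, and the time/depth columns in the stand-in data are made up:

```python
import pandas as pd


def concat_line_data(frames, sources):
    """Stack per-file line DataFrames into one, tagging each row with its file.

    Hypothetical helper, not an echoregions API: `frames` would be the
    `.data` attribute of each object returned by `er.read_evl`.
    """
    if len(frames) != len(sources):
        raise ValueError("need exactly one source name per DataFrame")
    # Add a column recording the originating file, without mutating the inputs.
    tagged = [df.assign(source_file=src) for df, src in zip(frames, sources)]
    # Join everything in a single call; renumber rows 0..N-1.
    return pd.concat(tagged, ignore_index=True)


# Stand-in data with made-up columns; real .evl DataFrames will differ.
bottom_1 = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 00:01"]),
    "depth": [50.2, 50.8],
})
bottom_2 = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01 01:00"]),
    "depth": [48.9],
})

combined = concat_line_data(
    [bottom_1, bottom_2], ["evl_file_1.evl", "evl_file_2.evl"]
)
print(combined)
```

Tagging each row with its source file keeps it possible to trace a bottom point back to the EK file it came from after the merge, while the single pd.concat call follows the list-then-join pattern described in the issue.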

Thanks, mike

leewujung commented 2 months ago

Hey @jmjech: if I understand correctly, you'd like echoregions to have a function that combines multiple Line objects, is that right? That should be straightforward to add; it just needs a bit of time.

For the moment, pd.concat should do what you want. Something like the following (note that I haven't actually tested it):

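import echoregions as er
import pandas as pd

# read in two .evl files
evl_list = ['evl_file_1.evl', 'evl_file_2.evl']
evl_data_list = []
for fn in evl_list:
    tmp = er.read_evl(fn)
    evl_data_list.append(tmp.data)  # collect each file's DataFrame in a plain list

# join once; ignore_index=True renumbers rows so indices don't repeat across files
new_evl_data = pd.concat(evl_data_list, ignore_index=True)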
new_evl_data.to_csv("all_evl.csv")

lines_all = er.read_lines_csv("all_evl.csv")  # maybe this is what you're looking for?

For pandas concatenation, I think it is usually faster to accumulate the DataFrames in a list and join them once, rather than joining them in each iteration of a loop, because every join inside the loop copies all the rows gathered so far.
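To illustrate the point about joining once, here is a small comparison using made-up stand-in DataFrames rather than real .evl data. Both approaches yield the same rows; the difference is how much copying happens along the way:

```python
import pandas as pd

# Made-up stand-ins for the per-file DataFrames returned by read_evl.
frames = [pd.DataFrame({"depth": [float(i), float(i) + 0.5]}) for i in range(5)]

# Joining inside the loop: each pd.concat call copies everything accumulated
# so far, so total copying grows quadratically with the number of files.
combined_loop = frames[0]
for df in frames[1:]:
    combined_loop = pd.concat([combined_loop, df], ignore_index=True)

# Accumulate first, join once: each row is copied a single time.
combined_once = pd.concat(frames, ignore_index=True)

print(len(combined_once))
print(combined_loop.equals(combined_once))
```

With a handful of .evl files the difference is negligible; it only starts to matter as the number of files grows.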

@ctuguinay : feel free to chime in!

ctuguinay commented 2 months ago

That should work, and I'm pretty sure that's what I did earlier on when organizing our own EVL data. I think this would be nice to have in the package, and I can add it as a TODO that can probably be done in the same PR as #124.

jmjech commented 2 months ago

@leewujung - yes, concat works! I was reading about append and concat in a StackOverflow conversation and mistakenly conflated the two, when only append was removed. Very nice solution. @ctuguinay - many thanks for adding this to echoregions. Very nice code.