Open leewujung opened 10 months ago
@leewujung has there been any progress on this issue? I have a ~4 GB ad2cp file from a Signature 100 instrument that I cannot convert. After a few minutes I get some warnings ("UserWarning: Converting non-nanosecond precision datetime..."), followed by a notice that the process was killed, and another warning ("UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown"). My laptop has 16 GB of RAM and should be able to handle the conversion without a problem. I was monitoring the process memory usage and it was not excessive.
Is there a workaround in the meantime?
Hey @jessecusack: We haven't been able to work on this further because we're over-committed with other priorities.
Would you be interested in working on it? If I remember correctly from prior investigations, the main thing that created the large memory expansion was the `xr.merge` call, which we can sidestep by changing how data from different modes are stored, and probably also by writing parsed data to disk directly. File size and memory usage are not always a one-to-one match; it depends on the computation details involved.
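To illustrate where that expansion comes from, here is a minimal sketch (with made-up stand-in data, not actual AD2CP parser output): when datasets from different sampling modes have disjoint `time` coordinates, `xr.merge` aligns every variable onto the union of all coordinates and pads with NaN, so each variable balloons to the full combined length. Keeping the modes in separate datasets or groups avoids that padding.

```python
import numpy as np
import xarray as xr

# Two "modes" sampled at disjoint, interleaved times
# (hypothetical stand-ins for AD2CP burst/average modes).
t1 = np.arange(0, 1000, 2)   # 500 even timestamps
t2 = np.arange(1, 1000, 2)   # 500 odd timestamps
ds1 = xr.Dataset({"v1": ("time", np.random.rand(t1.size))}, coords={"time": t1})
ds2 = xr.Dataset({"v2": ("time", np.random.rand(t2.size))}, coords={"time": t2})

# xr.merge aligns onto the union of the time coordinates, so each
# 500-element variable is padded with NaN to length 1000 -- this is
# the memory expansion: every mode pays for every other mode's times.
merged = xr.merge([ds1, ds2])
print(merged.sizes["time"])                    # 1000
print(int(merged["v1"].isnull().sum()))        # 500 NaN fill values
```

Storing each mode separately (or writing each to disk as it is parsed) keeps every variable at its native length.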
For "UserWarning: Converting non-nanosecond precision datetime..." -- this is something we know how to fix, as we've fixed it for other echosounder models.
For "UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown", could you copy-paste the entire error message, or better yet, upload a notebook gist so that there's a reproducible example?
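For context on the datetime warning, here is a minimal sketch (the microsecond-precision timestamp is a hypothetical example, not taken from the AD2CP parser): the warning appears when timestamps are created at a precision other than nanoseconds and xarray/pandas silently converts them. Casting to `datetime64[ns]` at parse time avoids the conversion.

```python
import numpy as np

# Hypothetical timestamp parsed at microsecond precision -- this is
# the kind of value that triggers the "Converting non-nanosecond
# precision datetime" warning downstream.
ts_us = np.array(["2024-01-01T00:00:00.123456"], dtype="datetime64[us]")

# Casting to nanosecond precision up front avoids the implicit
# conversion; the value itself is unchanged.
ts_ns = ts_us.astype("datetime64[ns]")
print(ts_ns.dtype)   # datetime64[ns]
```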
This issue originally came from #407, but the focus of that issue shifted to EK echosounder data instead. Issues related to EK files were addressed in #1185.
This new issue is to capture the similar needs for AD2CP files. The same approach as in #1185 would likely work here too, with the caveat that some part of it may need to happen at the parser stage if the file is very large: if an AD2CP file is a few GB and system memory is small, then `parser.parse_raw` may fail due to insufficient memory.

From #407:
Looking back: we could use the `xr.concat` approach done for EK data to avoid the potentially large overhead of `xr.merge`. A caveat here is that, without parsing all AD2CP data packets (analogous to datagrams in EK raw files), the "final" shape of the entire zarr store may change across the batches of sequentially parsed data packets. Some work is needed here to figure out a strategy to handle this.