MHKiT-Software / MHKiT-Python

MHKiT-Python provides the marine renewable energy (MRE) community tools for data processing, visualization, quality control, resource assessment, and device performance.
https://mhkit-software.github.io/MHKiT/
BSD 3-Clause "New" or "Revised" License

DOLfYN: Chunking Data #314

Closed ssolson closed 2 weeks ago

ssolson commented 2 months ago

@jmcvey3 I was talking to a DOLfYN user at Sandia today, and they indicated that they were working with a large file and needed to write their own chunking script around DOLfYN because the file was larger than their RAM. DOLfYN froze on them, which was a bad user experience.

Would it be both possible and useful to add a chunking feature to DOLfYN?

jmcvey3 commented 2 months ago

Yes, though it might be a separate tool. Let me ping Levi; I believe he ran some tool to split up the massive binary files before even reading them with dolfyn.

lkilcher commented 1 month ago

I have a vague recollection of trying (maybe succeeding?) to create a script that chunks binary files into smaller binary files, but honestly I don't know where it is if it ever was successful. Also, this seems like a difficult piece of code to maintain, so I wouldn't recommend baking it into DOLfYN.

Most of the binary reader tools allow you to specify a range of pings you want to read. Therefore, you should be able to write a script that loops over ranges of pings. In other words: I don't think I typically chunked the binary files themselves; instead, I chunked them when reading them into "raw" ncdf files.
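A rough sketch of what such a loop might look like, assuming the reader accepts a `nens=(start, stop)` tuple for the ping range. The helper names `ping_ranges` and `read_in_chunks`, the output file pattern, and the need to know the total ping count up front are all assumptions made for illustration; they are not part of DOLfYN. The reader and writer are passed in as callables so the loop itself stays independent of the library.

```python
from typing import Callable, List, Tuple


def ping_ranges(total_pings: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_pings) into consecutive (start, stop) ranges.

    Hypothetical helper: whether `stop` is treated as exclusive depends on
    the reader, so check the boundaries on a small file first.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_pings))
        for start in range(0, total_pings, chunk_size)
    ]


def read_in_chunks(
    fname: str,
    total_pings: int,
    chunk_size: int,
    reader: Callable,
    writer: Callable,
    out_pattern: str = "raw_{:04d}.nc",
) -> int:
    """Read one ping range at a time and write each chunk to its own file.

    `reader(fname, nens=(start, stop))` should return a dataset and
    `writer(dataset, out_name)` should save it. Only one chunk is held
    in memory at a time. Returns the number of chunks written.
    """
    n_chunks = 0
    for i, (start, stop) in enumerate(ping_ranges(total_pings, chunk_size)):
        ds = reader(fname, nens=(start, stop))
        writer(ds, out_pattern.format(i))
        del ds  # drop the reference before reading the next chunk
        n_chunks += 1
    return n_chunks
```

With MHKiT installed, the call might look something like `read_in_chunks("big_file.ad2cp", total_pings, 100_000, dolfyn.read, dolfyn.save)` after `from mhkit import dolfyn`; the file name, ping count, and chunk size here are placeholders.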

If that doesn't work, let me know what kind of file it is, and maybe try killing the process early (sometime before memory runs out) and letting me know what top-level function(s) are running when you do this? That will at least tell us which loop is creating the memory leak.
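One possible way to capture that information, sketched with Python's standard `faulthandler` module: arm a timer that dumps the traceback of every thread after a fixed delay and, optionally, exits the process. The wrapper name `run_with_watchdog` and the timeout value are hypothetical; pick a delay short enough that memory has not yet run out.

```python
import faulthandler
import sys


def run_with_watchdog(func, timeout_s, exit_on_timeout=True, file=None):
    """Call func(); if it is still running after timeout_s seconds,
    dump all thread tracebacks to `file` (stderr by default).

    With exit_on_timeout=True the process is terminated right after
    the dump, which stands in for killing it by hand.
    """
    out = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout_s, exit=exit_on_timeout, file=out)
    try:
        return func()
    finally:
        # Disarm the timer if func() finished (or raised) before the timeout.
        faulthandler.cancel_dump_traceback_later()
```

For example, `run_with_watchdog(lambda: dolfyn.read("big_file.ad2cp"), 120)` would print the call stack after two minutes, and the outermost frames in that dump are the top-level functions to report.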

Does that help?

jmcvey3 commented 2 weeks ago

@ssolson Did they ever manage to try the "nens" input argument in dolfyn.read?

ssolson commented 2 weeks ago

> @ssolson Did they ever manage to try the "nens" input argument in dolfyn.read?

Thanks for pinging me on this.

The user is no longer actively running into this problem, and I do not expect they will redo their analysis to close the loop on it.

I will close this issue for now, and we can reopen it if the problem shows up again.