USGS-CMG / stglib

Routines used by the USGS Coastal/Marine Hazards & Resources Program to process oceanographic time-series data

Problems with handling irregular burst in RBR D|wave processing #79

Open ssuttles-usgs opened 1 year ago

ssuttles-usgs commented 1 year ago

Processing RBR D|wave burst pressure data with the runrskcsv2cdf.py script fails if the burst.txt file contains an incomplete (irregular) burst anywhere other than at the very end of the file. The existing code in rsk.csv2cdf.py assumes that every burst except the last has exactly the number of samples given by the samples_per_burst attribute, and it uses that number to reshape the burst data into (time, sample) dimensions. I have encountered burst data with an irregular burst in the middle of the deployment and good bursts otherwise, and the only way to get the good data to process was to manually remove the bad burst(s) from the burst.txt file.
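The failure mode can be reproduced with a plain NumPy reshape. This is a hypothetical illustration, not the stglib code itself: the helper reshape_bursts, the pressure values, and samples_per_burst = 4 are all made up for brevity.

```python
import numpy as np

# Hypothetical record: samples_per_burst = 4, and the middle burst is short.
samples_per_burst = 4
pressure = np.array([10.1, 10.2, 10.3, 10.4,   # burst 1 (complete)
                     10.5, 10.6,               # burst 2 (incomplete)
                     10.7, 10.8, 10.9, 11.0])  # burst 3 (complete)


def reshape_bursts(values, n):
    """Reshape a flat record into (time, sample), assuming every burst has n samples."""
    return values.reshape(-1, n)


try:
    bursts = reshape_bursts(pressure, samples_per_burst)
    reshaped_ok = True
except ValueError as err:
    # 10 samples cannot be split evenly into rows of 4.
    reshaped_ok = False
    print(f"reshape failed: {err}")
```

A reshape-based approach can also go wrong silently: if one burst is two samples short and another is two samples long, the total still divides evenly, reshape succeeds, and samples end up in the wrong rows.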

I suggest using the burst counter and time stamp in the burst.txt file to check the consistency of each burst; if a bad burst is encountered, fill missing values (or trim extra values) and proceed. Also check the events.txt file for any unexpected events and, if any are found, warn the user to investigate potential problems with the deployment further.
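A minimal sketch of the suggested check, assuming the burst data have already been read into a long-format pandas DataFrame with one row per sample. The function name regularize_bursts and the column names burst, time, and pressure are hypothetical placeholders for the burst counter, time stamp, and pressure columns in burst.txt; this is not stglib code. It warns about bursts whose sample count differs from samples_per_burst and about gaps in the burst counter, then numbers the samples within each burst (groupby/cumcount) and uses a multi-index plus unstack to pad short bursts with NaN and trim long ones. The events.txt check is not covered.

```python
import warnings

import numpy as np
import pandas as pd


def regularize_bursts(df, samples_per_burst, burst_col="burst",
                      time_col="time", value_col="pressure"):
    """Shape a long-format burst record into a (time, sample) table.

    Short bursts are padded with NaN; long bursts are trimmed to
    samples_per_burst. Column names are hypothetical placeholders.
    """
    df = df.sort_values([burst_col, time_col]).reset_index(drop=True)
    counts = df.groupby(burst_col).size()

    # Flag bursts whose sample count differs from samples_per_burst.
    irregular = counts[counts != samples_per_burst]
    if not irregular.empty:
        warnings.warn(f"irregular burst(s) (burst: n_samples): {irregular.to_dict()}")

    # Flag gaps in the burst counter.
    bursts = counts.index.to_numpy()
    missing = np.setdiff1d(np.arange(bursts.min(), bursts.max() + 1), bursts)
    if missing.size:
        warnings.warn(f"burst counter skips: {missing.tolist()}")

    # Number samples within each burst (0, 1, 2, ...) and drop extras.
    sample = df.groupby(burst_col).cumcount()
    keep = sample < samples_per_burst
    wide = (
        df.loc[keep]
        .assign(sample=sample[keep])
        .set_index([burst_col, "sample"])[value_col]
        .unstack("sample")                          # pads short bursts with NaN
        .reindex(columns=range(samples_per_burst))  # guarantees full width
    )
    # Use each burst's first time stamp as the "time" coordinate.
    wide.index = df.groupby(burst_col)[time_col].first().reindex(wide.index)
    return wide


# Usage with made-up data: burst 2 is short, burst 3 has one extra sample.
df = pd.DataFrame({
    "burst": [1] * 4 + [2] * 2 + [3] * 5,
    "time": pd.to_datetime("2023-01-01") + pd.to_timedelta(np.arange(11), unit="s"),
    "pressure": np.arange(11.0),
})
table = regularize_bursts(df, samples_per_burst=4)
print(table)
```

Padding and trimming keep every burst in the output, so good bursts on either side of a bad one are preserved; the warnings are what tell the user the file was malformed.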

dnowacki-usgs commented 1 year ago

Yes, malformed files like this can be a bear to deal with. I think a checker that looks for the things you suggest and then warns the user about the malformed file is the right way to do it.

ocheriton commented 1 year ago (August 22, 2023)

Just chiming in to say that I am also encountering this issue (an incomplete burst in the middle of an otherwise fine RBR pressure record), and it is turning out to be a real headache to deal with. I am going to try chopping the burst file into two files, but even this is tricky: the file is so big that my text editor keeps hanging. Before I go too far down this road, does the runrskcsv2cdf.py script use only the burst CSV file?

ssuttles-usgs commented 1 year ago

I would suggest reading the file into a pandas DataFrame, finding the bad/incomplete bursts and removing them from the DataFrame, and then writing the modified CSV file back out. There is a way to fill missing parts of the burst data with NaNs using multi-indexes and unstack, but it is not in stglib, at least not yet.
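A minimal sketch of that workflow, assuming burst.txt is a comma-separated file with a single header row and a burst-counter column (called burst here; the real header name may differ, and the actual export may need extra read_csv arguments such as skiprows). The helper names drop_incomplete_bursts and clean_burst_csv are hypothetical. Any burst whose row count differs from samples_per_burst is dropped, and the result is written to a new file rather than overwriting the original. In-memory buffers stand in for files so the example runs as-is.

```python
import io

import pandas as pd


def drop_incomplete_bursts(df, samples_per_burst, burst_col="burst"):
    """Keep only bursts that contain exactly samples_per_burst rows."""
    sizes = df[burst_col].map(df[burst_col].value_counts())
    return df[sizes == samples_per_burst]


def clean_burst_csv(src, dst, samples_per_burst, burst_col="burst"):
    """Read a burst CSV, drop irregular bursts, write the rest to dst.

    Returns the number of bursts that were removed.
    """
    df = pd.read_csv(src)
    good = drop_incomplete_bursts(df, samples_per_burst, burst_col)
    good.to_csv(dst, index=False)
    return df[burst_col].nunique() - good[burst_col].nunique()


# Made-up data with samples_per_burst = 2; burst 2 is incomplete.
raw = io.StringIO(
    "burst,time,pressure\n"
    "1,2023-01-01 00:00:00,10.1\n"
    "1,2023-01-01 00:00:01,10.2\n"
    "2,2023-01-01 00:10:00,10.3\n"
    "3,2023-01-01 00:20:00,10.4\n"
    "3,2023-01-01 00:20:01,10.5\n"
)
out = io.StringIO()
n_dropped = clean_burst_csv(raw, out, samples_per_burst=2)
cleaned = pd.read_csv(io.StringIO(out.getvalue()))
print(f"dropped {n_dropped} burst(s)")
```

For a real file, pass paths instead, e.g. clean_burst_csv("burst.txt", "burst_clean.txt", samples_per_burst) with hypothetical file names. This also avoids opening the large file in a text editor, although pandas still loads the whole file into memory.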

Steven Suttles, U.S. Geological Survey, Woods Hole Coastal and Marine Science Center, 384 Woods Hole Road, Woods Hole, MA 02543-1598, (508) 457-2228 (o), (508) 524-5871 (c)