USGS-CMG / stglib

Routines used by the USGS Coastal/Marine Hazards & Resources Program to process oceanographic time-series data

adding LISST to stglib #128

Closed rallen-usgs closed 12 months ago

rallen-usgs commented 1 year ago

Sequoia Scientific's LISST instrument is used by a handful of folks at USGS; it would be great to bring it in to stglib. As we start to add it, here's where we can discuss problems, questions, etc.

dnowacki-usgs commented 1 year ago

Thanks Rachel! Could you provide a sample ASCII file exported from the LISST software? Is it just a single file or multiple files? Which variables would you want to report?

rallen-usgs commented 1 year ago

Sure, here's a relatively short CSV exported by the LISST software (LISST-SOP200X.exe); it's from a time we were profiling during ERO20 (Jan 22, 2020), so we left the instrument at a few different depths for 1 min each. The LISST software exports two CSVs: one labeled ..._rs.csv and one labeled ....csv. I prefer the rs one because it uses their 'randomly shaped' model for particle size. The other one assumes particles are spheres. Documentation on what all the fields are can be found in the LISST-200X Users Manual, Appendix C (also attached). We can talk about what variables to include, but my gut instinct is to include them all. I hadn't been including beam attenuation (col 61) in the MATLAB processing I was doing, and then further down the line I realized that one was probably useful! L0221705a_rs.csv LISST200X_UsersManual.pdf

dnowacki-usgs commented 1 year ago

Great, thanks. Is this exported with the latest, or at least a recent, version of the LISST software? It looks like it was redesigned recently. We should probably support the format exported by the current software. https://www.sequoiasci.com/lisst-200x-software/

rallen-usgs commented 1 year ago

Sorry for the delay! New version from a more recent version of the LISST software (LISST-200X v1.21) attached. L0221705_v2_rs.csv

dnowacki-usgs commented 1 year ago

Comments from Rachel, brought over here:

1) I wonder if the "ring" coordinate should be particle size? Is there any reason not to give it the more descriptive name?

I kept it as ring since there are three sizes associated with each ring (median, upper, lower)

2) AnIn1 - does it need "(fluorometer 1 in LISST-HAB and LISST-BLACK)"? Would this code work for those instruments?

The variable names are pulled straight from the LISST manual Appendix C: Data File Formats (pp. 81). We could certainly test if we had some files for other instruments.

3) Depth - I wonder if it's worth asking for the orientation of the instrument (vertical vs horizontal) in the intake, so an offset can be applied here if needed. The pressure sensor isn't at the optics, so if the instrument is vertically oriented an offset is needed to get the depth of the optics.

This is a good question and one we should discuss further.

4) Do they report it as Mean Diameter? Or median? I need to check. I would have thought they reported median...

This is straight from Appendix C: Data File Formats (pp. 81). I agree it is strange, given the three ring sizes are median, lower, and upper.

5) A bunch of the fields are int32, not floats. Is that the right way to handle those fields?

By default, if variables only contain integer values, xarray will save them as ints.

6) Computed Optical transmission - need to check the units

7) I would give a more descriptive name for vc - maybe just "VolumeConcentration"

Check Appendix C and let me know if you think the units are correct. I believe it is unitless ("1").

8) Can it handle burst data? We often use it in burst mode, so being able to handle bursts would be GREAT

Awesome question. If you have a sample burst file, send it along!
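
On item 3, a minimal sketch of how an orientation-dependent offset could be applied to the pressure-derived depth. The function name, the `orientation` values, and the `sensor_to_optics_m` parameter are hypothetical and are not part of stglib; the actual distance between the pressure sensor and the optics would have to come from the instrument documentation or a measurement.

```python
def optics_depth(sensor_depth_m, orientation, sensor_to_optics_m):
    """Estimate the depth of the optics from the pressure-sensor depth.

    orientation: "horizontal" (sensor and optics at the same depth) or
        "vertical" (optics offset along the instrument axis).
    sensor_to_optics_m: signed along-axis distance (hypothetical parameter);
        positive when the optics sit deeper than the pressure sensor.
    """
    if orientation == "horizontal":
        return sensor_depth_m
    if orientation == "vertical":
        return sensor_depth_m + sensor_to_optics_m
    raise ValueError(f"unknown orientation: {orientation!r}")
```

Making the offset signed avoids having to encode which end of the instrument points down as a separate option.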

rallen-usgs commented 1 year ago

Thanks for the responses Dan! A couple more thoughts: [1]. the upper and lower values for the ring sizes match (that is, the upper end of ring 1 is the same as the lower end of ring 2). So, perhaps we could give the sizes as "median" (with 36 values) and "bounds" (with 37 values)? I use the bounds for computing a cdf, so it's def useful to have them! But I always convert from the weird two column 36 value setup to a single vector of 37 values.

[4]. I often don't use the diameter reported by Sequoia, so maybe this doesn't matter. I end up computing my own D50 and it's often different from what they report. Maybe it's because they report a mean diameter...

[5]. looks like Acc X, Y, and Z aren't used. Should they be removed?

[6]. ah, does "1" mean unitless? I didn't know.

[8]. I'll send a burst file! It might be large...
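
For [1] and [4], here is a rough sketch of the conversion described above: merging the per-ring lower and upper limits into a single vector of bounds, building a CDF from the volume concentrations, and reading off a D50. The function names are made up for illustration, and interpolating in log-size space is an assumption rather than Sequoia's or stglib's method. The toy example uses three rings instead of 36.

```python
import math


def bounds_from_lower_upper(lower, upper):
    """Merge n lower and n upper ring limits into n + 1 shared bounds."""
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    for i in range(len(lower) - 1):
        # The upper end of one ring should equal the lower end of the next.
        if not math.isclose(upper[i], lower[i + 1], rel_tol=1e-6):
            raise ValueError(f"rings {i} and {i + 1} do not share a bound")
    return list(lower) + [upper[-1]]


def cumulative_fraction(vc):
    """CDF at each bound: starts at 0 and ends at 1 (len(vc) + 1 values)."""
    total = sum(vc)
    if total <= 0:
        raise ValueError("total volume concentration must be positive")
    cdf, running = [0.0], 0.0
    for v in vc:
        running += v
        cdf.append(running / total)
    return cdf


def d50(bounds, vc):
    """Median size, interpolated linearly in log(size) between bounds."""
    cdf = cumulative_fraction(vc)
    for i in range(1, len(cdf)):
        if cdf[i] >= 0.5:
            frac = (0.5 - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
            lo, hi = math.log(bounds[i - 1]), math.log(bounds[i])
            return math.exp(lo + frac * (hi - lo))


# Toy example with three rings (sizes in microns, made-up values).
bounds = bounds_from_lower_upper([1.0, 2.0, 4.0], [2.0, 4.0, 8.0])
median_size = d50(bounds, [1.0, 2.0, 1.0])
```

With 36 rings the same call would return the 37 bounds, which could then be stored alongside the 36 median sizes.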

rallen-usgs commented 1 year ago

Hi Dan, Olivia and I were looking at the new .nc file produced by stglib, and we have some thoughts:

1) QA/QC params to add: date (start/end); % transmission (LISST data is commonly ignored when % transmission is below 15% or 30%); depth (making sure the instrument is submerged); and potentially "bad ensembles" for some user-defined bad periods

2) Should the names in the resulting .nc file match the standard names put out by other instruments? E.g., pressure as P_1?
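
As a sketch of the QA/QC checks listed in 1), the function below flags which samples pass a date window, a minimum transmission, and a minimum depth. The function and parameter names are hypothetical, not stglib options, and it assumes transmission is given in percent; if the file stores it as a fraction, multiply by 100 first or change the threshold.

```python
from datetime import datetime


def qaqc_mask(times, transmission_pct, depth_m, start, end,
              min_transmission_pct=30.0, min_depth_m=0.0):
    """Return one bool per sample: True where the sample passes every check."""
    return [
        (start <= t <= end)              # inside the good-data window
        and (tr >= min_transmission_pct)  # enough light gets through
        and (d > min_depth_m)             # instrument is submerged
        for t, tr, d in zip(times, transmission_pct, depth_m)
    ]


# Made-up samples: one before the window, one too turbid, one out of water.
times = [datetime(2020, 1, 22, 10, 0, s) for s in (0, 1, 2, 3)]
mask = qaqc_mask(
    times,
    transmission_pct=[80.0, 10.0, 50.0, 50.0],
    depth_m=[2.0, 2.0, -0.1, 2.0],
    start=datetime(2020, 1, 22, 10, 0, 1),
    end=datetime(2020, 1, 22, 10, 0, 10),
)
```

User-defined "bad ensembles" could be handled the same way, by adding one more condition that checks each time against a list of excluded periods.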

The file you last showed me was a .nc file, but I think maybe it should have been a .cdf file, since no QA/QC was done on it, and the field names were from the instrument, not a USGS standard.

I'm also attaching a burst LISST file that I collected a couple of years ago, and the corresponding metadata for it, so metadata can make it in. Hopefully this is useful! (The burst file is shortened to just the first part, so that it's not too large for GitHub.) I just made up the config.yaml file, based on other ones I've seen. Note that GitHub didn't like the yaml file, so it's stored as a yaml.txt file here...

csf20cht05_short_rs.csv gatts_CSF20CHT.txt csf20cht05_config.yaml.txt

rallen-usgs commented 1 year ago

Hi Dan, I was just looking a little closer at bringing LISST burst data into stglib, and I found a couple of things.

First, the 200X has an "LOP" file that stores the configuration of the instrument. The documentation on what each of the letter pairs means is in the LISST manual, but "OM" is operating mode, and it's 0 for not bursting and 1 for bursting. Is there a good way to include the LOP? Should the yaml make sure to ask for the fields in the LOP? Should the LOP be added as an input file, along with the csv?

Second, the csv doesn't seem to store a "burst number"; you just have to check the timestamp. (The LOP does have information on the burst interval and the samples per burst, so you know what it should be!) I'm not sure what the best way to handle this in stglib is.
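
One possible way to recover burst numbers from timestamps alone, sketched here as an illustration rather than as how stglib does it: start a new burst whenever the gap to the previous sample exceeds a threshold. The function name and `max_gap` are hypothetical; a sensible threshold would sit between the within-burst sample spacing and the burst interval reported in the LOP.

```python
from datetime import datetime, timedelta


def assign_burst_numbers(times, max_gap):
    """Label each sample with a 0-based burst number.

    A new burst begins whenever the time since the previous sample
    is longer than max_gap (a timedelta).
    """
    bursts = []
    current = 0
    for i, t in enumerate(times):
        if i > 0 and t - times[i - 1] > max_gap:
            current += 1
        bursts.append(current)
    return bursts


# Made-up example: 1 s sampling, with a 10 min pause between bursts.
t0 = datetime(2020, 1, 22, 10, 0, 0)
times = [t0 + timedelta(seconds=s) for s in (0, 1, 2, 600, 601)]
burst = assign_burst_numbers(times, max_gap=timedelta(seconds=30))
```

The samples-per-burst value from the LOP could then serve as a consistency check on the length of each detected burst.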

LOP for the L021755_v2_rs.csv attached in the next post.

rallen-usgs commented 1 year ago

LOP (compressed as a zip folder, and again renamed as a .txt) for the csv above attached here.

ERO_LISST_PROFILING_20200122.zip

ERO_LISST_PROFILING_20200122.lop.txt

dnowacki-usgs commented 1 year ago

Thanks Rachel. Does the .LOP file always exist, no matter what mode the instrument collected in?

dnowacki-usgs commented 12 months ago

Closed by #154