FYI, in these 2 new test files, all chunks had `data_size == 10`.
This is mainly a note to myself, so I won't double the time it takes to write the comment by making it complete enough for others to follow. (Besides, the code is in flux and this comment applies to something not yet pushed.)
I am working on the task of verifying header checksums. Doing that (with code as I've written it, and I certainly do not want to change my little checksum function, since it is used for all Nortek instruments) will require me to read the header into a block of memory. This is not how the previous code worked, but the refactor was not hard. I have it working on the first of the two test files that stem back to #1676 but it fails on the second, because that second file has 12-byte headers.
I will need to figure out a good plan for this. I suppose one is to create a 12-byte buffer and read 12 bytes, then `fseek()` back 2 bytes if the length is only 10.
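Something like the following sketch is what I have in mind. This is only an illustration in R (the actual reader works at a lower level), it assumes that the second header byte holds the header size as in the standard 10-byte layout, and the file name is just a placeholder.

```r
# Minimal sketch (not oce's actual reader): read a possibly 10- or 12-byte
# AD2CP chunk header by grabbing 12 bytes, then seeking back if only 10 apply.
con <- file("S102791A002_Barrow_v2.ad2cp", "rb")
buf <- readBin(con, "raw", n = 12)
headerSize <- as.integer(buf[2])        # assumes byte 2 holds the header size
if (headerSize == 10) {
    seek(con, where = -2, origin = "current") # give back the two extra bytes
    buf <- buf[1:10]
}
close(con)
```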
Then there's the question of what to do with 12-byte header datasets. I have no documentation on them; the docs I have only cover the 10-byte header case.
Is this all theoretical? Nope. Our second dataset (from #1676) has 12-byte headers. This makes my present (unpushed) code fail on that dataset. My policy is not to push things that fail the test suite, so I will hold off on pushing until I figure this out.
Once that's cleared up, I'll look again at our two new test files from Clark. Note that they will be in `tests/testthat/local_data`, once I push.
As you know, I have written to Nortek for some clarification on some of these issues, but I don't know that anything they say at this point will change what we are planning to do now.
My feeling is that since we already have support for the (non-standard) 12-byte header ad2cp file, we probably shouldn't remove that support. However, these are not (as far as I know) officially released as products, nor are they documented in anything other than internal Nortek documentation. Recall that these files were specific to a unit that @krillThor has, and so maybe he has some sense as to whether this is likely to be a data format that is available on more units in the future (and therefore means we should continue to support it in oce).
If the 12-byte header files were a one-off that will only ever occur in the unit that @krillThor has, then perhaps we should think about how to split that off into a separate function, perhaps outside of oce, so that it is still available but won't complicate the implementation and maintenance of the 10-byte header files.
I've reworked the code to try to handle both 10-byte and 12-byte, but all the code is in flux and will be until Monday or so. It's slow work because it requires rebuilds.
Agreed, the more we can get documentation on files, the better. Guessing is tricky, as is going by private communications ... we need files we can cross-reference.
Hi, I've been successfully using oce to read files from multiple Signature 100 and Signature 250 units. I can't remember if the original data I sent you were from a Signature 100 or the prototype Signature 55; the latter is probably a one-off, and is by now defunct. As far as I know, usage of the Signature 100 and 250 is picking up (they may contain a fifth, vertically oriented broadband echosounder; oce reads the adcp data from these and skips over the echosounder data), and they have been deployed by multiple institutions in both Arctic and Antarctic environments, so my guess is that the format is going to hang around. Let me know if you need more test cases: I just released 1.5 years' worth of data for open access (Signature 100), and can probably get access to Signature 250 data as well. Again, thanks for your work; it greatly eased my workload when processing these data, which used to consist of manual conversion in the NORTEK software, followed by a big dish of spaghetti code.
Two things for @richardsc:
`tests/testthat/local_data/ad2cp` directory, so I'll know. I ask this partly because it seems from the just-previous comment from @krillThor that the existing code works for multiple adcp files, whereas I'm finding that for your two new files, (a) the first file was a problem [which I have fixed but not pushed to GH] and (b) the second is tricky (see next comment).

Note to self -- here is the core of what I'm getting for Clark's "avgd" file. Note that the checksum works for the first chunk (which is the text of the settings) but then fails for all the other chunks. That seems a bit odd. (I know it's all of them because I also did a 3-byte matchup to find chunk starts, and this gave agreement with what `read.adp.nortek.ad2cp()` gives.)
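For anyone curious what that kind of chunk-start scan looks like, here is a rough sketch. It is an assumption-laden illustration rather than the code used above: it matches only the first two header bytes (the 0xA5 sync byte and a 10-byte header size), whereas the check described above used three bytes, and the file name is just a placeholder.

```r
# Rough sketch of scanning a raw AD2CP file for chunk starts by byte pattern.
# Assumes each chunk begins with the sync byte 0xA5 followed by the header
# size (0x0A for a 10-byte header).
fn <- "S102791A002_Barrow_v2_avgd.ad2cp"
bytes <- readBin(fn, "raw", n = file.size(fn))
isSync <- bytes == as.raw(0xA5)
nextIsSize10 <- c(bytes[-1], as.raw(0)) == as.raw(0x0A)
starts <- which(isSync & nextIsSize10)
head(starts)    # candidate chunk-start offsets (1-based)
```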
@krillThor I wonder if you could post a link to where those open data reside (or send me an email at dan.kelley@dal.ca, if that's better). That way, I could try to minimize the chance that any updates I make to the code will break things for you.
Replying to https://github.com/dankelley/oce/issues/1954#issuecomment-1140313922
- Do you have any other software that can read `S102791A002_Barrow_v2_avgd.ad2cp`?
At the moment, I don't think so, as the Nortek software requires purchasing a license. Though maybe there is a conversion utility that is free? @krillThor I'd appreciate some help or guidance here as to how we can "check" these files using something other than consuming the raw "bits and bytes" (in case it's not obvious, that was a little joke based on the fact that "bits and bites" is a type of snack made with cereal and pretzels and nuts ... 😄 )
- Just so I know, what is the instrument type for these new data files?
This is a Signature250, but one of the 5 beam versions. Thus it has 4 slant beams at 250 kHz and a center vertical beam at 500 kHz. It has had the licenses purchased to enable "waves" and "ice" modes, the latter of which I am using for those test files (hence why I think there are two files: one for the "average" mode, which includes currents and bottom track, and one for the "ice" mode, which includes the altimeter values as well as some echosounder profiles).
@dankelley:
http://metadata.nmdc.no/metadata-api/landingpage/e85f6d10cdcf3e62cbc3b8b3ab21a359
http://metadata.nmdc.no/metadata-api/landingpage/084743fedd07f97e399f0e183e791966
http://metadata.nmdc.no/metadata-api/landingpage/29dce651c307a3712ba860eedd2f11f1
The top one is the Signature 55 (i.e. the one-of-a-kind 3-beam ADCP with a central echosounder beam); the two lower ones are 5-beam Signature 100s.
@richardsc: I can try to check the files, but the caveat is that I'm on vacation without access to a Windows machine at the moment. I'll be back in my office in Bergen on June 2; I can then try to download the software (I have at least a couple of versions somewhere) and check the files.
I've never tried to open the "_avgd" files before, as I've mostly needed the full-resolution data; these are, afaik, Nortek's automatic "summaries", and in my case they have always been accompanied by the full datasets.
Btw thanks, Clark, there are some Canadianisms/North Americanisms that are lost on me.
Hm. I just noticed something: in the case with all the bad checksums, the computed checksum and the expected one always differ by 0x3000. This really makes me wonder whether this "averaged" format has a different algorithm for the checksum. (Checksums start with a particular 2-byte number and add to that with each element in the data, so changing only that starting number would shift every checksum by the same constant.)
I know I'm clutching at straws a bit here, but this certainly does not feel like an accident. If the file were truly broken, I'd expect no systematic relationship between the computed checksum and the expected checksum.
The fact that checksums are ok in the non-"avgd" file in Clark's new sample files (and that they are ok in the other 2 files we have in our test suite and ... I can only assume, in @krillThor's data) makes me think that we are computing that checksum correctly. And it's not a header-length thing, because that length is 10 bytes in all the files I've been working with.
But since Nortek seems not to be documenting the checksum calculation anymore (at least not in Nortek AS, "Signature Integration 55|250|500|1000kHz", Version 2022.2, March 31, 2022), this is a bit of a guessing game.
Data checksum error (expected 0x711e but got 0xa11e) at cindex=6274 (44.0003% through file)
Data checksum error (expected 0x3546 but got 0x6546) at cindex=7871 (55.2002% through file)
Data checksum error (expected 0x3120 but got 0x6120) at cindex=9468 (66.4002% through file)
Data checksum error (expected 0x4815 but got 0x7815) at cindex=11065 (77.6001% through file)
Data checksum error (expected 0x252b but got 0x552b) at cindex=12662 (88.8001% through file)
Data checksum error (expected 0x2538 but got 0x5538) at cindex=14259 (100.0000% through file)
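To make the starting-value idea concrete, here is a minimal sketch of the kind of checksum described above: a 16-bit sum that begins from a fixed 2-byte value. The start value 0xB58C is an assumption taken from older Nortek integration documentation, and the function name is made up for illustration; the point is only that changing the start value alone would shift every checksum by a constant, which is consistent with the uniform 0x3000 offsets in the errors listed above.

```r
# Minimal sketch of a 16-bit additive checksum that begins from a fixed
# 2-byte start value (0xB58C is an assumption; an even byte count is assumed).
ad2cpChecksum <- function(bytes, start = 0xB58C) {
    words <- readBin(bytes, "integer", n = length(bytes) %/% 2,
        size = 2, signed = FALSE, endian = "little")
    (start + sum(as.numeric(words))) %% 0x10000  # avoid integer overflow
}
# If the "avgd" writer used a start value 0x3000 larger, then for any chunk
# ad2cpChecksum(chunk, 0xB58C + 0x3000) would exceed ad2cpChecksum(chunk)
# by exactly 0x3000 (mod 0x10000), matching the errors above.
```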
@krillThor thanks for letting me know. There is absolutely no reason for you to interrupt your vacation to do anything related to this. It can wait until you are back in the office, or perhaps in the meantime I can find a copy of the software.
And @krillThor's note about the "_avgd" file makes me think that maybe it is an averaged version of the full-resolution file, rather than a separate data type as I had suspected. This was actually one of the questions I asked Nortek support about last week, so it will be interesting to see their reply.
That might also explain (or partly explain):
a) why the checksums don't seem to be working for that file (but it seems weird they'd use a different checksum algorithm), and
b) why that file seems to only have 6 data chunks.
For b), remember that I had it configured to do currents+bottom track every 20 minutes (with 120 s of pinging in each sample), and I left it to sit sampling for about 1.5 hours. The instrument should be keeping all 120 s of pings, recorded every 20 minutes, but since the "_avgd" file seems to have only 6 data chunks, it is likely that they have been averaged down to just the 20-minute averages to make for a much smaller file.
An update on branch `ad2cp`:
I do not see anything in this work that tells me why the checksums for Clark's 'avgd' file are different from expectations. The reason I was looking at the bytes, which you can see if you put `debug=3` in the call, was because I wondered if it was a problem of taking into account too many, or too few, bytes. But I don't think that is the case.
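For reference, the sort of call meant by "put `debug=3` in the call" would look like the following; the file name here is just a placeholder.

```r
# Read one of the test files with verbose debugging output (debug level 3).
library(oce)
d <- read.adp.nortek.ad2cp("S102791A002_Barrow_v2_avgd.ad2cp", debug = 3)
```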
I've done what I can think of doing for now. If we can get updated docs on the file format, that would be great. I don't quite know why they would change the checksum initial value, which is the same across so many instruments and has been for so many years, so I don't expect that to be the answer. And I don't think it's a case of using the pre-averaging checksums, because I can't see why they would be off by a consistent offset.
If we can get Clark's two files read by Nortek software, with a full listing of the output, that would be a great advance. Note that these two files are available as `19xx/1954/S102791A002_Barrow_v2.ad2cp` and `19xx/1954/S102791A002_Barrow_v2_avgd.ad2cp` at https://github.com/dankelley/oce-issues/tree/main/19xx/1954
I believe this is all fixed in "ad2cp" commit 11d1f8eedba5842d0cc438740ee033d1b52bf1f1 so I am closing this. @richardsc -- if you see further problems, please reopen this (if appropriate) or add a new issue.
During a Z call, @richardsc and I discussed problems with two of his files. I am planning to put the files into the github, but have not done that yet.
Status (of the `ad2cp` branch)

Done so far:

Action items:

- `$` operation