I have a raw ADCP file in .arg format that came from a Multi-Cell Doppler Current Profiler, which I attempted to read into R using read.adp(). An adp object would not be created -- I received an error message saying the file type is unknown. Is support for .arg files something that may be implemented in the future?
This concerns archived data, so this is not an urgent issue.
I've not heard of this file type before. If you can provide a sample file and documentation of the data format, then it could be considered. There are several high-priority items for 'oce' slated for the solstice holiday, though.
Hi Hana,
As Dan said, a manual that outlines the data format and a sample file would help (email to kelley.dan@gmail.com and clark.richards@gmail.com).
A quick google suggests that it is probably a Sontek ADP, like this one:
https://www.sontek.com/argonaut-xr
(though maybe an older model).
I also found a potential manual (though not on the Sontek site ...) here:
Page 87 has the binary data format.
However, because of a past project we actually did write code to read one type of Sontek ADP. I don't know if it was the same kind (pretty sure it wasn't "multi-cell"), and I don't know if the file format is the same.
You could try a more direct approach by doing:
d <- read.adp.sontek(...)
to see if that gets you anywhere.
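For example (a sketch only; the file name here is a placeholder, and debug=3 just makes the reader print progress information):
d <- read.adp.sontek("yourfile.arg", debug=3)
summary(d)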
The existing code will not handle this instrument. This is because at https://github.com/dankelley/oce/blob/4e59dc0b5eeb6cac3ffc9c26e85cccd9a5c61a59/R/adp.sontek.R#L104 we require the byte sequence 0x10 0x02 at the start of the file, but the docs that Clark so kindly pointed out indicate that this file type starts with 0x40 0x02; see page 87 of that documentation, under the heading 'server configuration structure'.
I've not looked at the rest of the docs, and it seems possible that read.adp.sontek() could be made to work, after changing this and other things. The first few entries following 0x40 0x02 look similar to what we have, so the first thing I'd try (once we have a data file) is permitting 0x40 0x02 at the start, to see how far our code can get. I am heartened by the fact that both the existing code and the documentation that Clark found say the first block is 96 bytes, so perhaps a lot of things are the same. But the table covers 2+ pages in the docs, so there is a lot of material to check line by line, and possibly modify. And then I think it's possible that we have deeper issues, with things like the beam transformation matrix, that might make it hard for existing code to handle this format.
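For reference, a quick way to see which byte pair a given file starts with (a sketch, not oce code; the file name is a placeholder):
buf <- readBin("yourfile.arg", "raw", n=2)
buf # expect 10 02 for the old Sontek ADP format, or 40 02 for this one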
It could be a couple of hours of work, or a couple of days. We'd need a sample file to say more. And, of course, we'd also need Sontek docs that match the sample file, because it would be counter-productive to do a lot of coding only to find that the data differ and that we have no pertinent documentation.
I just sent an email containing the file. The file did indeed come from a Sontek Argonaut XR instrument, so the manual Clark found looks right. I have so far been unsuccessful in finding a copy of the manual elsewhere.
I can also confirm that read.adp.sontek(...) did not work with the file.
I tried making read.adp.sontek() permit the first byte being 0x40 (in addition to the present 0x10), and it gets somewhere at least. I'll look into this some more tomorrow. (I renamed your file for privacy, and have redacted your serial number for the same reason.) A few things to check:
> library(oce)
Loading required package: gsw
Loading required package: testthat
> d<-read.adp.sontek("~/Dropbox/oce-issue-1637/file.arg",debug=3)
read.adp.sontek(...,from= 1 ,to= (missing) ,by= 1 type= adp pcadp ...)
file /Users/kelley/Dropbox/oce-issue-1637/file.arg has 3013196 bytes
have a header, but ignoring it for now
serialNumber= (redacted)
adp.type= 18
frequency=
nbeams= 3
beam.geometry= 1 ; 0 (2 beams); 1 (3 beams), 2 (4 beams with 1 vertical), 3 (4 beams, Janus)
slant.angle= 25
orientation= up
Error in do_ldc_sontek_adp(buf, 0, 0, 0, 1, -1) :
cannot determine #beams or #cells, based on first 1000 bytes in buffer
I was also unable to find the beam angle and number of beams in the text file, although from pictures of the instrument it looks like it has 3 beams. I'm currently waiting to get the Sontek viewing software ViewArgonaut downloaded onto my computer, which will hopefully allow me to confirm that information. I also checked the manual Clark shared, and while it lists the number of beams and beam angle for Argonaut SL and ADV instruments, it unfortunately does not for XR instruments.
Looks to me like the Argonaut-XR is the newer version of the Sontek ADP that read.adp.sontek() was written to read (10 years ago!). From looking at the product online, I think that it has 3 beams only. I haven't found documentation of a beam angle, but 25 seems reasonable to me, since it is designed for relatively shallow water.
Hm, Clark's comment about the Argonaut-XR being a newer version of an instrument we used before makes me think that oce ought to handle this new format. With the two user manuals open side by side, I can see that it should be feasible to support the Argonaut-XR. (I sure wish Sontek would give byte counts in their docs, though, as other manufacturers do, so I wouldn't have to count bytes. Basically, I think all the manufacturers are copying RDI equipment, and it would have been nice if they had also copied RDI's way of documenting formats...)
I've made a new branch called "argonaut", and I'll fiddle with that over the solstice break ... speaking of which, below is today's analemma from http://emit.phys.ocean.dal.ca/~kelley/analemma/
(Question for @hhourston at the end. I don't want to get too far into the weeds without checking.)
Hm, I'm starting to code the header recognition, and I'm getting a problem near the start. The screenshot shows this. According to the docs I have (dated 2001), the "system type" is a single character (i.e. byte) that should have low nibble either 0 or 1, and high nibble either 0, 1, or 2.
As a reminder, a nibble is half a byte, or 4 bits. The snapshot shows the bits as e.g. 00 for 0 and 01 for 1, so the nibbles I see are 0100 and 1000. So maybe their notation is reversing the bits within the nibbles. Sheesh, I dunno. I would read those as 4 and 8, but those do not fit the allowed pattern (in my first paragraph), so I guess we need to read them in reverse order, i.e. we get
0100 -> 0010 -> 2
1000 -> 0001 -> 1
which, given paragraph 1 here, means that the first nibble must be the "high" nibble. So that means the system type is SL and the frequency is 1.5 MHz.
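A sketch of that logic in R (for illustration only; the byte value 0x48 corresponds to the bits 0100 1000 seen in the snapshot, and reverse4() is a helper defined here, not something in oce):
reverse4 <- function(x) # reverse the 4 low-order bits of x
    sum(vapply(0:3, function(i) bitwShiftL(bitwAnd(bitwShiftR(x, i), 1L), 3L - i), 0L))
b <- 0x48L                                # bits 0100 1000
high <- bitwAnd(bitwShiftR(b, 4L), 0x0FL) # first nibble, 0100, i.e. 4
low <- bitwAnd(b, 0x0FL)                  # second nibble, 1000, i.e. 8
reverse4(high)                            # 2, i.e. system type SL
reverse4(low)                             # 1, i.e. the 1.5 MHz frequency code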
Question for @hhourston: do you know if this is type SL, and freq 1.5 MHz?
Oh, I'm an idiot. I was looking at the docs that @richardsc told us about at https://github.com/dankelley/oce/issues/1637#issuecomment-567737667 but those are for an Argonaut ADV, not an ADP.
So most of what I've said above is likely wrong.
I'm back to square 1, and I'll ask @hhourston: if you are looking at these data in some official way, then I imagine someone must have the manual at your site. If you can get it in PDF format, please send it. Or if it's only in paper format, please scan and send. We won't be able to make any progress without a manual that reveals the data format, byte by byte.
Two bytes forward, one byte back -- that's how I'll assess the progress.
Hm. I find it weird that in searching I can't find an "official" manual for the Argonaut-XR, but I am 99% sure that the manual I linked to is for an ADP and not an ADV.
For one thing, an ADV functions using a completely different geometry that is not "beam" based like an ADP (different transmit and receive transducers).
What makes you think that manual is wrong?
Hm, let me look at the manual again (page refs are the PDF pages, not the print pages).
It was the number 2 that gave me the clue. I was working through the docs, decoding this byte and then that byte, when I got to that part. Then I looked in more detail and actually read the cover page, instead of just noticing the word 'argonaut'.
I tried googling around to try to find docs, but nothing turned up. (A lot of the returned results were for oce, actually.) That is why I'm hoping that @hhourston can find a document; whoever bought the machine likely has one.
Hm, unless the CellBegin and CellEnd mean there is one data structure per cell. But then I'm left with the confusing feature that the file contains only 10 or so pairs of 0xB0 0x26 (for a 'LONG' record) or 0xB1 0x16 (for a 'SHORT' one).
Oh, and also the 'SHORT' one does not have a CellBegin and CellEnd, so my theory about one-record-per-cell does not make sense.
And the title page ... that's a kicker.
You can see my byte detection in the "argonaut" branch, commit b8081e42ef08a5a88b0294e4ec399208ff130716; maybe my quick coding is wrong on the byte detection. (Note that you find the second header byte by using as.raw on the record byte count.)
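For example, for the 'LONG' (38-byte) and 'SHORT' (22-byte) records mentioned above:
as.raw(38) # 26, the byte expected after 0xb0
as.raw(22) # 16, the byte expected after 0xb1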
I thought it might be interesting to analyze the bytes in the file. I'll let my code and the results speak for themselves. I am looking at the mean and standard deviation of the spacing between recurrences of each possible byte value in the binary file. If data chunks are of equal length, then the chunk-starting bytes ought to show a low standard deviation of spacing.
Code
# BUF is the raw file buffer (temporarily exported by the development code).
sink("~/bytes.txt")
for (i in 1:255) {
    J <- which(BUF == as.raw(i))
    cat(sprintf("%3d 0x%02x %6.1f %6.1f\n", i, i, mean(diff(J)), sd(diff(J))))
}
sink()
d <- read.table("~/bytes.txt", header=FALSE, col.names=c("i", "byte", "mean", "sd"))
plot(d$i, d$sd, type="s")
Results
Snippets of the output file (~/bytes.txt):
  i byte   mean     sd
 48 0x30  197.0  305.6
 49 0x31  116.1  179.1
 50 0x32  101.7  157.4
 51 0x33  118.4  197.9
 52 0x34  147.7  227.6
...
159 0x9f  748.8  875.9
160 0xa0  794.8  923.0
161 0xa1  152.7  156.0
162 0xa2  869.8 1010.0
163 0xa3  822.2  925.6
...
174 0xae  989.7 1165.6
175 0xaf  959.9 1067.4
176 0xb0  158.2  161.6
177 0xb1 1078.7 1267.0
178 0xb2 1123.4 1273.8
My thoughts
The bytes with the biggest local anomaly in terms of sd(diff) are 0xa1 and 0xb0. Also, 0x32 is low. Note that argonaut ADV (not ADP ... we know nothing about ADP) velocity records have a trigger byte of 0xb0. This might be an indication that a trigger byte in this file is also 0xb0.
However, to identify data segments, we need to know the byte after the trigger, and then we need to know where the checksum is after that. So the test I'm reporting on here is mainly for interest, and proves little more than that the bytes are not white noise.
Oh, and also
> J <- which(BUF==as.raw(0xb0))
> hist(as.integer(BUF[J+1]),255)
> summary(as.integer(BUF[J+1]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 161.0 161.0 156.8 161.0 255.0
> median(diff(J))
[1] 161
This is pretty good evidence that a key byte is 0xb0, and that the byte after it is the length of the data record. That still does not tell us what's in all 161 bytes, though. I think that two will be the checksum code, and two will be the actual checksum. So that means we have 155 or so bytes to be guessed at. Not promising!
PS. Doing the same for 0xa1 gives a hair-comb sort of histogram, which is neat to look at but doesn't mean much to me, so I'm not including it.
Hm, the manual I got from Sontek was for an Argonaut current meter (like the manual @richardsc found). But the first comment in this issue called it a "multicell doppler current profiler", and if that's right, then I am back at square -1 (not yet at square 0, I reckon!).
I guess I ought to ask Sontek again for a manual. I had asked for information on "argonaut-xr" files. I wonder whether that's the right word?
I really don't think that this is what's described in the manual that Clark sent (or that I got from Sontek, which seems to be the same). I say that because e.g.
> buf<-readBin("~/Dropbox/oce-issue-1637/file.arg","raw",500000)
> is0xb1<-which(buf==0xb1)
> is0xb0<-which(buf==0xb0)
> length(buf) / length(is0xb1)
[1] 1152.074
> length(buf) / length(is0xb0)
[1] 161.4466
shows that we do not have what's expected for data chunks, in this 500K file fragment. (If we had such chunks, we would get numbers like 22 for the 0xb1 case and 38 for the 0xb0 case, since those are the numbers of bytes in the "short" and "long" data segments, coded by 0xb1 and 0xb0.) Also, if I look for cases where these magic bytes are followed by the expected values (i.e. 0xb0 followed by 0x26, or 0xb1 followed by 0x16), then I get only 0 or 1 hits. (I didn't bother keeping track, because this file is clearly formatted differently than in the docs.)
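That follow-byte test was along these lines (a sketch reusing the vectors above; out-of-range indexing of a raw vector yields 0x00, so the comparison is safe at the end of the buffer):
sum(buf[is0xb0 + 1] == as.raw(0x26)) # 'LONG' candidates
sum(buf[is0xb1 + 1] == as.raw(0x16)) # 'SHORT' candidates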
I've added code ("argonaut" branch commit f36061ff9bf6911a8fb3ccbc3ee953faf59f94f8) to read a fair fraction of the header info. You can see what it's decoding at "output", below. (If you run the command, you'll see that there is a lot of info printed after what I am showing here. All of that should be ignored; it's mostly testing code that I have not removed because it held at least a tiny bit of value.)
I don't think I'm out by one byte here and another byte there, because the stuff I'm printing looks correct in the cases where I can check, and plausible otherwise. (Reading random bytes gives crazy answers, which I'm not seeing.)
HOWEVER, what I am not seeing in the docs is any indication of how the actual data are stored. Obviously, the file has a lot of stuff past the header. But what is it? The docs are simply not telling me.
I see in the .txt file that there are supposed to be 15568 profiles. Taking the file size minus the header length and dividing by that,
> (3013196 - 418) / 15568
[1] 193.5238
and the .txt says we have 52 channels, so we have
> (3013196 - 418) / 15568 / 52
[1] 3.721611
which I guess could be 4 bytes per channel, unless I'm doing something wrong. So, does that make sense? Well, some of the channels (from the .txt file) seem to be dates etc., but a lot are 3 velo values and a backscatter value, and these are described as "R4" and "I", respectively. Assuming that the I is a 2-byte integer, that suggests that, roughly, the average channel size ought to be a little larger than
> (3*4+2)/4
[1] 3.5
and so I guess that makes some sense.
Things to tackle next
My biggest problems seem to be:
Output
> d<-read.adp.sontek("~/Dropbox/oce-issue-1637/file.arg",debug=4)
read.adp.sontek(...,from=1,to=(missing),type=(missing),...) {
file /Users/kelley/Dropbox/oce-issue-1637/file.arg has 3013196 bytes
read.adp.sontek() recognized argonaut_adp type
after checking within file (if 'type' not given), infer type='argonaut_adp'
exported bufEXPORT
about to read 'Argonaut sensor configuration structure' (96 bytes)
ConfigTime: 2017-05-08 08:35:41
serialNumber: E5131 [expect E5131 for issue 1637]
systemType bits: 00 01 00 00 01 00 00 00
lowNibble [1:4]: 0, 0, 0, 1
frequency: 1.5 MHz
highNibble [1:4]: 0, 0, 1, 0
systemType: SL
nbeams: 3
beam.geometry: 1 ; 0 (2 beams); 1 (3 beams), 2 (4 beams with 1 vertical), 3 (4 beams, Janus)
slant.angle: 25
orientation: up
skipping several things in 'Argonaut sensor configuration structure'
about to see if we have 'Argonaut operation configuration structure' (64 bytes)
found 2-byte preface to 'Argonaut operation configuration structure' (64 bytes)
ConfigTime: 2017-05-08 08:35:41
NpingsPerBeam: 1
SampInterval: 4
Lag: 1
ProfilingMode: 1 (0=no, 1=yes) [expect 1 for issue 1637]
Ncells: 10 [expect 11 for issue 1637]
CellSize: 350 (cm) [expect 350 for issue 1637]
about to check for a 'User setup parameters structure' (258 bytes)
found 'User setup configuration structure' (258 bytes)
ConfigTime: NA
BlankDistance: 100 cm [expect ?? for issue 1637]
CellSize: 200 cm [expect 350 for issue 1637]
DeploymentName: MAS01
BeginDepoymentTime: 2017-10-03 18:57:33
CommentLine1: Deployment in Masset Channel
CommentLine2: 54 00' 07.81'' N, 132 09' 15.74'' W
CommentLine3: October 2017
CellBegin: 2700 [expect ??? for issue 1637]
CellEnd: 3200 [expect ??? for issue 1637]
WaterDepth: 3200 [expect ??? for issue 1637]
done with 'User setup configuration structure' (258 bytes); off= 418
The code now reads quite a lot of the headers. There are still some things I don't understand (see above, and see the output from running it on the test file in question with debug=3), but I thought the next step would be to look after the headers.
The code (temporarily) exports BUF and OFF. I won't get into a lot of details, since the code speaks for itself, and since these comments are basically a diary for me to keep track of things, rather than to explain my exploratory procedures fully.
Doing this
n <- 1e6
d <- readBin(BUF[OFF + seq(1, 2*n)], "integer", size=2, n=n, endian="little")
plot(d, cex=1/5, xlim=c(0, 1000))
gives the graph shown below. Clearly, the bytes are not random, since (a) they are not uniform in the range from 0 to 255 and (b) the probability density function varies in a semi-systematic way over time.
Note that if what we have is headers followed by just data, then things are tricky, because velo values are stored in 2 bytes and backscatter values are stored in 1 byte. That means that some of the patterning might be a consequence of looking at the mass of data as 2-byte values. But, again, that means we certainly have patterns here. The question is -- what patterns?
In the graph, there are patches starting at about 350, 599 and 860. The patches have length about 140 (inferred roughly from calling locator() and clicking on the graph), with quiet zones of length about 110 between. These numbers are not understood by me. I think there are 10 bins (from the header, although the text file says 11). But in either case, we have 3 beams and velo is 2 bytes per datum, so I could understand a pattern with length 10*3*2=60 or maybe 11*3*2=66, but neither is 140 or 110, so I don't know whether I'm seeing something relating to my understanding, or whether I'm looking for patterns like a character in the book/film "Contact".
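One quick check on the two-byte concern (a sketch using the exported BUF and OFF) would be to read the same region as 1-byte values and see whether similar patterning appears:
d1 <- as.integer(BUF[OFF + seq(1, 2000)]) # same bytes, 1-byte interpretation
plot(d1, cex=1/5)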
This is all a guessing game. The docs I have (from @richardsc and from Sontek) are not clear about what happens after the headers. One byte in one of the headers suggests that we have so-called "Long" data chunks, but that would mean we should have a lot of two-byte pairs that we clearly do not have.
The Sontek docs I possess are 19 years old. My guess is that the format may have changed. Or, perhaps, this doc is simply irrelevant, applying to the ADV and not the ADP. Or ... I dunno. I'm basically out of ideas that seem to hold much promise.
Without full documentation of the .arg file structure, this may not be useful, but I also have a .mat file version of the .arg file I sent before. If the .mat file structure might be more easily decoded, then I am happy to share a copy and open a new issue, if that would be appropriate.
If you have a .mat file, you could try constructing an adp object with as.adp(). My guess is that as.adp() will need a fair bit of new functionality to be useful, though. It may not even work. Generally, the as. functions in oce are very primitive, because the real intention is to read raw data files, which we can expect will not be corrupted. With a .mat file, all bets are off; there's no way to know if the contents are right or wrong.
It seems to me that the present issue should be closed. If, some day, we get a manual, it can be reopened. But I use issues as a to-do list, and there is nothing to be done, for now at least.
In oce, we ask users to close issues, so I'll let you decide, @hhourston.
If you try as.adp() and find it lacking, please open an issue. But I would be very surprised if it works, because the adp object stores data in a compact form (at the byte level) to match manufacturers' native formats, and my guess is that the matlab file is using 8-byte numeric values instead of e.g. 1-byte "raw" values.
If all you're doing is drawing images, your best plan is
# command to read the .mat file; say the velocity is in a 3-D array called 'v'
imagep(v[, , 1])
to draw an image of the first beam. I have very little expectation that much progress will be made in this sort of thing, though. Someone on the team must have matlab code. You ought to just use that, since we have no way of getting the data into R in a verifiable way.
I was unsuccessful in using as.adp() to open the .mat file. Python has some support for .mat files, so I will try that route. Thank you for all your efforts on this -- I will close the issue.
Oh, the idea was to use read.mat to read the matlab file, then do whatever you want with it. But I don't think as.adp() is going to help after reading it, for the reasons explained. I'd just read it, and then "roll my own" analysis to do whatever you want to do, whether that be a graph, a calculation, a hypothesis test, or whatever.
I think there may be several packages for reading matlab files in R. Some require a particular (oldish) version of the matlab file format.
An example of reading matlab files:
d <- R.matlab::readMat("TempLoc80.mat")
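and then, with the caveats above, something like the following could get as far as an image (a sketch; the element name 'v' inside the .mat file is a guess, so check str(d) first):
library(oce)
d <- R.matlab::readMat("TempLoc80.mat") # as in the example above
str(d)           # discover what the file actually contains
v <- d$v         # assumed name of the velocity array
imagep(v[, , 1]) # image of beam 1, as suggested earlier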
Ah, I see. Thanks!