BenthicSubstrateMapping / PyHum

Python code to read, display, export and analyse Humminbird sidescan sonar data

Unable to parse data from Mega system #69

Open mbenson182 opened 5 years ago

mbenson182 commented 5 years ago

I realize that this is my second issue request in a week, for which I apologize, but the problem I'm having now is of much higher importance to me than the last one, as I'm primarily focused on just being able to replicate the parsing and rectification methods on my own data sets. I've been able to get the read() and correct() functions working on the test data, which is about as much as I need (at least for now).

However, I've been trying to use these functions on some data my group has collected, and have been unable to get it parsed. It seems to fail in the calls to pyread or pyread_single when trying to parse the scans (in getmetadata() and _get_scans(), respectively).

This is the output when I try to run PyHum.read():

Input file is Rec00003.DAT
Son files are in Rec00003/
cs2cs arguments are epsg:26949
Draft: 0.3
Celerity of sound: 1450.0 m/s
Transducer length is 0.108 m
Only 1 chunk will be produced
Data is from the 2 series
Checking the epsg code you have chosen for compatibility with Basemap ...
... epsg code compatible
WARNING: Because files have to be read in byte by byte, this could take a very long time ...
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:   34.1s
[Parallel(n_jobs=2)]: Done   2 out of   4 | elapsed:   34.1s remaining:   34.1s
[Parallel(n_jobs=2)]: Done   4 out of   4 | elapsed:   34.1s remaining:    0.0s
something went wrong with the parallelised version of pyread ...
Traceback (most recent call last):
  File "PyHumRead.py", line 78, in <module>
    reader()
  File "PyHumRead.py", line 18, in reader
    ph.read(humfile,sonpath,'epsg:26949',c,draft,0,t,0,0,model,0,0,'100m')
  File "/home/user/miniconda2/envs/pyhum/lib/python2.7/site-packages/PyHum/_pyhum_read.py", line 427, in read
    metadat = data.getmetadata()
  File "PyHum/pyread.pyx", line 532, in PyHum.pyread.pyread.getmetadata
  File "PyHum/pyread.pyx", line 538, in PyHum.pyread.pyread.getmetadata
TypeError: 'NoneType' object is not subscriptable

I don't particularly mind whether the Parallel process works, but interestingly it seems to execute correctly and then crash while finishing up. Anyway, I wrote a script that runs the same parsing method without calling Parallel (a single-threaded version of the same code, since the code executed by the except: block differs from that in the try: block). The code is:

import PyHum as ph
import glob, sys
import os
import pdb

import PyHum.utils as humutils
import PyHum.pyread_single as pyread_single

def reader():
    humfile = "Rec00003.DAT"
    sonpath = "Rec00003/"
    c = 1450.0
    draft = 0.3
    t = 0.108
    model = 2

    # ph.read(humfile,sonpath,'epsg:26949',c,draft,0,t,0,0,model,0,0,'100m')

    # get the SON files from this directory
    sonfiles = glob.glob(sonpath+'*.SON')
    if not sonfiles:
        sonfiles = glob.glob(os.getcwd()+os.sep+sonpath+'*.SON')

    base = humfile.split('.DAT') # get base of file name for output
    base = base[0].split(os.sep)[-1]

    # remove underscores, negatives and spaces from basename
    base = humutils.strip_base(base)

    print("WARNING: Because files have to be read in byte by byte,")
    print("this could take a very long time ...")

    # Single-threaded version of Parallel call
    # Single-threaded version of the Parallel call
    X = []; Y = []; A = []; B = []
    for k in range(len(sonfiles)):
        x, y, a, b = getscans(sonfiles[k], humfile, c, model, "epsg:26949")
        X.append(x); Y.append(y); A.append(a); B.append(b)

def getscans(sonfile, humfile, c, model, cs2cs_args):

    data = pyread_single.pyread(sonfile, humfile, c, model, cs2cs_args)

    a, b = data.getscan()

    if b == 'sidescan_port':
        dat = data.gethumdat()
        metadat = data.getmetadata()
    else:
        dat = None
        metadat = None

    return a, b, dat, metadat

if __name__ == '__main__':
    reader()

The output when I run this code block is:

WARNING: Because files have to be read in byte by byte,
this could take a very long time ...
Traceback (most recent call last):
  File "PyHumRead.py", line 77, in <module>
    reader()
  File "PyHumRead.py", line 37, in reader
    X[k], Y[k], A[k], B[k] = getscans(sonfiles[k], humfile, c, model, "epsg:26949")
  File "PyHumRead.py", line 64, in getscans
    a, b = data.getscan()
  File "PyHum/pyread_single.pyx", line 473, in PyHum.pyread_single.pyread.getscan
  File "PyHum/pyread_single.pyx", line 502, in PyHum.pyread_single.pyread.getscan
  File "PyHum/pyread_single.pyx", line 458, in PyHum.pyread_single.pyread._get_scans
MemoryError

There's definitely a problem going on in the parsing somewhere, but I'm not sure how to tackle figuring it out, as the problems seem to be happening in the Cython files in private data types which I can't figure out how to access.

Any help would be greatly appreciated! I'd attach the data I'm working off of, but the zipped file is about 130 MB; let me know if there's a good way to get it to you.

mbenson182 commented 5 years ago

Daniel,

Would you be able to provide the documentation you used (if any) to determine the byte packing in the Humminbird files? After some further debugging fueled by coffee this morning, I've found that the _decode_humdat function in pyread_single is definitely not parsing the file I have correctly. Here's the output if I print "headdict", the variable returned by that function:

{'linesize': 872742912, 'recordlens_ms': -429582336, 'water_type': 'deep salt', 'lat': 89.99999999999996, 'sonar_name': '151257088', 'lon': -3255.203218234509, 'unix_time': 2028059996, 'numrecords': 680853760, 'water_code': 1, 'filename': 'Rec00003.S', 'utm_x': -362381825, 'utm_y': -639219968}

Some of the filename is cut off; the UTM coordinates (and hence lat/lon) don't make sense; the unix time is wrong (the one listed is in 2034, and unfortunately I haven't quite mastered time travel yet); and I presume some of the other fields are misparsed as well. Perhaps the model I'm using has a different byte packing, or Humminbird changed the firmware for my model (and hence how the data is packed). Either way, some documentation, if it exists, would help greatly.
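One quick thing to rule out is byte order: the same bytes decoded with the wrong endianness produce exactly this kind of garbage. As a small illustration (the value 1332 is arbitrary, though it happens that the 'linesize' above, 872742912, is 0x34050000, the byte-swapped form of 0x00000534 = 1332; that may be a coincidence, but it is worth checking):

```python
# Sketch: an integer packed in one byte order and unpacked in the other
# comes out as a huge, meaningless number of the kind seen in headdict.
import struct

raw = struct.pack('>i', 1332)          # 1332 packed big-endian: b'\x00\x00\x054'
swapped = struct.unpack('<i', raw)[0]  # decoded with the wrong byte order
print(swapped)                         # 872742912
```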

I've also attached the .DAT file if you want to take a look or try it on your own machine.

Thanks, Mike

DatFile.zip

dbuscombe-usgs commented 5 years ago

The data formats are in the docs folder.

Yes, it seems likely the firmware and model number don't match up. I know of no way to program against every combination of model and firmware.

Here is a non-cythonized version of the main reading script, which I have also translated to Python 3. I hope to move PyHum fully over to Python 3 over the coming weeks and months.

pyread.zip