BenningtonCS / Telescope-2014


Parse data from output file into a format readable by gnuplot #120

Closed: theDarkLard closed this issue 9 years ago

theDarkLard commented 9 years ago

what are computers

theDarkLard commented 9 years ago

Related #119

theDarkLard commented 9 years ago

Now that I've done something concrete with the telescope (#119) I feel less antsy about learning some python. Will be doing this today.

theDarkLard commented 9 years ago

Almost ready to start writing this. I've been going through the Python course on Codecademy and it has been helping a ton.

edaniszewski commented 9 years ago

:snake:

theDarkLard commented 9 years ago

Part of the code that is written: it generates a list of frequencies from fstart, fstop, and the frequency spacing in the output file (not yet read from the output file; I entered those values manually), and it matches items in the freq list with items in the pwr list.

What still needs to be done: have the parser read the output file, pick out the important bits, and stick them in a list.
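A sketch of the frequency-list step described above, written in Python 3 (the thread's own code is Python 2). The fstart, fstop, and spacing values are the ones from the header line quoted later in this thread; make_freqs and the placeholder pwrs list are hypothetical names for illustration, and rounding the step count is an assumption about how fstop relates to the number of channels.

```python
def make_freqs(fstart, fstop, spacing):
    """Return one frequency per channel, starting at fstart and stepping by spacing."""
    # (fstop - fstart) / spacing is not an exact integer in floating point, so round it
    nsteps = int(round((fstop - fstart) / spacing))
    return [fstart + spacing * i for i in range(nsteps)]

freqs = make_freqs(1419.397, 1421.403, 0.009375)
print(len(freqs))  # 214, matching npoint in the header

# Hypothetical power readings, one per frequency; zip pairs them up in order
pwrs = [0.0] * len(freqs)
pairs = list(zip(freqs, pwrs))
```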

edaniszewski commented 9 years ago

useful bits

https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
https://docs.python.org/2/library/stdtypes.html#string-methods

theDarkLard commented 9 years ago

you are my guardian angel

(Replying by email to Erick Daniszewski's comment of Fri, Jul 10, 2015 at 3:14 PM.)

theDarkLard commented 9 years ago

getting there

theDarkLard commented 9 years ago

I have handwritten fairly specific guidelines for how this code will be written; now I will start actually putting it all into code.

theDarkLard commented 9 years ago

What I have so far:

r = raw_input("Enter the file you want to parse: ")
#If the input filename does not match a file in the current directory, print "Try a new file"
f = open(r)
#a = raw_input("Enter what you would like to call the parsed files: ")

#This function parses out info that is not the spectrum.
def infoparse(x):
#    q = open("%sINFO.dat" % a, 'w')
    e = open("testparse.txt", 'w')
    #This for loop takes only the first two lines in any output, extracts the data types
    #and prints them to the first line in the new file.
    for i,line in enumerate(x):
        if i == 0:
            w = line.split(' ')
            str1 = w[0] + "               " + w[2] + "     " + w[6] + "     " + w[8] + "    " + w[10] + "           " \
                   + w[13] + "         " + w[15] + "          " + w[17] + "         " + w[21] \
                   + "       " + w[24] + "         " + w[26] + "           "
        elif i == 1:
            w = line.split(' ')
            str2 = w[0] + "          " + w[2] + "          " + w[4] + "          " + w[6] + "       " \
                   + w[11] + "       " + w[17] + "     " + w[19] + "          " + w[21] + "     " \
                   + w[23] + "      " + w[29] + "       " + w[34] + "    "
            e.write("#" + str1 + str2 + "\n")
        else:
            break
    x.seek(0)
    #This for loop extracts the data values and prints them under their respective data types
    for i,line in enumerate(x):
        if i % 4 == 0 or i % 4 == 1:
            v = line.split(' ')
            print v
    e.close()

infoparse(f)

f.close()

The formatting is the hard part. I'm still working out how to get the data to line up neatly under its respective data type. The wacky spaces in the concatenated strings in the first for loop are there to make things line up, but there's definitely a better way to do it. It's coming along, though.
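One way to avoid hand-counted spaces, sketched in Python 3: let str.format pad every field to a computed width, so names and values share the same column boundaries. format_table is a hypothetical helper, the 2-space padding is an arbitrary choice, and the names and values are taken from the sample header line shown later in the thread.

```python
def format_table(header, values, pad=2):
    """Return (header_row, value_row) with each column wide enough for both entries."""
    # width of each column = the longer of the two entries, plus some padding
    widths = [max(len(h), len(v)) + pad for h, v in zip(header, values)]
    # build a format string such as '{:<19}{:<6}{:<5}{:<4}' (left-aligned fields)
    fmt = "".join("{:<%d}" % w for w in widths)
    return fmt.format(*header), fmt.format(*values)

head, vals = format_table(["DATE", "obsn", "az", "el"],
                          ["2015:183:14:53:57", "0", "118", "57"])
print("#" + head)  # '#' marks the header line, as in the thread's output file
print(" " + vals)  # one leading space keeps the values under the '#'-shifted names
```

Because the widths come from the data, a longer value (like the timestamp) widens its column instead of pushing everything after it out of line.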

theDarkLard commented 9 years ago

Also the print statement in the second for loop is just sort of for testing.

edaniszewski commented 9 years ago

very cool!! there's probably a better way of handling the spacing, but that can be figured out later. One thing to note, where you have

e.write("#" + str1 + str2 + "\n")

the str1 variable is assigned in an earlier pass through the loop than the one that does the write, which makes this easy to misread. In Python, if and for blocks don't create a new scope, so str1 (set when i == 0) is still defined when i == 1, and the write as it stands should output both strings. Still, having the write depend on which branch of the loop runs is fragile.

A cleaner way to get both str1 and str2 written to file is to move the e.write(...) line to just below the for loop, e.g.

    for i,line in enumerate(x):
        if i == 0:
            w = line.split(' ')
            str1 = w[0] + "               " + w[2] + "     " + w[6] + "     " + w[8] + "    " + w[10] + "           " \
                   + w[13] + "         " + w[15] + "          " + w[17] + "         " + w[21] \
                   + "       " + w[24] + "         " + w[26] + "           "
        elif i == 1:
            w = line.split(' ')
            str2 = w[0] + "          " + w[2] + "          " + w[4] + "          " + w[6] + "       " \
                   + w[11] + "       " + w[17] + "     " + w[19] + "          " + w[21] + "     " \
                   + w[23] + "      " + w[29] + "       " + w[34] + "    "
        else:
            break
    e.write("#" + str1 + str2 + "\n")

this way (assuming that there are indeed at least two iterations of the loop), both str1 and str2 variables would be initialized and filled.
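On the "at least two iterations" assumption: if the file had fewer than two lines, str1 or str2 would never be assigned and the write would fail with a NameError. A Python 3 sketch of one way to make that explicit, initializing both to None first and raising a clearer error; read_two_header_lines is a hypothetical helper, and it simply keeps each raw line instead of building the spaced-out strings.

```python
import io

def read_two_header_lines(f):
    """Return the first two lines of f, or raise ValueError if there are fewer."""
    str1 = str2 = None  # start undefined-but-present, so a short file is detectable
    for i, line in enumerate(f):
        if i == 0:
            str1 = line.rstrip("\n")
        elif i == 1:
            str2 = line.rstrip("\n")
        else:
            break  # both header lines found; stop reading
    if str1 is None or str2 is None:
        raise ValueError("expected at least two header lines")
    return str1, str2

print(read_two_header_lines(io.StringIO("line one\nline two\nline three\n")))
```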

theDarkLard commented 9 years ago

Ahhhh cool thank you!!

theDarkLard commented 9 years ago

Hey @edaniszewski if you can spare a moment and take a look at this bit of code I would be very grateful:

    #This for loop extracts the data values and prints them under their respective data types
    for i,line in enumerate(x):
        if i % 4 == 0:
            v = line.split(' ')
            dlist = []
            for elem in v:
                if elem[0].isdigit() is True:
                    dlist.append(elem)
                else:
                    pass
            return dlist
            print dlist

It keeps throwing the "string index out of range" error and I can't figure out why...

edaniszewski commented 9 years ago

Sure thing, @theDarkLard !!

Just quickly glancing over, the most likely place where it would throw that error is the line:

if elem[0].isdigit() is True:

where elem is a string extracted from the list v. If you put a print v right after assigning to the variable, e.g.

v = line.split(' ')
print v

I'd bet you'd see something like:

["some string", "some other string", "", "more string action here", "a cool string"]

the offender is the third string (index 2). Since it is empty, elem[0] asks for the character at index 0 of a string that has no characters at all, hence the exception. You can filter empty values out of a list pretty easily, though. There are a bunch of ways to do it, but an easy one is:

filter(None, list)

so, you could say

tmp_list = line.split(' ')
v = filter(None, tmp_list)

or, you could just nest function calls to make it take up less space

v = filter(None, line.split(' '))

so, testing this using python via command line:

>>> x = ["some string", "other string", "", "more string action", "a cool string"]
>>> y = filter(None, x)
>>>
>>> print x
['some string', 'other string', '', 'more string action', 'a cool string']
>>> print y
['some string', 'other string', 'more string action', 'a cool string']

try it out, and see if that helps. If not, I'll dig deeper! (:
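Another way around the empty strings, sketched in Python 3: check that the string is non-empty before indexing it, or slice instead of index, since slicing an empty string returns '' rather than raising. The list is the example list above plus one made-up numeric entry.

```python
words = ["some string", "other string", "", "more string action", "a cool string", "42"]

# `elem and ...` short-circuits: '' is falsy, so elem[0] is never evaluated for it
digits = [elem for elem in words if elem and elem[0].isdigit()]

# elem[:1] is '' for an empty string, and ''.isdigit() is False, so no IndexError either
digits_too = [elem for elem in words if elem[:1].isdigit()]

print(digits)  # ['42']
```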

theDarkLard commented 9 years ago

yes I think that's the problem!! but the empty index is a little funky:

['DATE', '2015:183:14:53:57', 'obsn', '', '', '0', 'az', '118', 'el', '57', 'freq_MHz', '', '1420.4000', 'Tsys', '132.360', 'Tant', '1990.780', 'vlsr', '', '', '11.89', 'glat', '', '9.007', 'glon', '191.606', 'source', 'Sun', '\n']
['Fstart', '1419.397', 'fstop', '1421.403', 'spacing', '0.009375', 'bw', '', '', '', '2.400', 'fbw', '', '', '', '2.000', 'MHz', 'nfreq', '256', 'nsam', '1048576', 'npoint', '214', 'integ', '', '', '', '', '3', 'sigma', '', '', '', '0.782', 'bsw', '0\n']

theDarkLard commented 9 years ago

so there's an extra ', ' instead of a blank space, or nothing at all

theDarkLard commented 9 years ago

im trying to solve this by saying like

blank  == "\', \'"
if elem == blank:
    pass
elif blah blah blah

but python doesn't like what blank is set to

theDarkLard commented 9 years ago

nevermind i'm an idiot and forgot a colon after my if statement

theDarkLard commented 9 years ago

aaaaaaaand I just accidentally deleted all of my code

theDarkLard commented 9 years ago

found it in the trash. sorry for all the emails guys

theDarkLard commented 9 years ago

ALRIGHT WE'RE BACK IN BUSINESS WITH SOME (almost) WORKING CODE

edaniszewski commented 9 years ago

blank == "\', \'"

Be careful here. I'm guessing that you want to set the value of blank, but instead you are checking for equality.

A single = assigns a value to a variable; a double == compares two values:

>>> x = 3
>>> y = 4
>>> x == 3
True
>>> y == 4
True
>>> x == y
False

edaniszewski commented 9 years ago

I only skimmed the results you posted, so I may have missed something, but I don't think the value ', ' ever appears in the list. It's kind of hard to read everything in one long line, so you can try printing element by element:

for element in v:
    print element

which gives results, for example:

>>> for line in x:
...     print "'{}'".format(line)  # printing in a loop doesn't keep the quotes, so I'm adding them back in for visibility here
...
'DATE'
'2015:183:14:53:57'
'obsn'
''
''
'0'
'az'
'118'
'el'
'57'
'freq_MHz'
''
'1420.4000'
'Tsys'
'132.360'
'Tant'
'1990.780'
'vlsr'
''
''
'11.89'
'glat'
''
'9.007'
'glon'
'191.606'
'source'
'Sun'
'\n'

What the list does contain are '' elements (two single quotes with nothing between them): the empty string. A string can have anywhere from 0 to n characters, and '' is the one with 0 characters, so there is no index 0 to look up (hence the IndexError).
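It may also help to know that split() with no argument behaves differently from split(' '): it splits on any run of whitespace and drops empty strings, including the trailing newline. A Python 3 sketch; the line below is reconstructed from the split output posted above, so its exact spacing is an assumption inferred from where the '' entries fall.

```python
line = ("DATE 2015:183:14:53:57 obsn   0 az 118 el 57 freq_MHz  1420.4000 "
        "Tsys 132.360 Tant 1990.780 vlsr   11.89 glat  9.007 glon 191.606 "
        "source Sun \n")

by_space = line.split(' ')    # keeps '' between repeated spaces, and the final '\n'
by_whitespace = line.split()  # no '' entries and no '\n'

print(by_space[:6])       # ['DATE', '2015:183:14:53:57', 'obsn', '', '', '0']
print(by_whitespace[:4])  # ['DATE', '2015:183:14:53:57', 'obsn', '0']
```

With split(), the filter(None, ...) step becomes unnecessary.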

theDarkLard commented 9 years ago

Yeah I got it all worked out :) ....almost

theDarkLard commented 9 years ago

Alrighty, the first piece of this parse program is pretty much done. Some formatting weirdness probably still occurs, but I need to look at the written files in a text editor that scrolls sideways, because it's such a wide file.

r = raw_input("Enter the file you want to parse: ")
#If the input filename does not match a file in the current directory, print "Try a new file"
f = open(r)
a = raw_input("Enter what you would like to call the parsed files: ")

#This function parses out info that is not the spectrum for each info block.(AzEl/Tsys/Tant, etc)
def infoparse(x):
    global str4
    q = open("%sINFO.dat" % a, 'w')
    #This for loop takes only the first two lines in any output, extracts the data types
    #and prints them to the first line in the new file.
    for i,line in enumerate(x):
        if i == 0:
            w = line.split(' ')
            str1 = w[0] + "               " + w[2] + "     " + w[6] + "     " + w[8] + "    " + w[10] + "           " \
                   + w[13] + "         " + w[15] + "          " + w[17] + "         " + w[21] \
                   + "       " + w[24] + "         " + w[26] + "       "
        elif i == 1:
            w = line.split(' ')
            str2 = w[0] + "          " + w[2] + "          " + w[4] + "          " + w[6] + "       " \
                   + w[11] + "       " + w[17] + "     " + w[19] + "          " + w[21] + "     " \
                   + w[23] + "      " + w[29] + "       " + w[34] + "    "
        else:
            break
    q.write("#" + str1 + str2 + "\n")
    x.seek(0)
    #This for loop extracts the data values and prints them under their respective data types
    for i,line in enumerate(x):
        b = []
        if i % 4 == 0:
            v = filter(None, line.split(' '))
            for elem in v:
                if elem == "\', \'":
                   pass
                elif elem[0].isdigit() is True:
                    b.append(elem)
                elif elem[0] == 'S' or elem[0] == 'M' or elem[0] == 'G':
                    b.append(elem)
            print b
            print v
            str3 = b[0] + "    " + b[1] + "       " + b[2] + "    " + b[3] + "    " + b[4] + "        " + b[5] + "      " + \
                   b[6] + "        " + b[7] + "        " + b[8] + "     " + b[9] + "         " + b[10] + "        "
            q.write(str3)
        elif i % 4 == 1:
            v = filter(None, line.split(' '))
            for elem in v:
                if elem == "\', \'":
                    pass
                elif elem[0].isdigit() is True:
                    b.append(elem)
            print b
            print v
            str4 = b[0] + "        " + b[1] + "       " + b[2] + "         " + b[3] + "     "  + b[4] + "  " + b[5] + \
                   "       " + b[6] + "        " + b[7] + "        " + b[8] + "         " + b[9] + "        " + b[10]
            q.write(str4 + "\n")
        else:
            pass
    q.close()

infoparse(f)

f.close()

@edaniszewski if you have any edits you feel would make this work better, feel free to chime in. I am still but a novice coder.

edaniszewski commented 9 years ago

@theDarkLard yeah! I have a few ideas, particularly on how to make the extraction/formatting of the elements easier. I'm going to play around with a few things and I'll let you know what I come up with.

Overall, I'd say you're doing pretty well for a 'novice coder'. There's a bunch of stuff that can be optimized, which isn't really necessary but perhaps interesting to know, and some funky code patterns (declaring a global variable, for example, is usually not something you want or need to do), but it's all pretty cool stuff. Good use of enumerate(), %, and seek(). It took me a lot longer to learn those existed and what they did. You're a natural!
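One thing the first-character check can miss: a negative value (a negative glat or vlsr, say) starts with '-', so elem[0].isdigit() is False and the value would be dropped. A Python 3 sketch of an alternative that tries float() instead; is_number is a hypothetical helper and the sample list is made up. Note that the timestamp 2015:183:14:53:57 is not a valid float, so it would need to be handled separately with this approach.

```python
def is_number(s):
    """Return True if s parses as a float (handles '-9.007', '1e3', and so on)."""
    try:
        float(s)
        return True
    except ValueError:
        return False

fields = ["vlsr", "-11.89", "glat", "-9.007", "source", "Sun"]
numbers = [s for s in fields if is_number(s)]
print(numbers)  # ['-11.89', '-9.007']
```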

theDarkLard commented 9 years ago

Hey @edaniszewski, I've written the spectrum parser. There's one thing I can't seem to get, though, and it's in the second for i,line in enumerate loop: for some reason the pwr values aren't being extracted and funnelled into a list. Maybe you can figure out why? Other than that it should be good to go. As always, feel free to make mods/corrections.

r = raw_input("Enter the file you want to parse: ")
input_file = open(r)
a = raw_input("Enter what you would like to call the parsed files: ")

def spec_parse(input_file):
#    q = open("%sSPEC.dat" % a, 'w')
    #grabbing fstart, fstop, spacing
    for i,line in enumerate(input_file):
        if i == 1:
            u = filter(None, line.split(' '))
#            print u
        else:
            pass
    fstart = float(u[1])
    fstop = float(u[3])
    spacing = float(u[5])
    freqs = []
    #producing frequencies to match up with each power value in each spectrum
    #I know that there are 214 pwr readings per spectrum. You can find this by rounding (fstop-fstart)/spacing.
    for y in range(0, 214):
        freq = fstart + (spacing * y)
        freqs.append(freq)
#    print freqs
    #Now to make a list of power values for each spectrum, then make a list of the freqs list and the spectrum lists.
    final_list = [freqs]
    for i,line in enumerate(input_file):
        if i % 4 == 3:
            pwrs = line.split("  ")
            final_list.append(pwrs)
            print pwrs
        else:
            pass
#    print final_list
    #Write our list of lists to the output_file in columns:
    with open("%sSPEC.dat" % a, 'w') as output_file:
        for row in zip(*final_list):
            output_file.write('\t'.join(row) + "\n")

spec_parse(input_file)
input_file.close()

edaniszewski commented 9 years ago

I'm looking into this now. Not sure why that is happening, but I'll figure it out. When I run this I get an exception:

Traceback (most recent call last):
  File "/Users/erickdaniszewski/Documents/Repositories/Telescope-2014/src/python/test/parse2.py", line 43, in <module>
    spec_parse(input_file)
  File "/Users/erickdaniszewski/Documents/Repositories/Telescope-2014/src/python/test/parse2.py", line 40, in spec_parse
    output_file.write('\t'.join(row) + "\n")
TypeError: sequence item 0: expected string, float found

so to fix this, it would be a simple change

output_file.write('\t'.join(row) + "\n")

becomes

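output_file.write('\t'.join(str(x) for x in row) + "\n")

(each item is converted with str() on its own; str(row) would turn the whole tuple into one string, and join would then put a tab between every character of it)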

The data file I'm using is the one checked in to the repo, https://github.com/BenningtonCS/Telescope-2014/blob/master/src/c/srtnver3/2014_084_19.rad. Not sure if it looks the same as the kinds of files you are generating/using. I'll update more after dinner and some code escapading.
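A likely cause of the missing pwr values, offered as a reading of the code rather than a confirmed diagnosis: the first for i,line in enumerate(input_file) loop has no break, so it reads to the end of the file, and the second loop then starts at end-of-file and yields nothing. final_list ends up holding only freqs, which would also fit the traceback (zip(*final_list) yields one-float tuples). A Python 3 sketch with seek(0) between the two passes and str() on each item when joining; collect_spectra is a hypothetical helper, the 4-line block layout (spectrum on i % 4 == 3) follows the thread's code, and the sample data is made up. It uses split() for the power values in place of split("  "), which would leave '' and '\n' entries behind.

```python
import io

def collect_spectra(input_file):
    """Return the words of header line 1 and the power values from every 4th line."""
    header = None
    for i, line in enumerate(input_file):
        if i == 1:
            header = line.split()
            break                 # stop once the header line has been read
    input_file.seek(0)            # rewind; without this the next loop resumes mid-file
    spectra = []
    for i, line in enumerate(input_file):
        if i % 4 == 3:            # spectrum line in each 4-line block, as in the thread
            spectra.append([float(p) for p in line.split()])
    return header, spectra

# Made-up stand-in for an output file: one 4-line block
sample = io.StringIO("info line 0\nFstart 1.0 fstop 1.3 spacing 0.1\n"
                     "info line 2\n5.0 6.0 7.0\n")
header, spectra = collect_spectra(sample)

freqs = [1.0, 1.1, 1.2]
# str() on each item, since '\t'.join() only accepts strings
text = "\n".join("\t".join(str(x) for x in row) for row in zip(freqs, *spectra))
print(text)
```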

edaniszewski commented 9 years ago

dunno if this gets at the data output format you want, but I cleaned up/optimized a bit of what you did:


def spectrum_parse(input_file, output_file):
    # Open the file and extract all the data to a list, split at spaces, and newline stripped
    with open(input_file, 'rb') as f:
        data = [filter(None, line.strip('\n').split(' ')) for line in f.readlines()]

    fstart = float(data[1][1])   # fstart value for the file (assumes single value per file)
    fstop = float(data[1][3])    # fstop value for the file (assumes single value per file)
    spacing = float(data[1][5])  # spacing value for the file (assumes single value per file)

    freq_steps = int(round((fstop - fstart) / spacing))  # get the number of frequency steps
    measure_count = len(data[3])                         # get the number of pwr data points

    # Make sure the number of pwr data points is the same as the expected number of freq steps
    if freq_steps != measure_count:
        raise RuntimeError('Number of frequency steps not equal to number of data points')

    # generate all frequencies
    freqs = [(spacing * i) + fstart for i in range(freq_steps)]

    # get the max string length in the freqs list (this is used for formatting when writing to file)
    max_f = max(len(str(x)) for x in freqs)

    # write to output file
    with open(output_file, 'w') as f:
        for i, line in enumerate(data):
            if i % 4 == 3:
                # get max string length for pwr values in list (used for formatting)
                max_p = max(len(str(x)) for x in line)

                # for each frequency, write out the frequency and its associated pwr value
                # to the file; zip() pairs them up, so the last frequency is included too
                format_str = '{:<' + str(max_f + 10) + '}{:<' + str(max_p + 10) + '}\n'
                for freq, pwr in zip(freqs, line):
                    f.write(format_str.format(freq, pwr))

# Change input file name and output file name as needed
spectrum_parse('2014_084_19.rad', 'out.txt')

the output file should look something like

   ...                ...
866.997             468.979          
867.006375          464.135          
867.01575           459.261          
867.025125          461.866          
867.0345            456.031          
867.043875          455.417          
867.05325           458.609          
867.062625          461.725          
867.072             459.694          
867.081375          459.518          
867.09075           458.465          
867.100125          461.235          
867.1095            461.325          
867.118875          465.416          
867.12825           458.271          
867.137625          458.673          
867.147             464.301          
867.156375          459.621          
  ...                 ...  

with freq on the left and pwr on the right. If it's not the output format you need, feel free to play around with it and change it! Or let me know and I can change it. Either way!
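Since the goal in the issue title is gnuplot input: gnuplot reads whitespace-separated columns and skips lines starting with '#', so another possible layout is one row per frequency with one power column per spectrum. A Python 3 sketch; write_columns is a hypothetical helper, the 12-character column width is arbitrary, and the values are illustrative numbers in the style of the sample output above.

```python
import io

def write_columns(out, freqs, spectra):
    """Write a '#' header line, then one row per frequency: freq pwr1 pwr2 ..."""
    names = ["freq"] + ["pwr%d" % (n + 1) for n in range(len(spectra))]
    out.write("# " + " ".join("{:>12}".format(name) for name in names) + "\n")
    for row in zip(freqs, *spectra):
        # right-align every value in a 12-character column
        out.write("  " + " ".join("{:>12}".format(value) for value in row) + "\n")

buf = io.StringIO()  # stands in for open("out.txt", "w")
write_columns(buf, [866.997, 867.006375],
              [[468.979, 464.135], [459.261, 461.866]])
print(buf.getvalue())
```

In gnuplot, the two power columns could then be plotted against frequency with something like plot "out.txt" using 1:2 with lines, "" using 1:3 with lines (the empty "" reuses the previous file name).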

theDarkLard commented 9 years ago

this looks perfect!!

theDarkLard commented 9 years ago

check it into the repo! (...b/c i don't really know how to do it :/ )

edaniszewski commented 9 years ago

e22b12ec7428791263b059f69f8488d75b30d6c4

done!

theDarkLard commented 9 years ago

Hey @edaniszewski, could you add this bit of code to the existing spectrum_parse script? It's for HI observations (specifically the rotation curve): for each frequency in the spectra, it computes the velocity of a source whose light would be shifted to that frequency from the 1420.406 MHz center frequency. Code is:

    # for HI spectra, generate a velocity corresponding to each frequency in freqs, for the Doppler shift
    q = 1/1420.406      # 1 / rest frequency of HI (MHz)
    c = 299792.458      # speed of light in km/s
    vels = []
    for r in freqs:
        w = 1/r
        vsrc = c*((w-q)/q)
        vels.append(vsrc)

    # write to output file, one velocity per line
    with open(output_file, 'w') as f:
        for v in vels:
            f.write(str(v) + "\n")

I got it to work in the existing script, as in make and write the values to the file, but I am struggling to figure out how to fit it into the existing formatting scheme. In other words, am having trouble getting it to print as the first or second column in the parsed file. thanks a ton!
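For getting the velocity into its own column, a Python 3 sketch that uses the same formula (c*((w-q)/q) with w = 1/f and q = 1/f0 works out to v = c*(f0/f - 1)) and writes freq, vel, and pwr with a single format string. It uses c = 299792.458 km/s, the defined speed of light; velocity, format_rows, the column widths, and the sample values are hypothetical choices for illustration.

```python
C_KMS = 299792.458   # speed of light in km/s
F0_MHZ = 1420.406    # HI rest frequency in MHz, as in the snippet above

def velocity(f_mhz, f0=F0_MHZ):
    """Doppler velocity in km/s; positive means receding (observed f below f0)."""
    return C_KMS * (f0 / f_mhz - 1.0)

def format_rows(freqs, pwrs):
    """Return one 'freq vel pwr' line per frequency, in left-aligned columns."""
    fmt = "{:<14.6f}{:<14.3f}{:<12}"
    return [fmt.format(f, velocity(f), p) for f, p in zip(freqs, pwrs)]

for row in format_rows([1420.406, 1420.0], ["468.979", "464.135"]):
    print(row)
```

Building each row from all three values at once is what keeps the velocity in the second column, rather than writing the velocities in a separate pass.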

edaniszewski commented 9 years ago

@theDarkLard sure thing. should be easy enough. do you want this to always be included in the output? e.g. you would always have something like:

freq     vel    pwr1    pwr2    pwr3    ...
----     ---    ----    ----    ----    ---
 ...     ...    ...     ...     ...     ...

or do you want it to be optional, so it is only printed out when specified (e.g. only when you are doing H1 observations)? For the latter, it could be as simple as modifying the call to include a '-h1' flag:

$ python spectrum_parse.py -h1 -o h1_observation.txt

either should be simple enough to do; it's just a matter of how you would prefer to use it
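If the optional route is chosen, here's a sketch using argparse from Python's standard library. It spells the flag --h1 (two dashes) instead of -h1 so it reads as one long option, clearly separate from argparse's built-in -h help flag; the positional input argument, the out.txt default, and the option names are all hypothetical, not part of the existing script.

```python
import argparse

def build_parser():
    """Command-line options for a hypothetical spectrum_parse.py."""
    parser = argparse.ArgumentParser(description="Parse a .rad file into columns for gnuplot")
    parser.add_argument("input", help="telescope output file to parse")
    parser.add_argument("-o", "--output", default="out.txt", help="file to write")
    parser.add_argument("--h1", action="store_true",
                        help="also write a velocity column (HI observations)")
    return parser

args = build_parser().parse_args(["2014_084_19.rad", "--h1", "-o", "h1_observation.txt"])
print(args.input, args.output, args.h1)  # 2014_084_19.rad h1_observation.txt True
```

In the real script, calling parse_args() with no arguments reads the options from the command line instead of the hard-coded list.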