Closed theDarkLard closed 9 years ago
Related #119
Now that I've done something concrete with the telescope (#119) I feel less antsy about learning some python. Will be doing this today.
Almost ready to start writing this. I've been going through the Python lesson on Codecademy and it has been helping a ton.
:snake:
Part of the code that is written: it generates a list of frequencies based on fstart, fstop, and freq spacing taken from the output file (not yet read from the output file, this bit was manual), and it matches items in the freq list with items in the pwr list.
What still needs to be done: have the parser read the output file, pick out the important bits, and stick 'em in a list.
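For reference, the frequency-list step described above could look roughly like the sketch below. The function name build_freqs and the sample numbers are made up for illustration (the real values would come from the output file), and the snippet is written so it runs on Python 3.

```python
def build_freqs(fstart, spacing, count):
    """Return `count` frequencies starting at fstart, `spacing` apart."""
    return [fstart + spacing * i for i in range(count)]


# Hypothetical sample values, typed in by hand rather than read from a file.
freqs = build_freqs(1419.397, 0.009375, 4)
pwrs = [468.979, 464.135, 459.261, 461.866]

# zip() pairs the i-th frequency with the i-th power reading.
pairs = list(zip(freqs, pwrs))
```

If the two lists have different lengths, zip() silently stops at the shorter one, so it may be worth comparing len(freqs) and len(pwrs) first.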
you are my guardian angel
On Fri, Jul 10, 2015 at 3:14 PM, Erick Daniszewski <notifications@github.com> wrote:
useful bits
https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files https://docs.python.org/2/library/stdtypes.html#string-methods
getting there
I have handwritten fairly specific guidelines for how this code will be written; now I will start actually putting it all into code.
What I have so far:
r = raw_input("Enter the file you want to parse: ")
#If the input filename does not match a file in the current directory, print "Try a new file"
f = open(r)
#a = raw_input("Enter what you would like to call the parsed files: ")
#This function parses out info that is not the spectrum.
def infoparse(x):
    # q = open("%sINFO.dat" % a, w)
    e = open("testparse.txt", 'w')
    #THis for loop takes only the first two lines in any output, extracts the data types
    #and prints them to the first line in the new file.
    for i,line in enumerate(x):
        if i == 0:
            w = line.split(' ')
            str1 = w[0] + " " + w[2] + " " + w[6] + " " + w[8] + " " + w[10] + " " \
                + w[13] + " " + w[15] + " " + w[17] + " " + w[21] \
                + " " + w[24] + " " + w[26] + " "
        elif i == 1:
            w = line.split(' ')
            str2 = w[0] + " " + w[2] + " " + w[4] + " " + w[6] + " " \
                + w[11] + " " + w[17] + " " + w[19] + " " + w[21] + " " \
                + w[23] + " " + w[29] + " " + w[34] + " "
            e.write("#" + str1 + str2 + "\n")
        else:
            break
    x.seek(0)
    #This for loop extracts the data values and prints them under their respective data types
    for i,line in enumerate(x):
        if i % 4 == 0 or i % 4 == 1:
            v = line.split(' ')
            print v
    e.close()
infoparse(f)
f.close()
The formatting is the hard part. I'm still thinking out how to get the data to line up neatly under its respective data type. The wacky spaces in the concatenated string in the first for loop are there because of this formatting reason but there's definitely a better way to do it. But it's coming along.
Also the print statement in the second for loop is just sort of for testing.
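One way the column alignment could be handled, offered as a sketch and not as the approach the thread settles on: pad every label and value to the same fixed width with str.format. The helper name format_row, the width of 12, and the sample labels and values are all assumptions for illustration.

```python
def format_row(values, width=12):
    """Left-justify every value in a fixed-width column."""
    return "".join("{:<{w}}".format(str(v), w=width) for v in values)


# Hypothetical labels and values, just to show the alignment.
header = format_row(["az", "el", "Tsys"])
row = format_row([118, 57, 132.360])
```

Because the header and the data row use the same width, each value starts in the same character column as its label. A value longer than the width would push the later columns to the right, so the width has to be at least as large as the longest entry.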
Very cool!! There's probably a better way of handling the spacing, but that can be figured out later. One thing to note, where you have
e.write("#" + str1 + str2 + "\n")
inside the elif branch: the str1 variable is only set on the first pass through the loop (i == 0), and str2 only on the second (i == 1). Python's if/elif blocks don't create a new scope, so str1 is still visible when the write runs, but the write only works because of the order the branches happen to run in.
A cleaner way to get both str1 and str2 written to file is to move the e.write(...) line to just below the for loop, e.g.
for i,line in enumerate(x):
    if i == 0:
        w = line.split(' ')
        str1 = w[0] + " " + w[2] + " " + w[6] + " " + w[8] + " " + w[10] + " " \
            + w[13] + " " + w[15] + " " + w[17] + " " + w[21] \
            + " " + w[24] + " " + w[26] + " "
    elif i == 1:
        w = line.split(' ')
        str2 = w[0] + " " + w[2] + " " + w[4] + " " + w[6] + " " \
            + w[11] + " " + w[17] + " " + w[19] + " " + w[21] + " " \
            + w[23] + " " + w[29] + " " + w[34] + " "
    else:
        break
e.write("#" + str1 + str2 + "\n")
this way (assuming that there are indeed at least two iterations of the loop), both the str1 and str2 variables would be initialized and filled.
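A self-contained sketch of the write-after-the-loop pattern, for anyone following along. Names assigned inside an if/elif branch stay visible for the rest of the function in Python, so the main gain from writing after the loop is that the "at least two lines" assumption can be checked in one place. The function name header_line, the sample text, and the use of io.StringIO in place of a real file are assumptions for illustration.

```python
import io


def header_line(lines):
    """Build one '#'-prefixed header from the first two lines."""
    first = second = None
    for i, line in enumerate(lines):
        if i == 0:
            first = " ".join(line.split())
        elif i == 1:
            second = " ".join(line.split())
        else:
            break
    if first is None or second is None:
        raise ValueError("expected at least two lines")
    # Built once, after the loop, when both parts are known.
    return "#" + first + " " + second + "\n"


# io.StringIO stands in for the opened data file.
sample = io.StringIO(u"DATE obsn az\nFstart fstop spacing\n1 2 3\n")
result = header_line(sample)
```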
Ahhhh cool thank you!!
Hey @edaniszewski if you can spare a moment and take a look at this bit of code I would be very grateful:
#This for loop extracts the data values and prints them under their respective data types
for i,line in enumerate(x):
    if i % 4 == 0:
        v = line.split(' ')
        dlist = []
        for elem in v:
            if elem[0].isdigit() is True:
                dlist.append(elem)
            else:
                pass
        return dlist
        print dlist
It keeps throwing the "string index out of range" error and I can't figure out why...
Sure thing, @theDarkLard !!
Just quickly glancing over, the most likely place where it would throw that error is the line:
if elem[0].isdigit() is True:
where elem is a string extracted from the list v. If you put a print v right after assigning to the variable, e.g.
v = line.split(' ')
print v
I'd bet you'd see something like:
["some string", "some other string", "", "more string action here", "a cool string"]
The offender is the 3rd string (index 2). Since it is empty, elem[0] asks for the character at index 0 of a string that has no characters at all, hence the exception. You can filter out empty values from a list pretty easily though. There are a bunch of ways to do it, but an easy way:
filter(None, list)
so, you could say
tmp_list = line.split(' ')
v = filter(None, tmp_list)
or, you could just nest function calls to make it take up less space
v = filter(None, line.split(' '))
so, testing this using python via command line:
>>> x = ["some string", "other string", "", "more string action", "a cool string"]
>>> y = filter(None, x)
>>>
>>> print x
['some string', 'other string', '', 'more string action', 'a cool string']
>>> print y
['some string', 'other string', 'more string action', 'a cool string']
try it out, and see if that helps. If not, I'll dig deeper! (:
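As a side note, here is a small comparison of three ways to split a line like this. It is a sketch with a shortened, made-up sample line, written to run on Python 3, where filter() returns an iterator instead of a list.

```python
line = "DATE 2015:183:14:53:57 obsn   0 az 118\n"

# Splitting on a single space keeps the empty strings between repeated spaces.
with_blanks = line.split(' ')

# filter() returns an iterator on Python 3, so wrap it in list() there.
filtered = list(filter(None, with_blanks))

# split() with no argument splits on runs of whitespace and drops the newline.
clean = line.split()
```

Calling split() with no argument avoids the empty strings in the first place and also strips the trailing newline, which filter(None, ...) leaves attached to the last element.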
yes I think that's the problem!! but the empty index is a little funky:
['DATE', '2015:183:14:53:57', 'obsn', '', '', '0', 'az', '118', 'el', '57', 'freq_MHz', '', '1420.4000', 'Tsys', '132.360', 'Tant', '1990.780', 'vlsr', '', '', '11.89', 'glat', '', '9.007', 'glon', '191.606', 'source', 'Sun', '\n']
['Fstart', '1419.397', 'fstop', '1421.403', 'spacing', '0.009375', 'bw', '', '', '', '2.400', 'fbw', '', '', '', '2.000', 'MHz', 'nfreq', '256', 'nsam', '1048576', 'npoint', '214', 'integ', '', '', '', '', '3', 'sigma', '', '', '', '0.782', 'bsw', '0\n']
so there's an extra ', ' instead of a blank space, or nothing at all
im trying to solve this by saying like
blank == "\', \'"
if elem == blank:
    pass
elif blah blah blah
but python doesn't like what blank is set to
nevermind i'm an idiot and forgot a colon after my if statement
aaaaaaaand I just accidentally deleted all of my code
found it in the trash. sorry for all the emails guys
ALRIGHT WE'RE BACK IN BUSINESS WITH SOME (almost) WORKING CODE
blank == "\', \'"
Be careful here. I'm guessing that you want to set the value of blank, but instead you are checking for equality.
Single = is to set a variable.
Double == is to compare.
>>> x = 3
>>> y = 4
>>> x == 3
True
>>> y == 4
True
>>> x == y
False
I only skimmed through the results you posted, so I may have missed something, but I don't think the case exists where you have the value ', ' in the list. It's kinda hard to read everything sequentially, so you can try reading element by element:
for element in v:
    print element
which gives results, for example:
>>> for line in x:
...     print "'{}'".format(line) # printing in loop doesnt maintain quotes, so Im adding them back in for visibility here
...
'DATE'
'2015:183:14:53:57'
'obsn'
''
''
'0'
'az'
'118'
'el'
'57'
'freq_MHz'
''
'1420.4000'
'Tsys'
'132.360'
'Tant'
'1990.780'
'vlsr'
''
''
'11.89'
'glat'
''
'9.007'
'glon'
'191.606'
'source'
'Sun'
'\n'
The elements that do exist in the list are '' (two single quotes with no space between them). If a string is made up of 0 to n characters, this is the string with 0 characters, which is why elem[0] raises the index exception.
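A guard against empty strings can be sketched like this. The helper name starts_with_digit and the token list are hypothetical, and this is only one of several ways to avoid indexing into an empty string.

```python
def starts_with_digit(text):
    """True only for non-empty strings whose first character is a digit."""
    # An empty string is falsy, so `and` stops before text[0] is evaluated.
    return bool(text) and text[0].isdigit()


tokens = ['obsn', '', '', '0', 'az', '118']
numbers = [t for t in tokens if starts_with_digit(t)]
```

With a check like this the empty strings are skipped without being filtered out of the list beforehand.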
Yeah I got it all worked out :) ....almost
Alrighty, so the first piece of this parse program is pretty much done. Some formatting weirdness probably occurs, but I need to look at the written files with a scroll-y text editor because it's such a wide file.
r = raw_input("Enter the file you want to parse: ")
#If the input filename does not match a file in the current directory, print "Try a new file"
f = open(r)
a = raw_input("Enter what you would like to call the parsed files: ")
#This function parses out info that is not the spectrum for each info block.(AzEl/Tsys/Tant, etc)
def infoparse(x):
    global str4
    q = open("%sINFO.dat" % a, 'w')
    #THis for loop takes only the first two lines in any output, extracts the data types
    #and prints them to the first line in the new file.
    for i,line in enumerate(x):
        if i == 0:
            w = line.split(' ')
            str1 = w[0] + " " + w[2] + " " + w[6] + " " + w[8] + " " + w[10] + " " \
                + w[13] + " " + w[15] + " " + w[17] + " " + w[21] \
                + " " + w[24] + " " + w[26] + " "
        elif i == 1:
            w = line.split(' ')
            str2 = w[0] + " " + w[2] + " " + w[4] + " " + w[6] + " " \
                + w[11] + " " + w[17] + " " + w[19] + " " + w[21] + " " \
                + w[23] + " " + w[29] + " " + w[34] + " "
        else:
            break
    q.write("#" + str1 + str2 + "\n")
    x.seek(0)
    #This for loop extracts the data values and prints them under their respective data types
    for i,line in enumerate(x):
        b = []
        if i % 4 == 0:
            v = filter(None, line.split(' '))
            for elem in v:
                if elem == "\', \'":
                    pass
                elif elem[0].isdigit() is True:
                    b.append(elem)
                elif elem[0] == 'S' or elem[0] == 'M' or elem[0] == 'G':
                    b.append(elem)
            print b
            print v
            str3 = b[0] + " " + b[1] + " " + b[2] + " " + b[3] + " " + b[4] + " " + b[5] + " " + \
                b[6] + " " + b[7] + " " + b[8] + " " + b[9] + " " + b[10] + " "
            q.write(str3)
        elif i % 4 == 1:
            v = filter(None, line.split(' '))
            for elem in v:
                if elem == "\', \'":
                    pass
                elif elem[0].isdigit() is True:
                    b.append(elem)
            print b
            print v
            str4 = b[0] + " " + b[1] + " " + b[2] + " " + b[3] + " " + b[4] + " " + b[5] + \
                " " + b[6] + " " + b[7] + " " + b[8] + " " + b[9] + " " + b[10]
            q.write(str4 + "\n")
        else:
            pass
    q.close()
infoparse(f)
f.close()
@edaniszewski if you have any edits you feel would make this work any better feel free to chime in. I am still but a novice coder.
@theDarkLard yeah! I have a few ideas, particularly on how to make the extraction/formatting of the elements easier. I'm going to play around with a few things and I'll let you know what I come up with.
Overall, I'd say you're doing pretty well for a 'novice coder'. There's a bunch of stuff that can be optimized, which isn't really necessary but perhaps interesting to know, and some funky code patterns (declaring a global variable, for example, is usually not something you want or need to do), but it's all pretty cool stuff. Good use of enumerate() and % and seek(). It took me a lot longer to learn those existed and to learn what they did. You're a natural!
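As one illustration of the global-variable point, a function can hand its results back with return instead. The sketch below also pairs labels with values by position. It assumes the tokens strictly alternate label, value, label, value, which the first info line posted earlier appears to do; the second line has a stray 'MHz' unit token and would need extra handling. The function name parse_info_line and the shortened sample line are hypothetical.

```python
def parse_info_line(line):
    """Return (labels, values) from a line of alternating label/value tokens."""
    tokens = line.split()
    labels = tokens[0::2]   # every other token, starting with the first
    values = tokens[1::2]   # the tokens in between
    return labels, values


# Shortened, hypothetical version of the first info line.
labels, values = parse_info_line("DATE 2015:183:14:53:57 obsn   0 az 118 el 57\n")
```

Whatever calls the function receives both lists directly, so nothing has to be declared global.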
Hey @edaniszewski , I've written the spectrum parser. There's one thing that I can't seem to get though, and it's in the second "for i,line in enumerate" loop. For some reason the pwr values aren't being extracted and funnelled into a list, maybe you can figure out why? Other than that it should be good to go. As always, feel free to make mods/corrections.
r = raw_input("Enter the file you want to parse: ")
input_file = open(r)
a = raw_input("Enter what you would like to call the parsed files: ")
def spec_parse(input_file):
    # q = open("%sSPEC.dat" % a, 'w')
    #grabbing fstart, fstop, spacing
    for i,line in enumerate(input_file):
        if i == 1:
            u = filter(None, line.split(' '))
            # print u
        else:
            pass
    fstart = float(u[1])
    fstop = float(u[3])
    spacing = float(u[5])
    freqs = []
    #producing frequencies to match up with each power value in each spectrum
    #I know that there are 214 pwr readings per spectrum. You can find this by (fstop-fstart)/spacing.
    for y in range(0, 214):
        freq = fstart + (spacing * y)
        freqs.append(freq)
    # print freqs
    #Now to make a list of power values for each spectrum, then make a list of the freqs list and the spectrum lists.
    final_list = [freqs]
    for i,line in enumerate(input_file):
        if i % 4 == 3:
            pwrs = line.split(" ")
            final_list.append(pwrs)
            print pwrs
        else:
            pass
    # print final_list
    #Write our list of lists to the output_file in columns:
    with open("%sSPEC.dat" % a, 'w') as output_file:
        for row in zip(*final_list):
            output_file.write('\t'.join(row) + "\n")
spec_parse(input_file)
input_file.close()
I'm looking into this now. Not sure why that is happening, but I'll figure it out. When I run this I get an exception:
Traceback (most recent call last):
  File "/Users/erickdaniszewski/Documents/Repositories/Telescope-2014/src/python/test/parse2.py", line 43, in <module>
    spec_parse(input_file)
  File "/Users/erickdaniszewski/Documents/Repositories/Telescope-2014/src/python/test/parse2.py", line 40, in spec_parse
    output_file.write('\t'.join(row) + "\n")
TypeError: sequence item 0: expected string, float found
so to fix this, each item in the row needs to be converted to a string before joining, so
output_file.write('\t'.join(row) + "\n")
becomes
output_file.write('\t'.join(str(item) for item in row) + "\n")
The data file I'm using is the one checked in to the repo ( https://github.com/BenningtonCS/Telescope-2014/blob/master/src/c/srtnver3/2014_084_19.rad ). Not sure if it looks the same as the kinds of files you are generating/using. I'll update more after dinner and some code escapading.
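One possible explanation for the empty pwr list, offered as a guess and not a confirmed diagnosis: the first loop in spec_parse reads the file all the way to the end, and unlike the earlier infoparse function there is no seek(0) before the second loop, so the second loop has nothing left to iterate over. The sketch below shows that behaviour with io.StringIO standing in for the real data file; the sample lines and values are made up.

```python
import io

# io.StringIO stands in for the opened data file.
f = io.StringIO(u"line0\nline1\nline2\nline3\n")

first_pass = [line for line in f]    # reads to the end of the file
second_pass = [line for line in f]   # nothing left, so this is empty

f.seek(0)                            # rewind to the start
third_pass = [line for line in f]    # all four lines again

# join() needs strings, so convert each item rather than the whole row.
row = (1419.397, "468.979")
joined = "\t".join(str(item) for item in row)
```

Reading all the lines into a list once and looping over that list as many times as needed sidesteps the rewind entirely.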
dunno if this gets at the data output format you want, but I cleaned up/optimized a bit of what you did:
def spectrum_parse(input_file, output_file):
    # Open the file and extract all the data to a list, split at spaces, and newline stripped
    with open(input_file, 'rb') as f:
        data = [filter(None, line.strip('\n').split(' ')) for line in f.readlines()]
    fstart = float(data[1][1]) # fstart value for the file (assumes single value per file)
    fstop = float(data[1][3]) # fstop value for the file (assumes single value per file)
    spacing = float(data[1][5]) # spacing value for the file (assumes single value per file)
    freq_steps = int(round((fstop - fstart) / spacing)) # get the number of frequency steps
    measure_count = len(data[3]) # get the number of pwr data points
    # Make sure the number of pwr data points is the same as the expected number of freq steps
    if freq_steps != measure_count:
        raise RuntimeError('Number of frequency steps not equal to number of data points')
    # generate all frequencies
    freqs = [(spacing * i) + fstart for i in range(freq_steps)]
    # get the max string length in the freqs list (this is used for formatting when writing to file)
    max_f = max(len(str(x)) for x in freqs)
    # write to output file
    with open(output_file, 'w') as f:
        for i, line in enumerate(data):
            if i % 4 == 3:
                # get max string length for pwr values in list (used for formatting)
                max_p = max(len(str(x)) for x in line)
                # for each frequency, write out the frequency and its associated pwr value
                # to the file
                for idx in range(len(freqs)-1):
                    format_str = '{:<' + str(max_f + 10) + '}{:<' + str(max_p + 10) + '}\n'
                    f.write(format_str.format(freqs[idx], line[idx]))
# Change input file name and output file name as needed
spectrum_parse('2014_084_19.rad', 'out.txt')
the output file should look something like
... ...
866.997 468.979
867.006375 464.135
867.01575 459.261
867.025125 461.866
867.0345 456.031
867.043875 455.417
867.05325 458.609
867.062625 461.725
867.072 459.694
867.081375 459.518
867.09075 458.465
867.100125 461.235
867.1095 461.325
867.118875 465.416
867.12825 458.271
867.137625 458.673
867.147 464.301
867.156375 459.621
... ...
with freq on the left and pwr on the right. If it's not the output format you need, feel free to play around with it and change it! Or let me know and I can change it. Either way!
this looks perfect!!
check it in to the repo! (...b/c i don't really know how to do it :/ )
e22b12ec7428791263b059f69f8488d75b30d6c4
done!
Hey @edaniszewski , could you implement this bit of code into the existing spectrum_parse script? It's for H1 observations (specifically rotation curve) and produces a velocity of something whose light is shifted whatever amount from the 1420.406 center freq, according to the frequency values for the spectra. Code is:
#for HI spectra, generate a velocity corresponding to each frequency in freqs, for doppler shift
q = 1/1420.406
c = 299792.458 #speed of light in km/s
vels = []
for r in freqs:
    w = 1/r
    vsrc = c*((w-q)/q)
    vels.append(vsrc)
# write to output file
with open(output_file, 'w') as f:
    for i, v in enumerate(vels):
        f.write(str(vels[i]))
I got it to work in the existing script, as in make and write the values to the file, but I am struggling to figure out how to fit it into the existing formatting scheme. In other words, am having trouble getting it to print as the first or second column in the parsed file. thanks a ton!
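A rough sketch of how a velocity column might sit between the freq and pwr columns. It is not wired into the existing spectrum_parse script. The velocity expression is the one from the snippet above rearranged as c * (f_rest / f - 1), with 299792.458 km/s for the speed of light. The function names, the column width of 16, and the sample values are assumptions for illustration.

```python
C_KM_S = 299792.458   # speed of light in km/s
F_REST = 1420.406     # HI center frequency in MHz, as used in the thread


def doppler_velocity(freq_mhz):
    """Velocity in km/s from c * (f_rest / f - 1)."""
    return C_KM_S * (F_REST / freq_mhz - 1.0)


def format_rows(freqs, pwrs, width=16):
    """One line per frequency: freq, velocity, pwr in fixed-width columns."""
    rows = []
    for freq, pwr in zip(freqs, pwrs):
        vel = doppler_velocity(freq)
        rows.append("{:<{w}}{:<{w}.3f}{:<{w}}\n".format(freq, vel, pwr, w=width))
    return rows


# Hypothetical sample values.
rows = format_rows([1420.406, 1420.0], ["468.979", "464.135"])
```

Each string in rows can be passed straight to f.write(), so the velocity lands in the second column without changing how the other two are padded.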
@theDarkLard sure thing. should be easy enough. do you want this to always be included in the output? e.g. you would always have something like:
freq    vel     pwr1    pwr2    pwr3    ...
----    ---     ----    ----    ----    ---
...     ...     ...     ...     ...     ...
or do you want it to be optional, so it is only printed out when specified (e.g. only when you are doing H1 observations)? For the latter, it could be as simple as modifying the call to include a '-h1' flag:
$ python spectrum_parse.py -h1 -o h1_observation.txt
Either should be simple enough to do, it's just a matter of how you would prefer to use it.
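For the optional-flag route, a minimal sketch with the standard-library argparse module might look like the following. The positional input_file argument, the long option names, and the spelling --h1 (used here instead of -h1 to keep it visually distinct from argparse's built-in -h help flag) are all assumptions, not something the thread decided on.

```python
import argparse


def build_parser():
    """Command-line options for a hypothetical spectrum_parse.py."""
    parser = argparse.ArgumentParser(description="Parse spectrum data")
    parser.add_argument("input_file", help="data file to parse")
    parser.add_argument("-o", "--output", default="out.txt",
                        help="where to write the parsed columns")
    # store_true makes the flag optional: False unless it is given.
    parser.add_argument("--h1", action="store_true",
                        help="also write a velocity column")
    return parser


# Passing a list to parse_args() simulates a command line for the example.
args = build_parser().parse_args(
    ["2014_084_19.rad", "--h1", "-o", "h1_observation.txt"])
```

The script would then check args.h1 to decide whether to compute and write the velocity column.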
what are computers