jerrychen04 opened 3 years ago
@jerrychen04 `Initial_pks.txt` and `Final_pks.txt`? Then, we'll need to add the corresponding headers for every column.

`Images_pks.txt` and `ImageData_Final-pks.txt` and save them to .png files. You can plot them with matplotlib as a heatmap, a 3D plot, or both. For now, you can focus on the original data only.
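A minimal sketch of the two plot types with matplotlib, assuming each image file is a plain whitespace-separated numeric matrix that `np.loadtxt` can read (the actual layout of `Images_pks.txt` is not confirmed here). The helper name `save_heatmap_and_surface` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this also works without a display
import matplotlib.pyplot as plt
import numpy as np


def save_heatmap_and_surface(image, out_prefix):
    """Save a 2D array as a heatmap PNG and as a 3D surface PNG (hypothetical helper)."""
    image = np.asarray(image, dtype=float)

    # Heatmap: one colored cell per array element
    fig, ax = plt.subplots()
    mesh = ax.imshow(image, aspect="auto", origin="lower", cmap="viridis")
    fig.colorbar(mesh, ax=ax, label="value")
    heatmap_path = f"{out_prefix}_heatmap.png"
    fig.savefig(heatmap_path, dpi=150)
    plt.close(fig)

    # 3D surface: the same values drawn as heights over the (column, row) grid
    rows, cols = image.shape
    x, y = np.meshgrid(np.arange(cols), np.arange(rows))
    fig = plt.figure()
    ax3d = fig.add_subplot(projection="3d")
    ax3d.plot_surface(x, y, image, cmap="viridis")
    surface_path = f"{out_prefix}_surface.png"
    fig.savefig(surface_path, dpi=150)
    plt.close(fig)

    return heatmap_path, surface_path
```

Under that file-format assumption, usage would be something like `save_heatmap_and_surface(np.loadtxt("Images_pks.txt"), "Images_pks")`. Writing both PNGs from one call keeps the heatmap and the 3D view of the same image side by side.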
Looking at the user manual, an example data point from `Final_pks.txt` is:

```
120.530 341.209 2304.372 2819.900 5.298 1.000 0.000
```
It tabulates these peak parameters: accurate m/z value, separation time, peak intensity, peak area, signal-to-noise ratio, and peak membership information (the last two columns).
The `Initial_pks.txt` file's parameters are tabulated from the variables `mz_real`, `chrm_tt[pk]`, `chrm_ht[pk]`, `cmax`, and `snr1`. `mz_real` is the real m/z value, the `chrm_ht` list contains the largest intensity value in the interval, and `chrm_tt` contains the largest m/z value between `pos1` and `pos2`. `cmax` (column max) is calculated from the CWT matrix with this code:
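```python
# coefficient at position pk from every row of the CWT matrix
clist = [col[pk] for col in cwtmatr]
clist.sort()
# after an ascending sort, the last element is the maximum; round it to 1 decimal
cmax = round(clist[len(clist) - 1], 1)
```

`snr1` is the signal-to-noise ratio.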
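The sort-then-take-last pattern above is just the maximum of column `pk`. A sketch of the same value with NumPy indexing, assuming `cwtmatr` is a 2D array with one row per wavelet scale and one column per data point; `column_max` is a hypothetical helper name, not a function from the script.

```python
import numpy as np


def column_max(cwtmatr, pk, ndigits=1):
    """Largest coefficient in column pk of cwtmatr, rounded like the original.

    Equivalent to sorting [row[pk] for row in cwtmatr] and taking the last
    element, but without building and sorting an intermediate list.
    """
    return round(float(np.max(np.asarray(cwtmatr)[:, pk])), ndigits)
```

Taking the max directly is O(n) instead of O(n log n) for the sort, although for a single column of a CWT matrix the difference is negligible; the main gain is readability.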
@jerrychen04 Instead of `Final_pks.txt`, it would be nice to use the CSV (comma-separated values) format with headers. The CSV file would look as follows:

```
m/z,time,intensity,area,snr
120.530,341.209,2304.372,2819.900,5.298,1.000,0.000
121.530,342.210,2305.373,2819.900,5.299,1.000,1.000
...
```
CSV is more readable and can be opened in Excel and other software. You can use pandas to create a DataFrame and save it with `to_csv()`. Have you worked with pandas before?

I would suggest using the CSV format for `Initial_pks.txt` also, if it doesn't break the rest of the code.
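A minimal sketch of the conversion, assuming the `.txt` files are whitespace-separated numeric rows like the `Final_pks.txt` example. The first five column names follow the header proposed above; `membership_1` and `membership_2` are hypothetical placeholders for the two peak-membership columns, and `peaks_txt_to_csv` is a hypothetical helper. `Initial_pks.txt` would need its own list of names.

```python
import numpy as np
import pandas as pd

# Hypothetical header for Final_pks rows; the last two names are placeholders.
FINAL_COLUMNS = ["m/z", "time", "intensity", "area", "snr", "membership_1", "membership_2"]


def peaks_txt_to_csv(txt_source, csv_target, columns=FINAL_COLUMNS):
    """Read whitespace-separated peak rows and write them as CSV with a header row."""
    peaks = np.loadtxt(txt_source, ndmin=2)  # ndmin=2 keeps a one-row file two-dimensional
    df = pd.DataFrame(peaks, columns=columns)
    df.to_csv(csv_target, index=False, float_format="%.3f")  # 3 decimals, like the .txt
    return df
```

Both arguments can be paths or open file objects, e.g. `peaks_txt_to_csv("Final_pks.txt", "Final_pks.csv")`. `index=False` keeps pandas from writing its row index as an extra unnamed first column.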
There was already a plot feature in the code that creates a heatmap of the final signal images, so I just set a parameter to `True` in the final prediction module, and it created a folder in `Results` storing the signal image of each predicted peak (out of all the signal images). It was originally hard-coded to `False` because creating the images takes a lot of computing power and time.

I have not worked with pandas before, so I may have to look at the documentation and some videos to learn how to convert the text output to CSV. I think the rest of the code should be fine, as it doesn't actually use the .txt files (which are just a convenience for the user to view); it only uses the arrays each module returns.
Added saving the files as CSV and committed the changes.
Instead of converting text to CSV, you can consider using a pandas DataFrame to store the peaks in the script. For tasks that are not computationally intensive, a pandas DataFrame is a good idea, because referring to rows by index and to columns by name is much less error-prone and makes the script easier to read too.
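A sketch of what that could look like, assuming each peak is collected as a tuple of the values written to `Initial_pks.txt` (`mz_real`, `chrm_tt[pk]`, `chrm_ht[pk]`, `cmax`, `snr1`). The column names and the helpers `collect_peaks` and `strong_peaks` are hypothetical, chosen only to illustrate name-based access.

```python
import pandas as pd

# Illustrative column names, in the order the Initial_pks values are tabulated.
PEAK_COLUMNS = ["mz", "time", "height", "cmax", "snr"]


def collect_peaks(rows):
    """Build a DataFrame with one row per peak and named columns."""
    return pd.DataFrame(list(rows), columns=PEAK_COLUMNS)


def strong_peaks(peaks, min_snr):
    """Select peaks by column name (snr) instead of by positional index."""
    return peaks.loc[peaks["snr"] >= min_snr, ["mz", "time", "snr"]]
```

Compared with indexing a NumPy array as `arr[:, 4]`, `peaks["snr"]` keeps working if columns are reordered or added, and the same DataFrame can be written out with `peaks.to_csv(..., index=False)` at the end of the script.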
Jerry, can you find out how the signal-to-noise ratio is calculated in the script?
The signal-to-noise ratio is calculated as follows:
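```python
import numpy as np
from scipy import stats

# line_val / line_scl: values and scales along the detected line (defined earlier in the script)
line_val_ave = np.mean(line_val)
# scale at which the line's value is largest
line_scale_opt = line_scl[np.argmax(line_val)]

# snr (using the beginning of the line to locate the time location)
ind = line[1][0]
window_start = max(ind - hf_window, 0)
window_end = min(ind + hf_window, num_points)
# noise threshold: the perc-th percentile of |row_one| inside the window
noises = stats.scoreatpercentile(abs(row_one[window_start:window_end]), perc)
# line_val is reused here for the noise samples; the comparison uses signed values, not abs()
line_val = []
for i in range(window_start, window_end):
    if row_one[i] < noises:
        line_val.append(row_one[i])
means = np.mean(line_val)
stdevs = np.std(line_val)

# statistics of the raw intensities in the same window (not used by snr1 to snr4)
data_noises = stats.scoreatpercentile(chrm_ht[window_start:window_end], perc)
data_means = np.mean(chrm_ht[window_start:window_end])
data_stdevs = np.std(chrm_ht[window_start:window_end])

# candidate SNRs, all divided by the standard deviation of the noise samples
snr1 = line_val_ave / stdevs  # mean value along the line
snr2 = line_val_opt / stdevs  # line_val_opt is not defined in this excerpt
snr3 = row_one[ind] / stdevs  # row_one at the start of the line
snr4 = cwtmatr[2, ind] / stdevs  # row 2 of the CWT matrix at the same position
```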
Here, `snr1` is the SNR that gets saved (the value tabulated in `Initial_pks.txt`).
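A self-contained sketch of the `snr1` computation using NumPy only, assuming `num_points` is the length of `row_one`. It also assumes `np.percentile` with its default linear interpolation gives the same threshold as `stats.scoreatpercentile` with its default `interpolation_method="fraction"`. The helper name `snr1_sketch` and the toy inputs are hypothetical.

```python
import numpy as np


def snr1_sketch(line_val, row_one, ind, hf_window, perc):
    """Hypothetical helper: snr1 = mean(line values) / std(noise samples in a window).

    line_val: values along the detected line
    row_one:  the row whose window around `ind` supplies the noise samples
    """
    line_val_ave = np.mean(line_val)

    # Window of +/- hf_window points around ind, clipped to the array bounds
    # (assumes num_points in the script equals len(row_one))
    window_start = max(ind - hf_window, 0)
    window_end = min(ind + hf_window, len(row_one))
    window = np.asarray(row_one[window_start:window_end], dtype=float)

    # Threshold from |values|, compared against signed values, as in the script
    noise_threshold = np.percentile(np.abs(window), perc)
    noise = window[window < noise_threshold]

    return line_val_ave / np.std(noise)
```

For example, with `row_one = [1, -1, 1, -1, 10, 1, -1, 1, -1]`, `ind=4`, `hf_window=4`, and `perc=90`, the spike of 10 lies above the 90th-percentile threshold, so it is excluded from the noise samples and does not inflate the standard deviation in the denominator.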