du-lab / Trace

LC-MS results and HR-MS results #4

Open jerrychen04 opened 3 years ago

jerrychen04 commented 3 years ago
asmirn1 commented 3 years ago

@jerrychen04

For now, you can focus on the original data only.

jerrychen04 commented 3 years ago

Looking at the user manual, an example row in Final_pks.txt is: 120.530 341.209 2304.372 2819.900 5.298 1.000 0.000

It tabulates these peak parameters: accurate m/z value, separation time, peak intensity, peak area, signal-to-noise ratio, and peak membership information.

The columns of Initial_pks.txt are written from the variables mz_real, chrm_tt[pk], chrm_ht[pk], cmax, and snr1. mz_real is the real m/z value, the chrm_ht list contains the largest intensity value in the interval, and chrm_tt contains the largest mz value between pos1 and pos2. cmax (column max) is calculated from the CWT matrix with this code:

        clist = [col[pk] for col in cwtmatr]
        clist.sort()
        cmax = round(clist[len(clist) - 1], 1)

snr1 is the signal-to-noise ratio (SNR).
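The cmax computation quoted above sorts one column of the matrix and takes the last element, which is simply the column maximum. A minimal sketch of that equivalence follows; the helper name `column_max` and the small matrix are made up for illustration and are not part of the Trace code.

```python
def column_max(cwtmatr, pk):
    """Return the largest value in column `pk`, rounded to one decimal."""
    # Same result as sorting the column and taking its last element,
    # but without building and sorting a temporary list.
    return round(max(row[pk] for row in cwtmatr), 1)


# Made-up 3x4 matrix (rows = scales, columns = time points); not real data.
cwtmatr = [
    [0.12, 1.57, 0.33, 0.08],
    [0.40, 2.94, 0.71, 0.15],
    [0.25, 2.11, 0.64, 0.10],
]
pk = 1

# The formulation quoted in the thread, kept here for comparison.
clist = [col[pk] for col in cwtmatr]
clist.sort()
cmax_sorted = round(clist[len(clist) - 1], 1)

cmax = column_max(cwtmatr, pk)
```

Both formulations give the same value; `max()` just avoids the sort.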

asmirn1 commented 3 years ago

@jerrychen04

Instead of Final_pks.txt, it would be nice to use the CSV (comma-separated values) format with headers. The CSV file would look as follows:

m/z,time,intensity,area,snr
120.530,341.209,2304.372,2819.900,5.298,1.000,0.000
121.530,342.210,2305.373,2819.900,5.299,1.000,1.000
...

CSV is more readable and can be opened in Excel and other software. You can use Pandas to create a data frame and save it with to_csv(). Have you worked with Pandas before?

I would suggest using the CSV format for Initial_pks.txt also, if it doesn't break the rest of the code.
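A minimal sketch of the suggestion above, assuming the five column names from the proposed header. The example rows in the thread carry two extra trailing values with no header, so they are left out here rather than given guessed names. The CSV is written to an in-memory buffer to keep the example self-contained.

```python
import io

import pandas as pd

# Column names taken from the header suggested in the thread.
columns = ["m/z", "time", "intensity", "area", "snr"]

# Illustrative rows only: the first five values of each example row.
peaks = [
    (120.530, 341.209, 2304.372, 2819.900, 5.298),
    (121.530, 342.210, 2305.373, 2819.900, 5.299),
]

df = pd.DataFrame(peaks, columns=columns)

# index=False keeps the row index out of the file; float_format fixes
# the output at three decimals, matching the style of the example rows.
buffer = io.StringIO()
df.to_csv(buffer, index=False, float_format="%.3f")
csv_text = buffer.getvalue()

# Reading the text back gives a frame with the same named columns.
round_trip = pd.read_csv(io.StringIO(csv_text))
```

To write to disk instead, pass a path as the first argument, for example `df.to_csv("Final_pks.csv", index=False)`; that file name is only an assumption.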

jerrychen04 commented 3 years ago

The code already had a plot feature that creates a heatmap of the final signal images, so I just set a parameter to True in the final prediction module. It now creates a folder in Results that stores the signal image of each predicted peak, taken from all signal images. The parameter was originally hard-coded to False because creating the images takes a lot of computing power and time.

I have not worked with pandas before, so I may have to look at the documentation and some videos to learn how to convert text to CSV. I think the rest of the code should be fine, since it doesn't actually use the .txt files (they are just a convenience for the user to view), only the arrays each module returns.

jerrychen04 commented 3 years ago

Added saving the files as CSV and committed the changes.

du-lab commented 3 years ago

Instead of converting text to CSV, consider using a pandas DataFrame to store the peaks in the script. For tasks that are not computationally intensive, a DataFrame is a good idea: referring to rows by index and to columns by name is much less error-prone and makes the script easier to read.
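As a rough illustration of addressing peaks by column name rather than by position, here is a small sketch. The values, the third row, and the `min_snr` threshold are all made up for the example and do not come from the Trace code.

```python
import pandas as pd

# Illustrative values only; column names follow the CSV header in the thread.
peaks = pd.DataFrame(
    {
        "m/z": [120.530, 121.530, 250.100],
        "time": [341.209, 342.210, 400.000],
        "intensity": [2304.372, 2305.373, 150.000],
        "area": [2819.900, 2819.900, 90.000],
        "snr": [5.298, 5.299, 1.200],
    }
)

# Columns are addressed by name instead of by position.
min_snr = 3.0  # hypothetical threshold, not taken from the Trace code
strong = peaks[peaks["snr"] >= min_snr]

# Sorting and selecting by name reads the same way.
top = strong.sort_values("intensity", ascending=False).iloc[0]
top_mz = float(top["m/z"])
```

Compared with indexing a plain list such as `row[4]`, `peaks["snr"]` stays correct if columns are later added or reordered.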

du-lab commented 3 years ago

Jerry, can you find out how the signal-to-noise ratio is calculated in the script?

jerrychen04 commented 3 years ago

The signal-to-noise ratio is calculated as follows:

        line_val_ave = np.mean(line_val)
        line_scale_opt = line_scl[np.argmax(line_val)]

        # snr (using the beginning of the line to locate the time location)
        ind = line[1][0]
        window_start = max(ind - hf_window, 0)
        window_end = min(ind + hf_window, num_points)

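        # noise threshold: the perc-th percentile of |row_one| within the window
        noises = stats.scoreatpercentile(abs(row_one[window_start:window_end]), perc)
        # collect the window points that fall below the threshold
        line_val = []
        for i in range(window_start, window_end):
            if row_one[i] < noises:
                line_val.append(row_one[i])
        means = np.mean(line_val)
        stdevs = np.std(line_val)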

        data_noises = stats.scoreatpercentile(chrm_ht[window_start: window_end], perc)
        data_means = np.mean(chrm_ht[window_start: window_end])
        data_stdevs = np.std(chrm_ht[window_start: window_end])

        snr1 = line_val_ave / stdevs
        snr2 = line_val_opt / stdevs
        snr3 = row_one[ind] / stdevs
        snr4 = cwtmatr[2, ind] / stdevs

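Here snr1 is the SNR value that gets saved: line_val_ave divided by stdevs, the standard deviation of the window points that fall below the noise threshold.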
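The noise estimate in the excerpt can be reproduced in isolation. The sketch below is a simplified stand-in, not the Trace implementation: the function name `window_snr`, its `peak_value` argument, and the toy signal are assumptions, and `np.percentile` is used in place of `stats.scoreatpercentile` to avoid the SciPy dependency.

```python
import numpy as np


def window_snr(signal, ind, hf_window, perc, peak_value):
    """Estimate SNR as peak_value / std of the low-amplitude points near ind."""
    num_points = len(signal)
    # Clip the window to the bounds of the signal, as in the excerpt.
    window_start = max(ind - hf_window, 0)
    window_end = min(ind + hf_window, num_points)
    window = np.asarray(signal[window_start:window_end], dtype=float)

    # Noise threshold: the perc-th percentile of the absolute values.
    threshold = np.percentile(np.abs(window), perc)

    # Points below the threshold are treated as noise samples
    # (the comparison uses the signed values, mirroring the excerpt).
    noise = window[window < threshold]
    return peak_value / np.std(noise)


# Toy signal with one obvious peak at index 4; not real data.
signal = np.array([0.1, -0.2, 0.15, -0.1, 5.0, 0.2, -0.15, 0.1, -0.05])
snr = window_snr(signal, ind=4, hf_window=4, perc=90, peak_value=signal[4])
```

If no point falls below the threshold, or all noise samples are equal, the standard deviation is undefined or zero and the ratio is not meaningful, so real code would need a guard for that case.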