gaofengnan / WAVE

MATLAB Implementation of WAVE (Wasserstein Distance Analysis in Steady-state Variations in smFRET)
MIT License

How to apply WAVE to real data containing donor and acceptor intensity #2

Open tuffwave opened 6 months ago

tuffwave commented 6 months ago

I have successfully completed the WAVE analysis on simulated data by following your comments. I would appreciate specific guidance on how to analyze real experimental data with WAVE, such as the TAR-DNA hairpin folding data mentioned in your paper (Chen, T., Gao, F., & Tan, Y-W. (2023)). Specifically, I have a txt file containing donor and acceptor intensities over time. It looks similar to the simulated data generated by MakeNonequilibriumHMMPoissonData.m, but I am unsure how to apply WAVE to real data.

admixture-CT commented 6 months ago

Before WAVE analysis, the raw donor&acceptor intensity trajectories must be preprocessed. This means FRET trajectory region detection followed by FRET efficiency calculation. The input of WAVE consists of three types of data: donor&acceptor intensity-time trajectories, region boundaries, and FRET efficiency-time trajectories.

For example, in our test folder, test1.txt contains the donor&acceptor intensity-time trace (first and second columns) of the first biomolecule sample. Its corresponding region boundary information and FRET efficiency-time trajectory are stored in test1.txt Region.txt and test1.txt Efficiency.txt in the subfolder test\E. The format of each file is described in the "User guide"-->"Run TestNonequilibrium3 individually" section of README.md.

FRET data preprocessing is not described in our paper, because each research group uses its own methods for FRET trace region detection and FRET efficiency calculation when processing raw FRET data, and WAVE analysis is not sensitive to the preprocessing method. In our lab, FRET trajectory region detection is done by a MATLAB script containing a large number of binary decision branches that encode our hand-crafted criteria for identifying FRET trajectory regions. The script is only used in our lab because it relies on these hand-crafted criteria, although it works very well on FRET data at various signal-to-noise ratios. If your dataset contains only a few hundred raw donor&acceptor intensity trajectories, we recommend identifying their regions manually and storing the results in the corresponding "... Region.txt" files, as shown in the test folder.

The FRET efficiency calculation approach used in our lab is based on Haw Yang's work (Watkins L P, Yang H. Biophysical Journal, 2004, 86(6): 4015-4029. ; Watkins L P, Chang H, Yang H. The Journal of Physical Chemistry A, 2006, 110(15): 5191-5203.). We extended their work to make it applicable to intensity-time trajectories. As shown in lines 128-135 of MakeNonequilibriumHMMPoissonData.m, we use two functions, precalculationTimeBin.m and postcalculationTimeBin.m, to calculate the FRET efficiency from one biomolecule's donor&acceptor intensity trajectories.

The inputs and outputs of precalculationTimeBin.m are as follows: [betaA,betaD,IbetaA,IbetaD]=precalculationTimeBin(regionF,regionC,regionB,s) where regionF, regionC and regionB are the frame boundaries of the FRET region, the Crosstalk region and the Background region, and are all one-dimensional vectors. The first element of each vector marks the beginning of the corresponding region, and the last element marks its end. The last input, s, is a two-column matrix that records the donor channel intensity-time trajectory in its first column and the acceptor channel intensity-time trajectory in its second column.

The inputs and outputs of postcalculationTimeBin.m are as follows: [J,E]=postcalculationTimeBin(s,betaA,betaD,IbetaA,IbetaD,regionF) The inputs of this function are all of the outputs and some of the inputs of precalculationTimeBin. Its outputs, E and J, are the calculated FRET efficiency trace and the corresponding Fisher information trace. Both are column vectors whose length equals the FRET region length (regionF(end)-regionF(1)+1).

In lines 133-135 of MakeNonequilibriumHMMPoissonData.m, we construct an all-zero matrix Savedata with the same size as the input parameter s, which records the two-channel intensity-time trajectories. We then assign the outputs E and J to the first and second columns of Savedata, respectively, in rows regionF(1) to regionF(end). (The Fisher information trace J is not used in WAVE analysis, so the second column of Savedata can be discarded.) Finally, store Savedata in the corresponding "... Efficiency.txt" file.
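The Savedata assembly described above can be sketched in MATLAB as follows. This is only an illustration under stated assumptions, not the code from MakeNonequilibriumHMMPoissonData.m: the toy trace, the region boundaries, the placeholder E and J values, the output location and the tab delimiter are all made up for the example. With the WAVE functions on the MATLAB path, E and J would come from the two calls shown in the comments, and the actual file format should be checked against README.md.

```matlab
% Toy two-channel trace standing in for a real file such as test1.txt
% (column 1: donor intensity, column 2: acceptor intensity; made-up numbers).
nFrames = 10;
s = [(1:nFrames)', (nFrames:-1:1)'];

% Hypothetical FRET region boundaries: first element = start frame,
% last element = end frame.
regionF = [3 7];

% With WAVE on the path, E and J would be obtained like this
% (regionC and regionB are the Crosstalk and Background boundaries):
%   [betaA,betaD,IbetaA,IbetaD] = precalculationTimeBin(regionF,regionC,regionB,s);
%   [J,E] = postcalculationTimeBin(s,betaA,betaD,IbetaA,IbetaD,regionF);
% Placeholder column vectors of the right length are used here instead.
nF = regionF(end) - regionF(1) + 1;   % FRET region length
E  = linspace(0.2, 0.8, nF)';         % placeholder FRET efficiency trace
J  = ones(nF, 1);                     % placeholder Fisher information trace

% All-zero matrix with the same size as s; E and J fill the FRET region rows.
Savedata = zeros(size(s));
rows = regionF(1):regionF(end);
Savedata(rows, 1) = E;
Savedata(rows, 2) = J;                % not used by WAVE, may be discarded

% Assumed output location and delimiter; adapt to your own data folder.
outFile = fullfile(tempdir, 'test1.txt Efficiency.txt');
writematrix(Savedata, outFile, 'Delimiter', 'tab');
```

Rows outside the FRET region stay zero, so the efficiency file keeps the same number of rows as the intensity file.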

Through the above process, you can build a standard folder for WAVE analysis.

In our TAR-DNA hairpin folding experiment, we collected two sets of data for analysis. The first set contains the FRET trajectories recorded before the condition change and provides a baseline measurement. The second set contains the FRET trajectories recorded immediately after the condition change. We analyzed the two data sets together using WAVE. By naming the files in the two data sets differently, we ensured that the first 300+ files recorded in txt_path_list (line 54 of TestNonequilibrium3.m) belong to the first data set. These files only contribute to “First Step: trajectories fitting” and to the other calculations related to FRET trajectories before the condition change. For the remaining files, which belong to the second data set, we set Changeframe=1; to find their non-equilibrium transition positions.
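One way to get this file ordering is to give the two data sets prefixes that sort in the desired order. The MATLAB sketch below illustrates the idea with hypothetical file names and prefixes (set1_, set2_). It assumes that the file list ends up in alphabetical order; how TestNonequilibrium3.m actually builds txt_path_list is not shown in this thread and should be checked in the script.

```matlab
% Hypothetical file names: 'set1_' = before the condition change,
% 'set2_' = immediately after the condition change.
names = {'set2_001.txt', 'set1_002.txt', 'set1_001.txt', 'set2_002.txt'};

% Alphabetical sort, standing in for the order of txt_path_list (assumption).
txt_list = sort(names);

% Flag the files of the first data set and count them.
isFirstSet = startsWith(txt_list, 'set1_');
nFirst = nnz(isFirstSet);

% Check that the first nFirst entries are exactly the first data set,
% so that "the first nFirst files" can be treated as the baseline.
isContiguous = all(isFirstSet(1:nFirst)) && ~any(isFirstSet(nFirst+1:end));
```

If isContiguous is false, the naming scheme does not separate the two sets in the list and should be changed before running the analysis.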

tuffwave commented 6 months ago

Thanks a lot! Your kind comment was helpful in solving this issue.