RNA-FRETools / MASH-FRET

MATLAB package to analyze single-molecule FRET data
https://rna-fretools.github.io/MASH-FRET/
GNU General Public License v3.0

"Estimate Rate Constants" generating a loop and not finishing #113

Closed by snguyen49 9 months ago

snguyen49 commented 9 months ago

Hi,

Whenever I try to estimate rate constants, the generation turns into a loop and does not stop unless I close the entire program.

Many Thanks, Sydney Nguyen

Email: snguyen49@student.gsu.edu

mca-sh commented 9 months ago

Hi Sydney,

Yes, this is a known problem: the scripts that infer transition rate constants or perform ML-DPH are written in C and cannot be stopped from within MATLAB (this is stated in the remarks of the online doc: https://rna-fretools.github.io/MASH-FRET/transition-analysis/workflow.html#remarks). MATLAB has to be force-closed.

In general, the larger your system is, the longer the inference will take, and that is something I can't speed up.

I hope this clarifies things. Best, Mélodie

snguyen49 commented 9 months ago

Hi again,

The system was set to have 100 molecules for 1000 frames with no change in number of states or rate constants. Does this count as a large system? If so, how long would the estimated calculation time be?

Many Thanks, Sydney Nguyen

mca-sh commented 9 months ago

Hi Sydney,

Your data set is quite small, but what about the dimension of the model you are trying to fit? How many states are there? The larger the transition rate matrix, the longer it will take.

Also, sometimes the algorithm needs a long time to converge when the complexity of the transition matrix is too high for the actual data.

To follow the convergence, watch the parameters d and dL in the command window. The inference stops when d drops below 1E-8 or dL drops below 1E-6.

Fast convergence happens in about ten minutes, slow convergence in hours.
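
For readers who want to picture the stopping rule, here is a minimal Python sketch of a loop that ends when either d or dL falls below its threshold. It is not the MASH-FRET C code. Treating d as the largest change in any parameter and dL as the change in log-likelihood between iterations is an assumption, and `run_until_converged`, `toy_step` and `max_iter` are hypothetical names used for illustration only.

```python
import math


def run_until_converged(step, theta0, d_tol=1e-8, dL_tol=1e-6, max_iter=10_000):
    """Call `step` repeatedly until d < d_tol or dL < dL_tol.

    Assumption: d is the largest absolute parameter change and dL is the
    absolute change in log-likelihood between two consecutive iterations.
    """
    theta, loglik = theta0, -math.inf
    for n_iter in range(1, max_iter + 1):
        new_theta, new_loglik = step(theta)
        d = max(abs(a - b) for a, b in zip(new_theta, theta))
        dL = abs(new_loglik - loglik)
        theta, loglik = new_theta, new_loglik
        if d < d_tol or dL < dL_tol:
            return theta, loglik, n_iter, True
    # Safety net (hypothetical): give up instead of looping forever.
    return theta, loglik, max_iter, False


def toy_step(theta, k=30, n=100):
    """Toy update: move halfway towards the best-fit probability k/n."""
    (p,) = theta
    p_new = p + 0.5 * (k / n - p)
    loglik = k * math.log(p_new) + (n - k) * math.log(1 - p_new)
    return (p_new,), loglik


theta, loglik, n_iter, converged = run_until_converged(toy_step, (0.9,))
print(f"p = {theta[0]:.5f}, iterations = {n_iter}, converged = {converged}")
```

The `max_iter` guard is an addition for the sketch: it shows one way a loop of this kind can be bounded when neither threshold is reached.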

Best, Mélodie

snguyen49 commented 9 months ago

I had set the number of states to just 2 to test out the feature. If it takes at least 10 minutes, then I must not have waited long enough; I had thought something was wrong.

Sydney Nguyen

mca-sh commented 9 months ago

I see. This is a small system, so it should not take long. Have a look at the progress in the command window while the process is running.

snguyen49 commented 9 months ago

When I took a look at the progress, it looked as if it was rerunning some of the same calculations. The process command also continuously stated that it was restarting the calculations.

Sydney Nguyen

mca-sh commented 9 months ago

Yes indeed, the transition probabilities are initialized multiple times (five if you did not change the parameters) and the inference yielding the maximum likelihood is selected as the best "fit". If this best "fit" transition probability matrix contains very low probabilities, these transitions are forbidden and the calculations are run again until no probability is negligible.
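
The restart behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the MASH-FRET implementation: `toy_infer` is a hypothetical stand-in for the real iterative inference, the threshold `p_min` is an invented value, and "forbidding" a transition is assumed to mean fixing its probability at zero.

```python
import math
import random


def normalize_rows(matrix):
    """Scale each row so that it sums to 1."""
    return [[x / sum(row) for x in row] for row in matrix]


def toy_infer(counts, allowed, init, w=1e-4):
    """Hypothetical stand-in for the real inference: blends the masked,
    count-based estimate with the starting guess, so that different
    starting points give slightly different likelihoods."""
    n = len(counts)
    masked = [[counts[i][j] if allowed[i][j] else 0.0 for j in range(n)]
              for i in range(n)]
    est = normalize_rows(masked)
    P = normalize_rows([[(1 - w) * est[i][j] + w * init[i][j]
                         for j in range(n)] for i in range(n)])
    loglik = sum(counts[i][j] * math.log(P[i][j])
                 for i in range(n) for j in range(n) if P[i][j] > 0)
    return P, loglik


def fit_with_restarts(counts, n_restarts=5, p_min=1e-3, seed=0):
    rng = random.Random(seed)
    n = len(counts)
    allowed = [[True] * n for _ in range(n)]
    rounds = 0
    while True:
        rounds += 1
        best_P, best_L = None, -math.inf
        for _ in range(n_restarts):  # several random starting points
            init = normalize_rows(
                [[rng.uniform(0.05, 1.0) if allowed[i][j] else 0.0
                  for j in range(n)] for i in range(n)])
            P, L = toy_infer(counts, allowed, init)
            if L > best_L:  # keep the maximum-likelihood result
                best_P, best_L = P, L
        negligible = [(i, j) for i in range(n) for j in range(n)
                      if i != j and allowed[i][j] and best_P[i][j] < p_min]
        if not negligible:
            return best_P, allowed, rounds
        for i, j in negligible:  # forbid and run everything again
            allowed[i][j] = False


# Toy transition counts for three states; three transitions never occur.
counts = [[900, 100, 0], [50, 950, 0], [0, 20, 980]]
P, allowed, rounds = fit_with_restarts(counts)
print(f"finished after {rounds} rounds of {5} restarts each")
```

Each round forbids at least one transition, so the outer loop of the sketch always terminates. It also shows why the command window can appear to repeat the same calculations: every round runs all restarts again.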

snguyen49 commented 9 months ago

I see. The transition rate constants were all set to 0.1. Does that mean the transition probabilities will be lower? If so, does that mean there is nothing wrong with the process only that I should just let it run until it is done?

Sydney Nguyen

mca-sh commented 9 months ago

Unfortunately, you can't be certain to recover the proper transition probabilities, as multiple different matrices will yield the exact same data. The algorithm gives the least complex estimate of the transition probability matrix that would reproduce the data.

And yes, there seems to be nothing wrong with the process; it must converge at some point. But I can't rule out a bug, of course.

snguyen49 commented 9 months ago

Ah okay. Should I just run the program for quite a while then? If it does not work, is there anything I should send to you to analyze?

Many thanks Sydney Nguyen

mca-sh commented 9 months ago

If it is taking that long for a two-state system, it means that it tries to fit a model that is way too complex for what it is. I guess you ran ML-DPH to find the number of degenerate states, right? What is the value of the parameter D that it found for both states? I am pretty sure this value was overestimated by ML-DPH.

I noticed that overestimation happens when the dwell time distribution shows only one decay. I will need to find a way to correct this sensitivity.

mca-sh commented 9 months ago

While I am working on a fix, and as you are working with simulated data, you can set the parameter bin in panel Kinetic model to 1 instead of 10. I tried it and it worked, though it will take a bit longer to find the number of states.

snguyen49 commented 9 months ago

Okay that's perfectly fine with me. Thank you so much!

mca-sh commented 9 months ago

Hi Sydney, After running some tests, I found that it is preferable to use bin=1 as long as you are using simulated data. Histogram re-binning has a detrimental effect on the performance of ML-DPH. Use bin=10 only for experimental data.
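
To give a feel for what re-binning does to a dwell-time histogram, here is a small Python sketch. It assumes that bin is the histogram bin width in frames and that a rate constant of 0.1 means a 0.1 probability of leaving the state per frame; both are assumptions for illustration, and `geometric_dwell` and `dwell_histogram` are hypothetical helpers rather than MASH-FRET functions.

```python
import random
from collections import Counter

rng = random.Random(1)


def geometric_dwell(p_leave):
    """Number of frames spent in a state left with probability p_leave per frame."""
    n_frames = 1
    while rng.random() >= p_leave:
        n_frames += 1
    return n_frames


def dwell_histogram(dwells, bin_width):
    """Count dwell times in bins of `bin_width` frames (bin 0 holds 1..bin_width)."""
    counts = Counter((d - 1) // bin_width for d in dwells)
    return [counts.get(b, 0) for b in range(max(counts) + 1)]


# Simulated dwell times, assuming a leaving probability of 0.1 per frame.
dwells = [geometric_dwell(0.1) for _ in range(2000)]

hist_1 = dwell_histogram(dwells, 1)    # bin = 1
hist_10 = dwell_histogram(dwells, 10)  # bin = 10

print(f"bin=1:  {len(hist_1)} histogram points")
print(f"bin=10: {len(hist_10)} histogram points")
```

With bin=10 the same dwell times are described by roughly ten times fewer histogram points, so less of the shape of the decay is left for a fit to work with.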

I've just added a test that favors underestimating the number of degenerate states when the dwell-time histograms are overbinned (as in your case), so that you don't accidentally get caught in an endless estimation of rate constants. You can find the corrected version on the main GitHub page, branch master (or here: https://codeload.github.com/RNA-FRETools/MASH-FRET/zip/refs/heads/master).

Thanks again for reporting the issue, Mélodie