fosterlab / PrInCE-Matlab

Bioinformatics pipeline for predicting protein interactomes via co-elution
GNU General Public License v3.0

PrInCE not completing analysis of test data set #2

Closed. kevinkovalchik closed this issue 5 years ago

kevinkovalchik commented 6 years ago

Hi Greg,

This is Kevin in Gregg Morin's lab. I'm running into an error when using PrInCE. I'm using Matlab R2018a and the current Master branch of PrInCE. The error is happening when using our own data as well as the test data set distributed with the code. Here is the output from the console:

 Continuing without a major protein groups file...

ans =

    'Detected 55 FRACTIONS and 3 REPLICATES in condition files. If that is not correct, check that condition files are correctly formatted.'

GaussBuild.m

    0. Initialize  ...  0.00 seconds

    1. Read input data  ...  8.05 seconds

    2. Clean the chromatograms  ...  1.34 seconds

    3. Fit 1-5 Gaussians on each cleaned chromatogram  ...  419.92 seconds

    4. Write outputGauss_Build: writeOutput: SEC fitting failed. Size of Complex will be zero.
  ...  0.30 seconds

    5. Make figuresError using ylim (line 31)
Limits must be a 2-element vector of increasing numeric values.

Error in makeFigures_gaussbuild (line 133)
ylim([0 max(h1(:))*1.05])

Error in GaussBuild (line 268)
  makeFigures_gaussbuild

Error in prince (line 52)
GaussBuild

Also here is the logfile: logfile.txt https://github.com/fosterlab/PrInCE/files/2064197/logfile.txt
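For context, ylim rejects limits that are not strictly increasing, so `ylim([0 max(h1(:))*1.05])` fails whenever the maximum of `h1` is zero, NaN, or empty. That would be consistent with the "SEC fitting failed" message, although the thread does not confirm it. A minimal sketch of a guard, assuming `h1` holds histogram counts (the values and the fallback limit of 1 are hypothetical, not taken from PrInCE):

```matlab
% Hypothetical stand-in for the histogram counts h1 in makeFigures_gaussbuild.
h1 = zeros(3, 5);             % all zeros, e.g. when no Gaussians were fitted
upper = max(h1(:)) * 1.05;    % 0 here, so [0 upper] would not be increasing

% Assumed fallback: use a default upper limit when the computed one is unusable.
if isempty(upper) || ~isfinite(upper) || upper <= 0
    upper = 1;
end
limits = [0 upper];           % ylim(limits) would now be accepted
```

This only keeps the plotting step from crashing. It does not address why the fitted data were empty in the first place.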

GregStacey commented 6 years ago

Hi Kevin,

We’re about to update to a newer version. This is probably an issue with the newer matlab version.

Which version are you running - the standalone executable or the matlab code? If the latter, a quick fix is to comment out makeFigures_gaussbuild in GaussBuild.m. I’m away from my computer, but I will send the line number ASAP.

Thanks for your interest and sorry for the error!

-Greg
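As an alternative to commenting the call out, the figure step could be wrapped in try/catch so that a plotting failure becomes a warning and the rest of the run continues. This is only a sketch: `makeFigures` below is a hypothetical stand-in that fails on purpose, not the real makeFigures_gaussbuild script, and the message identifiers are made up.

```matlab
% Hypothetical plotting step that fails the way ylim does on bad limits.
makeFigures = @() error('demo:badLimits', ...
    'Limits must be a 2-element vector of increasing numeric values.');

figuresOk = true;
try
    makeFigures();
catch ME
    % Report the problem but let the pipeline carry on without figures.
    figuresOk = false;
    warning('demo:figuresSkipped', 'Skipping figures: %s', ME.message);
end

pipelineContinued = true;     % reached even though the figure step failed
```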

kevinkovalchik commented 6 years ago

Hi Greg, Thanks for the fast reply! I'm using the matlab code, so I'll comment out makeFigures_gaussbuild. It gives the line number in the error, so I should be able to track it down.

GregStacey commented 6 years ago

Hi Kevin,

Sounds good! Don’t hesitate to let me know if there are any other issues.

If you’re not using mammalian samples, meaning you can’t use CORUM, the gold standard can sometimes cause headaches. I’m happy to give you my 2 cents with that if it’s helpful.

-greg

kevinkovalchik commented 6 years ago

Thanks, Greg. That fixed that problem, but now another error is occurring. Maybe this is due to using a newer Matlab version. I'll watch the repository and try again when the update comes out.

Alignment.m
* NB: User set Number of Replicates to 1. Skipping Alignment...
FoldChanges.m
* NB: User set number of SILAC ratios to 1. Skipping FoldChanges...
Interactions.m

    0. Initialize  ...  0.11 seconds

    Channel condition1
        Replicate 1
        1. Read input dataDot indexing is not supported for variables of this type.

Error in Interactions (line 195)
      if isfield(tmp.data,'Sheet1')

Error in prince (line 55)
Interactions

GregStacey commented 6 years ago

Ah, too bad! Are you able to send the logfile produced by this run?
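The "Dot indexing is not supported" message means `tmp` was not a struct when `tmp.data` was evaluated, for example because whatever read the input file returned a plain array. A minimal sketch of a type check before dot indexing; the helper name `hasSheet1` and the sample values are hypothetical, and how `tmp` is produced in Interactions.m is an assumption:

```matlab
% Short-circuit && means t.data is only touched when t is a struct with
% a 'data' field, so non-struct inputs return false instead of erroring.
hasSheet1 = @(t) isstruct(t) && isfield(t, 'data') ...
    && isstruct(t.data) && isfield(t.data, 'Sheet1');

numericResult = [1 2; 3 4];                                   % plain array
structResult  = struct('data', struct('Sheet1', [1 2; 3 4])); % nested struct

a = hasSheet1(numericResult);   % false, and no error is raised
b = hasSheet1(structResult);    % true
```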

kevinkovalchik commented 6 years ago

Yup, here it is: logfile.txt. I didn't clean it out before this run, so there is at least one other run in it.

kevinkovalchik commented 6 years ago

Hi Greg. Are the updates you mentioned finished? I have tried using the current master branch but am still having problems using 2018a, with my own data and with the sample data in the repository.

logfile.txt https://github.com/fosterlab/PrInCE/files/2224698/logfile.txt

GregStacey commented 6 years ago

Hi Kevin,

No, not quite yet! We're still in the process of updating the code. I can send you an email when we've finished.

Is the attached logfile.txt for the sample data or your data?

Thanks for the feedback.

-greg

kevinkovalchik commented 6 years ago

Thanks! I think the last set of data I ran was the sample data.

GregStacey commented 6 years ago

Hi Kevin,

I couldn't reproduce the error and PrInCE seems to be running fine on the test data on our machines. I added extra error handling, though, so the current master branch should run for you now.

Let me know if you continue to have problems. Worst case I can run your data over here and send you the output! We shouldn't need to do that, but I'm happy to work with you to get your data analyzed!

-greg

kevinkovalchik commented 6 years ago

This still isn't working for me... It seems the Gaussian fitting is failing, so there is no data for it to work with further down the line. I am on UBC campus today. If you are around and have time, could we meet sometime today? It might be easier to discuss the problem in person. If another day works better, that is fine with me.

GregStacey commented 6 years ago

Yes, come by! I'm in the NCE building (next to Michael Smith Labs), room 1. Any time after 12 works, but maybe around 1 or 2?

kevinkovalchik commented 6 years ago

Sure, I'll come over around 1:00. Thanks!

kevinkovalchik commented 6 years ago

Thanks for meeting today. Quick question: is the fitting in a parfor loop? Because if it is, it is still only using one processor.

GregStacey commented 6 years ago

No problem! I appreciate you helping me test prince.

The fitting is in the parfor loop. Do you mind stopping prince and checking a few things?

  1. Do you have the parallel computing toolbox installed?
  2. Type parpool in the Matlab command line. What's the output?
  3. Type parcluster('local') in the command line. What's the output?

Thanks again. It will run fine with only one processor, but the curve fitting will take a while.
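The first check in the list above can also be scripted. The sketch below assumes the license feature name `Distrib_Computing_Toolbox` for the Parallel Computing Toolbox and uses the presence of `parpool` on the path as a second signal; neither check is taken from PrInCE itself.

```matlab
% Is a Parallel Computing Toolbox license available? (feature name assumed)
hasLicense = logical(license('test', 'Distrib_Computing_Toolbox'));

% Is the parpool function on the path? exist returns 0 when it is not.
hasParpool = exist('parpool', 'file') > 0;

canUseWorkers = hasLicense && hasParpool;
if canUseWorkers
    fprintf('Parallel workers should be available to parfor.\n');
else
    fprintf('No parallel toolbox found: parfor loops will run on one core.\n');
end
```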

kevinkovalchik commented 6 years ago

Ah! I'm sure I don't have the parallel computing toolbox, so that explains it. So many toolboxes!

kevinkovalchik commented 6 years ago

This is largely aesthetic, but something else that would be nice would be some sort of progress indicator. Obviously it is chugging away, but it is nice to see either a percent complete or something like X out of XX complete. It could just be printed at the end of each line as they come up.

GregStacey commented 6 years ago

Well it'd help if someone told you which toolboxes you need! :)

Agreed on the progress indicator. I'll add that.
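A rough sketch of the kind of "X out of XX complete" indicator being asked for, written for an ordinary for loop. The counts and reporting interval are hypothetical, and a loop running under parfor would need a different mechanism because iterations do not run in order.

```matlab
nProteins   = 250;   % hypothetical number of chromatograms to fit
reportEvery = 50;    % hypothetical reporting interval
nReports    = 0;

for ii = 1:nProteins
    % ... fit chromatogram ii here ...
    if mod(ii, reportEvery) == 0 || ii == nProteins
        fprintf('    %d out of %d complete (%.0f%%)\n', ...
            ii, nProteins, 100 * ii / nProteins);
        nReports = nReports + 1;
    end
end
```

Reporting only every few iterations keeps the console readable and avoids slowing the loop with constant printing.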

kevinkovalchik commented 6 years ago

What all is happening in step 5 ("Make figures")? It is generating figures for each set of protein data, but it is taking ~30 seconds per protein.

GregStacey commented 6 years ago

It's making a few diagnostic figures and then plotting chromatograms for each protein. It should take a second or two per protein. If it's taking 30 seconds that's way too slow.

One solution: skip these single-protein plots using the "Skip plots?" box in the GUI. GaussBuild.m and FoldChanges.m make single-protein plots, so I'd skip them both.

I think there are known issues with slow plotting in Matlab, e.g. https://www.mathworks.com/matlabcentral/answers/306478-plotting-in-matlab-extremely-slow. You could also try the solution there (opengl software).

kevinkovalchik commented 6 years ago

Well, we're not all the way there yet but getting closer! I closed the plotting window, which just caused it to skip the rest of the plotting, which was nice. But then it crashed in the next step when trying to load data from replicate 2, I assume because there is no replicate 2 in my data...

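Here's a picture of the command window after the crash, and I'll attach the log file too. [image: image] https://user-images.githubusercontent.com/16054736/43652150-3cf88a2a-96f9-11e8-9fff-57bff90d7f87.png

logfile.txt https://github.com/fosterlab/PrInCE/files/2257941/logfile.txt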

GregStacey commented 6 years ago

Okay, at least we're getting somewhere!

It should detect the fact there's only one replicate (and I see in the logfile that it's only seeing one), but it sometimes gets confused if multiple datasets were analyzed in the same folder. Did you analyze the test data first, and then analyze your data? If that's the case, I'd delete the Output/ folder and start the analysis again. I'll fix this behaviour.

To speed things up, click the "Skip plots?" radio buttons in the GUI (or just close the plotting windows like you did!).
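Clearing a stale Output/ folder before a fresh run can be done from the MATLAB command line. The sketch below works in a temporary folder so it is safe to try; the folder and file names are hypothetical, and in practice `outputDir` would point at the Output/ folder of the analysis directory.

```matlab
% Set up a throwaway analysis folder containing a leftover output file.
workDir   = tempname;
outputDir = fullfile(workDir, 'Output');
mkdir(workDir);
mkdir(outputDir);
fid = fopen(fullfile(outputDir, 'stale_results.csv'), 'w');
fclose(fid);

% Remove the old output, including its contents, before re-running.
if exist(outputDir, 'dir') == 7
    rmdir(outputDir, 's');
end
outputRemoved = (exist(outputDir, 'dir') ~= 7);

rmdir(workDir);   % clean up the now-empty throwaway folder
```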

kevinkovalchik commented 6 years ago

Alright, that got us farther! Now this came up:

image

And here's the log:

logfile.txt

I think the problem is that Ibest is not defined outside the if block starting at line 534 of Complexes.m.

Possible fix, starting with line 530 (though I am somewhat guessing what is happening here. As I mentioned, my matlab experience is zero):

    if isempty(Iopt)
      Iopt = 1;
    else
      Iopt = find(best_list.opt == best_list.opt(Iopt));
      if sum(Iopt)>1
        Ilongest = best_list.NN(Iopt) == nanmax(best_list.NN(Iopt));
        Ibest = Iopt(find(Iopt(Ilongest),1,'first'));
      else
        Ibest = 1
      end
    end

kevinkovalchik commented 6 years ago

Well... that code isn't formatted quite how I envisioned. I added an else statement after the if at 534 in which Ibest is defined as 1.
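The general pattern behind this kind of fix is to give the variable a default before the conditional, so it is defined on every path. The sketch below is not the code from Complexes.m: the contents of `best_list` are hypothetical, and breaking ties by the largest `NN` value is an assumption made only for illustration.

```matlab
% Hypothetical stand-in for best_list: one entry per candidate parameter set.
best_list.opt = [0.2 0.7 0.7];   % score being optimised
best_list.NN  = [10 12 15];      % second field, used here to break ties

Ibest = 1;                       % default, so Ibest always exists

tied = find(best_list.opt == max(best_list.opt));
if numel(tied) > 1
    % Several candidates share the top score: pick the one with largest NN.
    [~, k] = max(best_list.NN(tied));
    Ibest = tied(k);
elseif numel(tied) == 1
    Ibest = tied;
end
```

If `best_list.opt` were empty, `tied` would be empty too and `Ibest` would simply keep its default.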

GregStacey commented 6 years ago

Thanks Kevin - I'll get to this today!

I see in the log file that few interactions were predicted and with a low precision. Can you remind me where your data's coming from and what gold standard you're using? Is it mammalian (human?), and are you using the default allComplexes.txt file that comes with PrInCE, i.e. CORUM?

kevinkovalchik commented 6 years ago

The data is from a human cell line. It might be a nuclear extract, I don't remember at the moment. I didn't use the default allComplexes file, but I tried to get the latest from CORUM.

Thanks for all your help.

GregStacey commented 6 years ago

I added a fix to Complexes.m, and the code should (finally!) run all the way through with your data.

For the low number of interactions, it shouldn't be an issue with the reference set, since you're using CORUM + human cells.

The next thing I'd check is that the co-fractionation profiles have i) well defined elution peaks that ii) span the elution. You can use the figures in Output/Figures/GaussBuild/ to check these. The Chromatograms/ folder has individual plots of all co-fractionation profiles (unless this step was skipped), which make it easy to see the width/distribution of peaks across proteins. Chromatograms_clean_condition1.png is a heat map of the sorted co-fractionation profiles; again, you can see the width and distribution of elution peaks. Hist_GaussianParameters.png shows the fitted Gaussian parameters; the interesting ones are C and W, i.e. mean and standard deviation. C should be approximately uniform, and W should be heavily biased toward low numbers. This shows the peaks are narrow and span the elution. Finally, Hist_R2.png is a histogram of R^2 between the fitted models and the cleaned co-fractionation profiles. The mean R^2 should be around 0.95, and if it's not, that could signal problems with the data.
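To make the R^2 check concrete, here is a sketch of how R^2 between a fitted Gaussian and a profile can be computed. The Gaussian parameterisation (height A, centre C, width W as a standard deviation), the parameter values, and the synthetic "observed" profile are all assumptions for illustration; the exact model form PrInCE fits is not stated in this thread.

```matlab
x = 1:55;   % fraction index (55 fractions, as in the test data)

% Assumed Gaussian form: height A, centre C, standard deviation W.
gauss = @(A, C, W, x) A .* exp(-(x - C).^2 ./ (2 * W.^2));

fitted   = gauss(1.0, 20, 3, x);        % hypothetical fitted model
observed = fitted + 0.02 * sin(x);      % hypothetical cleaned profile

% Coefficient of determination: 1 - residual SS / total SS.
ssRes = sum((observed - fitted).^2);
ssTot = sum((observed - mean(observed)).^2);
R2 = 1 - ssRes / ssTot;
fprintf('R^2 = %.3f\n', R2);
```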

If you send me your email I can send some example plots from successful data. (richard.greg.stacey@gmail.com)

jdrudolph commented 6 years ago

Hi

I'm also having trouble running the example data on my Windows machine with Matlab R2016b. Attached is the output of running the code from the master branch in Matlab by entering prince.

Any ideas on how to get the code to run through? Thanks!

ans =

Detected 55 FRACTIONS and 3 REPLICATES in condition files. If that is not correct, check that condition files are correctly formatted.

Warning: The following file appears to contain badly formatted protein IDs:
 D:\Documents\scratch\PrInCE//Input/Major_protein_groups.csv 
> In standardinput (line 134)
  In prince (line 42) 
GaussBuild.m

    0. Initialize  ...  0.00 seconds

    1. Read input data  ...  13.10 seconds

    2. Clean the chromatograms  ...  2.23 seconds

    3. Fit 1-5 Gaussians on each cleaned chromatogram  ...  550.62 seconds

    4. Write outputGauss_Build: writeOutput: SEC fitting failed. Size of Complex will be zero.
  ...  0.08 seconds

    5. Make figuresWarning: Failed to make heatmap figures. 
> In makeFigures_gaussbuild (line 114)
  In GaussBuild (line 274)
  In prince (line 52) 
Warning: Failed to plot Hist_NumberOfGaussians. 
> In makeFigures_gaussbuild (line 142)
  In GaussBuild (line 274)
  In prince (line 52) 
  ...  2.40 seconds
Alignment.m

    0. Initialize  ...  0.01 seconds
    1. Read input  ...  14.40 seconds
    2. Find the best replicates to align to  ...  0.02 seconds
    3. Calculate best fit lines for adjustment  ...  0.01 seconds
    4. Using fitted curves, adjust replicate data  ...  1.28 seconds
    5. Write output  ...  1.32 seconds
    6. Make figures  ...  3.16 seconds
FoldChanges.m

    0. Initialize  ...  0.01 seconds
    1. Read inputError using reshape
Size arguments must be real integers.

Error in FoldChanges (line 268)
    GaussData{ii}=reshape(tmp2{:},10,(tmp3(1)/10))';

Error in prince (line 54)
FoldChanges

GregStacey commented 6 years ago

Hi Jan - Unfortunately I can't reproduce your error, but it looks like there's a problem with fitting the mixed Gaussian models. Can you do the following?

  1. Check you have the parallel computing and curve fitting toolboxes installed. Run checktoolbox from the command line. Does it return any errors?

  2. In Output/tmp/, do you see files _condition1_Combined_OutputGausrep1.csv, _condition1_Combined_OutputGausrep2.csv, etc.? Can you confirm they're empty?
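For reference, the "Size arguments must be real integers" error in the reshape call at FoldChanges line 268 appears when the element count divided by 10 is not a whole number, which can happen if the Gaussian output files hold no usable rows. A minimal sketch of a guard, with a hypothetical input vector and an assumed empty fallback:

```matlab
vals  = (1:25)';   % hypothetical flat data; 25 is not a multiple of 10
nCols = 10;        % number of columns expected per row

if isempty(vals) || mod(numel(vals), nCols) ~= 0
    % Shape is unusable: fall back to an empty table instead of erroring.
    GaussData = zeros(0, nCols);
    reshapeOk = false;
else
    GaussData = reshape(vals, nCols, numel(vals) / nCols)';
    reshapeOk = true;
end
```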

jdrudolph commented 6 years ago

Hi Greg - thanks for your suggestions.

  1. The checktoolbox command returns no errors. I also attach the output of ver.

    >> checktoolbox
    
    ans =
    
         []
    
    >> ver
    ----------------------------------------------------------------------------------------------------
    MATLAB Version: 9.1.0.441655 (R2016b)
    MATLAB License Number: 40523909
    Operating System: Microsoft Windows 10 Pro Version 10.0 (Build 17134)
    Java Version: Java 1.7.0_60-b19 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
    ----------------------------------------------------------------------------------------------------
    MATLAB                                                Version 9.1         (R2016b)
    Curve Fitting Toolbox                                 Version 3.5.4       (R2016b)
    MATLAB Compiler                                       Version 6.3         (R2016b)
    Optimization Toolbox                                  Version 7.5         (R2016b)
    Statistics and Machine Learning Toolbox               Version 11.0        (R2016b)
  2. Both files in Output/tmp exist but contain only the header row.

kevinkovalchik commented 6 years ago

Hi Greg,

Sorry for my slow response here! I've been on vacation and haven't had a chance to check out the new updates you made. I will test it in the next couple days and let you know how it goes.

In regards to Jan's problem, I recall I also needed to install the Signal Processing Toolbox and it looks like maybe that one isn't installed. Don't know if that's the issue, but something to try!

Kevin

GregStacey commented 6 years ago

@jdrudolph Ah yup! If you install the Signal processing, Parallel computing, and System identification toolboxes, things should work. I'll change how I'm testing for their installation. (You should have been prompted to install them, which obviously didn't happen.)
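One way to test for the toolboxes named in this thread is to compare product names against the output of `ver`. This is a sketch, not how checktoolbox works; the list of required names is assembled from this discussion and may not be complete.

```matlab
% Toolboxes mentioned in this thread (list assumed, may be incomplete).
required = {'Curve Fitting Toolbox', 'Signal Processing Toolbox', ...
            'Parallel Computing Toolbox', 'System Identification Toolbox'};

v = ver;                 % struct array with one entry per installed product
installed = {v.Name};
missing = required(~ismember(required, installed));

if isempty(missing)
    fprintf('All listed toolboxes are installed.\n');
else
    fprintf('Missing toolboxes: %s\n', strjoin(missing, ', '));
end
```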

GregStacey commented 6 years ago

@kevinkovalchik I don't think I've made major updates, so using the newest version might not get you any more interactions. It's likely an issue with data quality, reference set, or both. Send me an email (richard.greg.stacey@gmail.com) and we can talk about what to do!