OpenMS / OpenMS

The codebase of the OpenMS project
https://www.openms.de

OpenMS 2.3 Workflows #2910

Closed enetz closed 6 years ago

enetz commented 6 years ago

I am currently updating our workflows with OpenMS 2.3 nodes.

Building OpenMS nodes with the KNIME SDK and GenericWorkflowNodes by following our GitHub wiki page works quite well.

Here are the remaining issues:

  1. XTandemAdapter produces files that IDPosteriorErrorProbability cannot handle. This basically breaks all workflows in which XTandemAdapter is used (e.g. consensus peptide ID, label-free with protein quant). Here is a gist containing XTandem idXMLs, one that works with IDPEP and one that does not (different mzML inputs but identical XTandem parameters): https://gist.github.com/enetz/42cfb23caf1e6e5e36d5c19f7cf0b068

  2. Metabolite ID workflow: The PCA plot uses the R package ggbiplot, which is unavailable for newer R versions, so it would be better to make the plot using a standard package such as ggplot2.

  3. QC metanodes and workflows: QCExtractor now returns more columns than before in some cases. This affects the QCPrecursorReader helper node in the ID Ratio metanode in the workflows, and could also affect other nodes not used in the workflows. QCPrecursorReader expects two columns, mz and RT, but QCExtractor now also returns columns for charge, signal-to-noise ratio and peak count. I will look into this today.

  4. DNA heteroconjugate detection: is probably fine, but I have no input data to test it

  5. Some problems seem to be specific to the KNIME SDK 3.4: some nodes are missing and I cannot find the packages in the update repositories, e.g. LIBSVM support for the Metabolomics2016_lipidomics workflow and the Excel Writer nodes used in several workflows. I have found these in the KNIME 3.4 full installer (XLS Writer is now Excel Writer (XLS)), but there I cannot use OpenMS 2.3 nodes to run these workflows. I will check today whether I can find everything using KNIME SDK 3.3.

  6. Metabolite_SpectralID: we do not have a minimal example for a demonstration (I do not have any data at all, so could not run this). A minimal mzML with a few metabolite MS2 spectra and a minimal spectral database with a few entries would be nice to have.

  7. There is also the issue of versioning the workflows, since the new ones will not be working with OpenMS 2.1 nodes anymore, so it should stay possible to download the old workflows.
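
For item 3, one way to make a reader node robust against additional columns is to select the columns it needs by name rather than by position. The following is only a minimal sketch and not the actual fix applied to QCPrecursorReader: the table `qc_table`, the helper `select_precursor_columns` and the extra column names (`charge`, `SN`, `peak_count`) are made-up stand-ins; only `mz` and `RT` come from the description above.

```r
# Toy stand-in for a QCExtractor result; values and extra column names are invented.
qc_table <- data.frame(
  mz = c(445.12, 512.30),
  RT = c(1200.5, 1310.2),
  charge = c(2L, 3L),
  SN = c(15.2, 8.7),
  peak_count = c(120L, 98L)
)

# Hypothetical helper: keep only the columns the reader needs, selected by name.
select_precursor_columns <- function(tbl, required = c("mz", "RT")) {
  missing_cols <- setdiff(required, colnames(tbl))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE keeps a data frame even if only one column is requested.
  tbl[, required, drop = FALSE]
}

precursors <- select_precursor_columns(qc_table)
print(colnames(precursors))
```

Selecting by name means that further columns added by later QCExtractor versions would be ignored, while a missing `mz` or `RT` column fails with a readable message.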

Workflows that already work fine with OpenMS 2.3 nodes: basic_peptide_identification, fdr_peptide_identification, lfq_and_peptide_identification, lfq_complete_with_simple_analysis, Metabolite_ID (except that one PCA plot with newer R versions), ParallelAndIterativeLoops, OpenSWATH, FFId_iPRG-2015.

enetz commented 6 years ago

I solved number 3, the QCPrecursorReader node (thanks @jpfeuffer), but another issue popped up in the Peptidogenic Featurespace Plot metanode: the R View (Table) node does not work. At first there was a file path issue: theoretical_masses_file<-gsub("file:/","", theoretical_masses_var) turned file:/home/username... into home/username..., dropping the leading slash of absolute paths on Linux. I guess it was written to work on Windows with paths like file:/C:\...?

But the rest of the node does not work either. It seems to assume the existence of objects that just are not there. I have the same issue with the old workflow, so it is nothing new. Is it maybe only meant to run on Windows?
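
The path problem described above could be avoided by handling both URI shapes explicitly. The following is a minimal sketch under assumptions: `file_uri_to_path` is a hypothetical helper that is not part of the workflow, the example paths are invented, and it assumes the variable holds a plain `file:` URI without percent-encoded characters.

```r
# Hypothetical helper: convert a single file: URI into a local path
# that works on Linux/macOS as well as on Windows.
file_uri_to_path <- function(uri) {
  # Drop the scheme and collapse the slashes after it into one leading "/".
  path <- sub("^file:/+", "/", uri)
  # Windows URIs look like file:/C:/..., so remove the slash before the drive letter.
  if (grepl("^/[A-Za-z]:", path)) {
    path <- substring(path, 2)
  }
  path
}

print(file_uri_to_path("file:/home/username/masses.csv"))
print(file_uri_to_path("file:/C:/Users/username/masses.csv"))
```

The helper expects a single string. A vector of URIs would need `ifelse` or `vapply` instead of the plain `if`.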

jpfeuffer commented 6 years ago

Great, thank you @enetz.

1) I will create an update site of our release candidate so more people can test it.

2) Yes, please change to more popular/default packages (contact Marc)

3) @mwalzer maybe knows more. Hmm it could be the transport of objects between R Workspace nodes in KNIME. You can share a Workspace between consecutive R Nodes in KNIME.

4) We would need to ask @hendrikweisser for input data but it probably won't be part of our tutorials soon.

5) Doesn't KNIME try to load missing plugins in your case? LibSVM could be located in the WEKA plugin. @FabianAicheler can you have a look at your workflow? There should also be a separate Excel support plugin for the Excel stuff.

6) Also secondary, since it is not yet part of our tutorials

7) Indeed. The social workflow repository (https://cibi.uni-konstanz.de/socialworkflows/index) supports versioning with its snapshot mechanism. I would say this is the way to go.

oliveralka commented 6 years ago

@enetz

  1. Please try the following code for the PCA (R):
    
    #install.packages("devtools")
    #library(devtools)
    #install.packages("ggfortify")
    library(ggfortify)

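    knime.out <- knime.in
    all_cols <- colnames(knime.out)
    # Select the intensity columns (the backslash must be doubled inside an R string)
    intensity_cols <- all_cols[grepl("^intensity\\d+", all_cols, perl=TRUE)]
    data <- as.matrix(knime.out[,intensity_cols])
    # Transpose so that each sample becomes one row
    tdata <- as.data.frame(t(data))

    pca <- prcomp(tdata, center=TRUE, scale.=TRUE)
    # Assumes three control samples followed by three treated samples
    tdata["colour"] <- c(rep("Control",3), rep("Treated",3))

    autoplot(pca, data = tdata, colour = 'colour', frame = TRUE)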

enetz commented 6 years ago

(image: pca)

This is what the PCA looks like with that code. Is that fine?

jpfeuffer commented 6 years ago

OK, the KNIME update site should now be accessible at https://abibuilder.informatik.uni-tuebingen.de/archive/openms/knime-plugin/updateSite/RC. Just add it under Install New Software. There are no categories yet, so just choose GKN and OpenMS. Ideally, use a fresh basic KNIME 3.4 installation so that no previous OpenMS version is present.

jpfeuffer commented 6 years ago

The PCA reminds me of previous plots with circles. For me it is fine.

jpfeuffer commented 6 years ago

Regarding XTandem: The E-Values look like the following. No wonder that IDPep cannot fit anything. Why are there so many 1e-15s? What changed?

(image: screenshot 2017-09-12 18 20 25)

jpfeuffer commented 6 years ago

1e-15 seems like a hard cutoff for the minimum of the XTandem E-Values (or of one of our readers). But it is not really close to the minimum positive float or anything like that?
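
A quick way to check whether E-values pile up at a floor is to count how many hits share the minimum value. The sketch below uses invented toy E-values (not taken from the gist) and a hypothetical helper `fraction_at_floor`. It also prints R's smallest positive normalized double for comparison, which lies many orders of magnitude below 1e-15.

```r
# Invented toy E-values, for illustration only.
evalues <- c(1e-15, 1e-15, 1e-15, 3.2e-4, 0.05, 1e-15)

# Hypothetical helper: share of values equal to the smallest value observed.
fraction_at_floor <- function(x) {
  mean(x == min(x))
}

print(min(evalues))
print(fraction_at_floor(evalues))
# Smallest positive normalized double on this machine, for comparison.
print(.Machine$double.xmin)
```

A large share of hits at exactly the same minimum would point towards clamping somewhere in the search engine or in a reader rather than a genuine score distribution.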

timosachsenberg commented 6 years ago

@hendrikweisser do you have any idea why this could happen? Also, there are some changes like https://github.com/OpenMS/OpenMS/commit/7b3f432aa421230f309b28393da1b388246e7ba8?w=1#diff-39d8c25dca3b78012768d67d4e90a7dfL76 that could be related.

oliveralka commented 6 years ago

1) Tested the Metabolite ID Workflow and ran into some issues:

Missing Value Node gives Warning (which seems to be Ok in this case): WARN Missing Value 0:226:143 The current settings use missing value handling methods that cannot be represented in PMML 4.2

Error with R VIEW (Table) Node - this is a KNIME issue

ERROR R View (Table)       0:145      Execute failed: R evaluation failed.: "knime.tmp.ret<-NULL;printError<-function(e) message(paste('Error:',conditionMessage(e)));for(exp in tryCatch(parse(text=knime.tmp.script),error=printError)){tryCatch(knime.tmp.ret<-withVisible(eval(exp)),error=printError)
if(!is.null(knime.tmp.ret)) {if(knime.tmp.ret$visible) print(knime.tmp.ret$value)}};rm(knime.tmp.script,exp,printError);knime.tmp.ret$value"

@enetz Have you had similar issues?

2) Tested the Metabolite_SpectralID workflow. Error: File not found (the file 'CHEMISTRY/MetaboliteSpectralDB.mzML' could not be found). This can be solved with a new input node that provides the DB (see data from boston2017.zip).

macOS Sierra 10.12.6, KNIME 3.4.1 basic

oliveralka commented 6 years ago

@timosachsenberg basic_peptide_identification works for me without any problems. macOS Sierra 10.12.6, KNIME 3.4.1 basic

edit: XTandemAdapter works here as well

enetz commented 6 years ago

@oliveralka that error in the R View node looks very similar to what I get in an R View node in the QC workflow, although it works for me in Metabolite ID. You can try to look into the stderr output of the R View node, which might give some hints about what is wrong in the R code. The KNIME console seems to report these errors only from KNIME's point of view (it just says R evaluation failed), but if you look into the node's stderr or run the R code using the Eval Script button in the R snippet configuration, it should also return the error output from R.

oliveralka commented 6 years ago

@enetz There may be an issue with the package "Rserve", since the code works with my current R installation. I will try to update and install Rserve, since it needs version 3.4.0 of libR.dylib. Rserve seems to be essential for KNIME.

2017-09-13 11:19:54,544 : DEBUG : KNIME-Worker-78 : RController : R View (Table) : 0:145 : An attempt (2/5) to connect to Rserve failed.
org.rosuda.REngine.Rserve.RserveException: Cannot connect: Connection refused

It seems that in the end my R installation was buggy. It works with R 3.4.1 with XQuartz. The essential packages for the R KNIME nodes are 'Rserve' and 'Cairo'.

timosachsenberg commented 6 years ago

Working on Windows 7 64bit: basic_peptide_identification, fdr_peptide_identification, lfq_and_peptide_identification

lukaszimmermann commented 6 years ago

For the workflows, I guess the decoy_string parameter in PeptideIndexer has to be set to DECOY instead of the previously used REV.

jpfeuffer commented 6 years ago

Yep, this was just recently changed with https://github.com/OpenMS/Tutorials/issues/76

jpfeuffer commented 6 years ago

done for lfq_protein and consensusID

timosachsenberg commented 6 years ago

fdr_peptide_identification and lfq_and_peptide_identification should also be updated by now. I changed the string to DECOY.