Closed: hechth closed this issue 1 year ago
Hi, I am using the workflow you mentioned, but the function 'rc.ramclustr' is giving the following error:
RC_F <- rc.ramclustr(ramclustObj = RC_E, st = NULL,
    sr = NULL, maxt = NULL, deepSplit = FALSE, blocksize = 2000,
    mult = 5, hmax = NULL, collapse = TRUE,
    minModuleSize = 2, linkage = "average",
    cor.method = "pearson", rt.only.low.n = TRUE, fftempdir = NULL)
calculating ramclustR similarity: nblocks = 3
1 2 3
RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 2652 features into 444 spectra
collapsing feature into spectral signal intensities
Error in rc.ramclustr(ramclustObj = RC_E, st = NULL, sr = NULL, maxt = NULL, :
  this appears to be an older format ramclustR object and does not have a "phenoData" slot with sample names
If I use the function 'ramclustr', it asks for an xcms object. If I give it an xcms object, it tells me to do the filtering before clustering. Can you please help me out? I am struggling a lot! Any help would be much appreciated.
Thank you!
@arpita-007 I think this is an easy fix. It is asking you for phenotype data, which must be missing. You can add phenotype/experimental design data using the defineExperiment function, then feed that into the rc.get.xcms.data() function with the ExpDes option.
pheno <- RAMClustR::defineExperiment()
RC <- RAMClustR::rc.get.xcms.data(ExpDes = pheno)
RC <- RAMClustR::rc.ramclustr(ramclustObj = RC)
@cbroeckl Thank you so much for responding and for the guidance. Your suggestion worked. I could do the clustering after subtracting blank and normalization. But now I am getting an error in importing the msfinder.formulas.
import.msfinder.formulas(ramclustObj = RC_F, msp.dir = NULL)
Press 1 for .mat or 2 for .msp to continue2
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F, mat.dir = NULL, msp.dir = NULL)
Press 1 for .mat or 2 for .msp to continue1
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F)
Press 1 for .mat or 2 for .msp to continue2
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F, mat.dir = NULL, msp.dir = "C:/Users/DR Pallavi Lab/Documents/spectra/ms/spectra/msp")
Press 1 for .mat or 2 for .msp to continue 2
Error in do[[i]] : subscript out of bounds
Also while exporting the data with exportDataset() function I am getting this-
exportDataset(ramclustObj = RC_G, which.data = "SpecAbund", label.by = "ann", appendFactors = TRUE)
Error in which(row.names(ramclustObj$ExpDes$design) == "fact1name"):(which(row.names(ramclustObj$ExpDes$design) == :
  argument of length 0
Thank you in advance!!
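One possible reading of the exportDataset() message, offered only as a sketch: the error text shows a `which(...) == "fact1name"` lookup on the row names of `ExpDes$design` being used as an endpoint of `:`. If no row is named "fact1name", `which()` returns a zero-length vector and `:` fails with exactly this message. The `design` data frame below is a made-up stand-in (its row names mirror the design table printed later in this thread), not the real object, and whether this is the actual cause in RAMClustR is an assumption.

```r
# Hypothetical stand-in for ramclustObj$ExpDes$design (no "fact1name" row).
design <- data.frame(
  Value = c("GRP", "Homo sapiens", "Serum", "Arpita", "LC-MS"),
  row.names = c("Experiment", "Species", "Sample", "Contributor", "platform"),
  stringsAsFactors = FALSE
)

# The lookup from the error message: no match gives a zero-length index.
start <- which(row.names(design) == "fact1name")

# Using a zero-length value as an endpoint of `:` raises the same error text.
res <- tryCatch(start:start, error = function(e) conditionMessage(e))

# A quick check one could run on the real object before exporting.
has_factor_rows <- "fact1name" %in% row.names(design)
```

On a real object, the equivalent check would be `"fact1name" %in% row.names(RC_G$ExpDes$design)`.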
Did you run MSFinder? You need to run this program manually using the exported .mat files as input, then run import.msfinder.formulas. If MSFinder ran, it should have written directories for each compound which contain the formula results that ramclustR imports. At this time there are no R-based tools which perform a comparable set of steps, so we are reliant on running external programs (MSFinder or Sirius are the ones I have used and currently have import functions for) for the actual MS/MS spectrum annotation.
Thank you @cbroeckl!! I will do as you suggested.
Hi @cbroeckl
I was using the same flow again for a different experiment and the same error appeared. I did as you suggested but it is not working.
pheno <- RAMClustR::defineExperiment()
RC <- RAMClustR::rc.get.xcms.data(xcmsObj = fill_GRP,
    taglocation = "pathGRP",
    MStag = NULL,
    MSMStag = NULL,
    ExpDes = pheno,
    mzdec = 3,
    ensure.no.na = TRUE)
RC_B <- rc.feature.replace.na(
    ramclustObj = RC,
    replace.int = 0.1,
    replace.noise = 0.1,
    replace.zero = TRUE)
replaced 445885 of 1032504 total feature values ( 43 % )
RC_C <- rc.feature.filter.blanks(ramclustObj = RC_B,
    qc.tag = c("QC", "sample.names.sample_group"),
    blank.tag = c("Blank", "sample.names.sample_group"),
    sn = 3, remove.blanks = TRUE)
41.1% of features move forward
df phenoData ma MSdata
Features which failed to demonstrate signal intensity of at least 3 fold greater in QC samples than in blanks were removed from the feature dataset. 25336 of 43021 features were removed.
RC_D <- rc.feature.normalize.tic(ramclustObj = RC_C)
RC_E <- rc.feature.filter.cv(ramclustObj = RC_D, qc.tag = c("QC", "sample.names.sample_group"),
    max.cv = 0.3)
MSdata : 5477 passed the CV filter
Features were filtered based on their qc sample CV values. Only features with CV vaules less than or equal to 0.3 in MSdata set were retained. 12208 of 17685 features were removed.
RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
calculating ramclustR similarity: nblocks = 6
1 2 3 4 5 6
RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 5477 features into 851 spectra
collapsing feature into spectral signal intensities
Error in RAMClustR::rc.ramclustr(ramclustObj = RC_E) :
  this appears to be an older format ramclustR object and does not have a "phenoData" slot with sample names
I created an experiment design. You mentioned phenotype data. If I am not mistaken, phenotype data and phenoData (shown in the error) are different. I am not sure what to do in this case.
Thank you
@arpita-007 - what does this show:
RC_F$ExpDes
RC_F$phenoData
fill_GRP@phenoData
the @phenoData slot from the xcms object should be brought to the RAMClustR object - this error suggests that this isn't happening, at least not in the way i anticipated.
Then what can be done to bring the phenoData to the RAMClustR object?
show me the output of these:
head(RC_F$ExpDes)
head(RC_F$phenoData)
head(fill_GRP@phenoData)
RC_F is not yet created because of the error. Here is the RC_E:
head(RC_E$ExpDes)
$design
            Value         Description
Experiment  GRP           experiment name, no spaces
Species     Homo sapiens  species name
Sample      Serum         sample type
Contributor Arpita        individual and/or organizational affiliation
platform    LC-MS         GC-MS or LC-MS

$instrument
value
chrominst Dionex 3000
msinst Orbitrap fusion
column Acquity HSS T3
solvA Water
solvB Methanol
CE1 30 V
CE2
mstype Orbi
msmode Positive
ionization ESI
colgas Helium
msscanrange 50-1500 Da
conevolt 30 V
MSlevs 2
head(RC_E$phenoData)
   sample.names.sample_name sample.names.sample_group   filenames
2                    A2_QC1                        QC A2_QC1.mzML
4                    A4_A_1                    Sample A4_A_1.mzML
5                    A5_A_2                    Sample A5_A_2.mzML
7                    A7_C_1                    Sample A7_C_1.mzML
8                    A8_C_2                    Sample A8_C_2.mzML
10                   B1_D_1                    Sample B1_D_1.mzML
   filepaths
2  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A2_QC1.mzML
4  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A4_A_1.mzML
5  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A5_A_2.mzML
7  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A7_C_1.mzML
8  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A8_C_2.mzML
10 C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\B1_D_1.mzML
head(fill_GRP@phenoData)
An object of class 'NAnnotatedDataFrame'
  rowNames: 1 2 ... 6 (6 total)
  varLabels: sample_name sample_group
  varMetadata: labelDescription
Multiplexing: 1 - Single run
what does this return?
is.null(RC_E$phenoData$sample.names)
and this:
names(RC_E$phenoData)
is.null(RC_E:$phenoData$sample.names)
Error: unexpected '$' in "is.null(RC_E:$"
names(RC_E$phenoData)
[1] "sample.names.sample_name"  "sample.names.sample_group" "filenames"
[4] "filepaths"
I tried this too:
is.null(RC_E:$phenoData$sample.names)
Error: unexpected '$' in "is.null(RC_E:$"
is.null(RC_E:$phenoData$sample.names.sample_name)
Error: unexpected '$' in "is.null(RC_E:$"
i think the issue is that the first column of your RC_E$phenoData data frame is supposed to be 'sample.names' but for some reason it isn't. Try this:
names(RC_E$phenoData)[1] <- "sample.names"
RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
Resolved I guess!
names(RC_E$phenoData)[1] <- "sample.names"
RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
calculating ramclustR similarity: nblocks = 6
1 2 3 4 5 6
RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 5477 features into 854 spectra
collapsing feature into spectral signal intensities
RC_F

Call:
fastcluster::hclust(d = tmp.ramclustObj, method = linkage)

Cluster method   : average
Distance         : RAMClustR
Number of objects: 5477
I am not sure why this happened - I will have to do some more homework, but this gets you moving forward.
@cbroeckl Thanks a lot again :)
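The column check and rename discussed above can be tried on a small mock data frame first. This is a minimal sketch: `pheno_data` is a made-up stand-in carrying the column names reported by names(RC_E$phenoData), and `ensure_sample_names()` is a hypothetical helper, not a RAMClustR function. It tests the column names exactly with `%in%`, because `$` on a data frame does partial matching and can be misleading when several columns share the "sample.names" prefix.

```r
# Mock phenoData using the column names reported in this thread.
pheno_data <- data.frame(
  sample.names.sample_name  = c("A2_QC1", "A4_A_1"),
  sample.names.sample_group = c("QC", "Sample"),
  filenames                 = c("A2_QC1.mzML", "A4_A_1.mzML"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename the first column only if no column is
# already called exactly "sample.names".
ensure_sample_names <- function(pd) {
  if (!("sample.names" %in% names(pd))) {
    names(pd)[1] <- "sample.names"
  }
  pd
}

has_before <- "sample.names" %in% names(pheno_data)
fixed      <- ensure_sample_names(pheno_data)
has_after  <- "sample.names" %in% names(fixed)
```

Applied to the real object, this would amount to `RC_E$phenoData <- ensure_sample_names(RC_E$phenoData)` before calling rc.ramclustr().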
Sorry to bother you again, but in the following call, can you please tell me what file should be given for MStag?
rc.get.xcms.data(xcmsObj = fill_GDMHCP, taglocation = "phenoData[,1]", MStag = NULL, MSMStag = NULL, ExpDes = pheno, mzdec = 4, ensure.no.na = FALSE)
Thanks
@arpita-007 The MStag parameter is not a file - how do you indicate which files are MS1 and which are MS2? Or do you only use MS1 data?
@hechth We do not have separate files for MS1 and MS2; each file contains both. We do have MS2 data written in .mgf format by XCMS, though. Can we use that?
@arpita-007 the idea behind RAMClustR is to extract the MS1 and MS2 data from the files separately, run XCMS on each, and then align the feature tables in the peak alignment step, so that MS1 and MS2 are represented as different samples.
If you have MS2 data in mgf format from XCMS, can you check if the MS2 data is also contained in the XCMS object used in R?
@arpita-007 - if you have only MS1, if i recall you can just leave it as NULL and the processing will proceed appropriately. RAMClustR doesn't currently deal with DDA-like MS/MS data.
@hechth I could not locate the XCMS object containing the MS2 data. But as @cbroeckl suggested, I proceeded with MS1 only. Thanks to both of you for solving all my doubts and making it easier for me. Thank you :)
Hi, can you please help me understand this error? I am getting this for one particular file only. I ran the same code on files from 3 different modes (RP pos, RP neg, HILIC pos) without problems, but I am seeing this error for my 4th file.
library(RAMClustR)
pheno <- RAMClustR::defineExperiment()
path2 <- file.path("E:/Placenta_final files/RAMClustR_clustering/PHCN_input_clustering_after corr.csv")
path2
[1] "E:/Placenta_final files/RAMClustR_clustering/PHCN_input_clustering_after corr.csv"
RC_PHCN <- ramclustR(ms = path2,
    featdelim = "_",
    st = 5,
    ExpDes = pheno,
    sampNameCol = 1)
organizing dataset
normalizing dataset
Calculating ramclustR similarity using 3 nblocks.
1 2 3
Error in ramclustObj[startv:stopv] <- column : replacement has length zero
@arpita-007 - can you send me the file you are using as input? cbroeckl at colostate dot edu.
The PHCN file gives an error, while the PHCP file was processed successfully with the same code.
PHCN_input_clustering_after corr.csv PHCP_input_clustering_after corr.csv
@arpita-007 - i think this is a rare event coupled with imperfect code. the file that fails has exactly 2000 features, which happens to be what the default blocksize setting is. try setting the option in the ramclustr function: blocksize = 1200. i suspect it will run fine. let me know if this fixes it please!
@cbroeckl Yes, it fixed the issue. Thank you.
@cbroeckl thanks for the proposed solution - we will implement a bugfix for that!
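Until a fix lands, the workaround above could be wrapped in a small guard. This is only a sketch: `choose_blocksize()` is a hypothetical helper, not part of RAMClustR, and it covers just the one case reported in this thread (feature count exactly equal to the default blocksize of 2000, with 1200 as the value that was confirmed to work). Whether other feature counts trigger the same error is not known from this discussion.

```r
# Hypothetical guard: fall back to another blocksize when the number of
# features equals the default blocksize (the only failing case reported).
choose_blocksize <- function(n_features, default = 2000, fallback = 1200) {
  if (n_features == default) fallback else default
}

bs_collision <- choose_blocksize(2000)  # feature count equals the default
bs_normal    <- choose_blocksize(2652)  # any other feature count

# Intended use (not run here; needs RAMClustR and the input file):
# RC_PHCN <- ramclustR(ms = path2, featdelim = "_", st = 5, ExpDes = pheno,
#                      sampNameCol = 1, blocksize = choose_blocksize(n_features))
```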
@arpita-007 and @cbroeckl I think we can maybe close this issue, as most things have been addressed and resolved?
I created issues for the things which still have to be taken care of.
Most other things are addressed in the open PR #39
@hechth Yes sure. Thank you!
The ramclust.R file contains a function covering the whole workflow, but the rc.*.R files actually contain the same functionality in multiple steps, which is more convenient to test and maintain.
ramclust.R with the respective sub-steps of the workflow