biodavidjm / artMS

Analytical R Tools for Mass Spectrometry
GNU General Public License v3.0

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated #184

Closed jfertaj closed 3 years ago

jfertaj commented 3 years ago

Dear David,

I have installed the new version of artMS, which includes some nice features. However, I am having issues running analyses that ran successfully with artMS 1.9.4.

I get an error during the MSstats step, right after fractionation is handled (fractionation is not enabled in my experiments).

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 7002761 rows; more than 584171 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
> 
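For readers unfamiliar with this message: data.table raises it when a join returns more rows than nrow(x)+nrow(i), which usually means the join key is duplicated on both sides. The sketch below only illustrates that row blow-up with base R's merge() on made-up data (the `Run`, `Feature`, and `Condition` columns here are assumptions for illustration); it is not the join that MSstats or artMS actually performs.

```r
# Illustration only: base R merge() on toy data, not the internal data.table join.
# 'x' has three rows per Run value; 'i' also repeats every Run value three times.
x <- data.frame(Run = rep(1:3, each = 3),
                Feature = paste0("f", 1:9),
                stringsAsFactors = FALSE)
i <- data.frame(Run = rep(1:3, times = 3),
                Condition = rep(c("a", "b", "c"), each = 3),
                stringsAsFactors = FALSE)

joined <- merge(x, i, by = "Run")

rows_in  <- nrow(x) + nrow(i)   # 18
rows_out <- nrow(joined)        # 27: each row of x matches three rows of i

# The key is duplicated in 'i', which is what the error message asks you to check.
key_is_duplicated <- anyDuplicated(i$Run) > 0
```

With a unique key in `i`, the join could never return more than nrow(x) rows, so a result larger than nrow(x)+nrow(i) is a strong hint that a key which should be unique is repeated.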

When I run the same files with artMS 1.9.4, the analysis completes without errors.

This is my yaml configuration file

files:
  evidence: evidence_LS.txt
  keys: keys_LS.txt
  contrasts: contrasts_LS.txt
  summary: summary_LS.txt
  output: results_LS/results_LS.txt
qc:
  basic: 0
  extended: 0
  extendedSummary: 0
data:
  enabled: 1
  silac:
    enabled: 0
  filters:
    enabled: 1
    contaminants: 1
    protein_groups: keep
    modifications: AB
  sample_plots: 1
msstats:
  enabled: 1
  msstats_input: ~
  profilePlots: none
  normalization_method: equalizeMedians
  normalization_reference: ~
  summaryMethod: TMP
  MBimpute: 1
  censoredInt: NA
  feature_subset: all
  n_top_feature: 3
  logTrans: 2
  remove_uninformative_feature_outlier: no
  min_feature_count: 2
  equalFeatureVar: yes
  remove50missing: no
  fix_missing: ~
  maxQuantileforCensored: 0.999
  use_log_file: no
  append: no
  log_file_path: ~
output_extras:
  enabled: 1
  annotate:
    enabled: 1
    species: HUMAN
  plots:
    volcano: 1
    heatmap: 1
    LFC: -0.58 0.58
    FDR: 0.05
    heatmap_cluster_cols: 0
    heatmap_display: log2FC

Any help would be appreciated. Thanks,

Juan

biodavidjm commented 3 years ago

Hi Juan, thanks for reporting this.

We would need a little more information to debug this issue.

Thanks!

hemingwang commented 3 years ago

Hi, I have the same problem.

My error is:

artMS: Relative Quantification using MSstats

Reading the configuration file
LOADING DATA
MERGING FILES
CONVERT Intensity values < 1 to NA
FILTERING
-- Contaminants CON|REV removed
-- Removing protein groups
-- Use as Protein ID
-- PROCESSING AB
CONVERTING THE DATA TO MSSTATS FORMAT
-- Selecting Sequence Type: MaxQuant 'Modified.sequence' column
(+) column added (with value 1, MSstats requirement)
-- Adding NA values for missing values (required by MSstats)
-- Write out the MSstats input file (-mss.txt)
RUNNING MSstats (it usually takes a 'long' time: please, be patient)
-- Normalization method: equalizeMedians
INFO [2021-08-10 00:23:44] Features with one or two measurements across runs are removed.
INFO [2021-08-10 00:23:44] Fractionation handled.
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 37881 rows; more than 4218 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

Thanks!

biodavidjm commented 3 years ago

Thanks,

To debug the issue, we also need the following information:

# R version
version

# artMS version
packageVersion("artMS")

Thanks

hemingwang commented 3 years ago

version _
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 1.0
year 2021
month 05
day 18
svn rev 80317
language R
version.string R version 4.1.0 (2021-05-18)
nickname Camp Pontanezen

artMS version

packageVersion("artMS")
[1] ‘1.10.2’

And the log is:

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936 LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] artMS_1.10.2

loaded via a namespace (and not attached):
[1] nlme_3.1-152 bitops_1.0-7 bit64_4.0.5 RColorBrewer_1.1-2 httr_1.4.2
[6] GenomeInfoDb_1.28.1 UpSetR_1.4.0 tools_4.1.0 backports_1.2.1 utf8_1.2.2
[11] R6_2.5.0 KernSmooth_2.23-20 lazyeval_0.2.2 DBI_1.1.1 BiocGenerics_0.38.0
[16] colorspace_2.0-2 ade4_1.7-17 tidyselect_1.1.1 gridExtra_2.3 bit_4.0.4
[21] compiler_4.1.0 VennDiagram_1.6.20 preprocessCore_1.54.0 Biobase_2.52.0 formatR_1.11
[26] plotly_4.9.4.1 ggdendro_0.1.22 caTools_1.18.2 scales_1.1.1 checkmate_2.0.0
[31] stringr_1.4.0 digest_0.6.27 minqa_1.2.4 XVector_0.32.0 pkgconfig_2.0.3
[36] htmltools_0.5.1.1 lme4_1.1-27.1 fastmap_1.1.0 limma_3.48.1 htmlwidgets_1.5.3
[41] rlang_0.4.11 GlobalOptions_0.1.2 RSQLite_2.2.7 shape_1.4.6 generics_0.1.0
[46] jsonlite_1.7.2 gtools_3.9.2 dplyr_1.0.7 zip_2.2.0 RCurl_1.98-1.3
[51] magrittr_2.0.1 GenomeInfoDbData_1.2.6 futile.logger_1.4.3 Matrix_1.3-3 Rcpp_1.0.7
[56] munsell_0.5.0 S4Vectors_0.30.0 fansi_0.5.0 lifecycle_1.0.0 yaml_2.2.1
[61] stringi_1.7.3 MASS_7.3-54 zlibbioc_1.38.0 org.Hs.eg.db_3.13.0 gplots_3.1.1
[66] plyr_1.8.6 grid_4.1.0 blob_1.2.2 parallel_4.1.0 MSstatsConvert_1.2.2
[71] ggrepel_0.9.1 crayon_1.4.1 MSstats_4.0.1 lattice_0.20-44 Biostrings_2.60.2
[76] splines_4.1.0 circlize_0.4.13 KEGGREST_1.32.0 pillar_1.6.2 boot_1.3-28
[81] log4r_0.3.2 seqinr_4.2-8 marray_1.70.0 stats4_4.1.0 futile.options_1.0.1
[86] glue_1.4.2 lambda.r_1.2.4 data.table_1.14.0 png_0.1-7 vctrs_0.3.8
[91] nloptr_1.2.2.2 tidyr_1.1.3 gtable_0.3.0 getopt_1.20.3 purrr_0.3.4
[96] cachem_1.0.5 ggplot2_3.3.5 openxlsx_4.2.4 viridisLite_0.4.0 survival_3.2-11
[101] tibble_3.1.3 pheatmap_1.0.12 AnnotationDbi_1.54.1 memoise_2.0.0 IRanges_2.26.0
[106] corrplot_0.90 cluster_2.1.2 ellipsis_0.3.2

biodavidjm commented 3 years ago

Thanks! You are using the right version. The issue might be the keys.txt file. Could you please copy and paste here the content of the keys file? Alternatively, you could send it by email to artms.help@gmail.com

hemingwang commented 3 years ago

My key file is

Raw.file  Condition  BioReplicate  Run  IsotopeLabelType
A1.raw    a          a_1           1    L
A2.raw    a          a_2           2    L
A3.raw    a          a_3           3    L
B_1.raw   b          b_1           1    L
B_2.raw   b          b_2           2    L
B_3.raw   b          b_3           3    L
C_1.raw   c          c_1           1    L
C_2.raw   c          c_2           2    L
C_3.raw   c          c_3           3    L

biodavidjm commented 3 years ago

OK, we got it.

The problem is your keys file. Please check the documentation to find out more about it (Content > Input files > keys.txt).

That is, you are using _ instead of - in the BioReplicate column. Change that (a-1 instead of a_1, etc.) and re-run artmsQuantification.

We definitely need to add a function that checks for this and stops the analysis. We'll do it in the next version of artMS.

Thanks
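Until such a check exists, a pre-flight test along these lines may help. This is only a sketch: `check_bioreplicates` is a hypothetical helper, not an artMS function, and the rule it applies (BioReplicate equals the Condition, a hyphen, then a number) is inferred from the examples in this thread rather than taken from the artMS source.

```r
# Hypothetical helper (NOT part of artMS): return the BioReplicate values that
# do not look like "<Condition>-<number>", the pattern suggested in this thread.
check_bioreplicates <- function(keys) {
  expected_prefix <- paste0(keys$Condition, "-")
  # Does each BioReplicate start with its own condition followed by a hyphen?
  has_prefix <- substring(keys$BioReplicate, 1, nchar(expected_prefix)) == expected_prefix
  # Is everything after that prefix a plain integer?
  suffix <- substring(keys$BioReplicate, nchar(expected_prefix) + 1)
  ok <- has_prefix & grepl("^[0-9]+$", suffix)
  keys$BioReplicate[!ok]
}

# Toy keys table with one offending entry (underscore instead of hyphen).
keys <- data.frame(
  Condition    = c("a", "a", "b"),
  BioReplicate = c("a_1", "a-2", "b-1"),
  stringsAsFactors = FALSE
)

bad <- check_bioreplicates(keys)
if (length(bad) > 0) {
  message("BioReplicate values not matching <Condition>-<number>: ",
          paste(bad, collapse = ", "))
}
```

Running it before artmsQuantification and stopping on a non-empty result would catch the underscore naming reported above.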

hemingwang commented 3 years ago

I replaced a_1 with a-1, but I got the same error. By the way, I ran the same file with MSstats directly and it finished without any errors.

biodavidjm commented 3 years ago

OK, I forgot to mention: number the "Run" column from 1 to 9 (one value per raw file) and please try again.
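Putting both suggestions together, the keys table posted above could be corrected as sketched below. The values are the ones from the thread; the output file name and the tab-delimited format in the commented-out write step are assumptions.

```r
# Keys table as originally posted in this thread.
keys <- data.frame(
  Raw.file = c("A1.raw", "A2.raw", "A3.raw",
               "B_1.raw", "B_2.raw", "B_3.raw",
               "C_1.raw", "C_2.raw", "C_3.raw"),
  Condition = rep(c("a", "b", "c"), each = 3),
  BioReplicate = c("a_1", "a_2", "a_3", "b_1", "b_2", "b_3", "c_1", "c_2", "c_3"),
  Run = rep(1:3, times = 3),
  IsotopeLabelType = "L",
  stringsAsFactors = FALSE
)

# Fix 1: use a hyphen before the replicate number (a_1 becomes a-1).
keys$BioReplicate <- sub("_([0-9]+)$", "-\\1", keys$BioReplicate)

# Fix 2: give every raw file its own run number (1 to 9 here).
keys$Run <- seq_len(nrow(keys))

# Assumed output step (file name and tab delimiter are assumptions):
# write.table(keys, "keys_fixed.txt", sep = "\t", quote = FALSE, row.names = FALSE)
```

The Raw.file column is left untouched, since those names have to keep matching the raw files in the evidence file.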

hemingwang commented 3 years ago

I finished it! Thank you !

hemingwang commented 3 years ago

Hi, I want to analyze methylation in my data, so I set a user-defined PTM.

In my config file, I wrote:

data:
  enabled: 1
  silac:
    enabled: 0
  filters:
    enabled: 1
    contaminants: 1
    protein_groups: remove
    modifications: PTM:KR:methyl

But I got this error:

Error in .artms_filterData(x = x, config = config, verbose = verbose) : 
  The config > data > filters > modification PTM:KR:METHYL is not valid option

biodavidjm commented 3 years ago

Glad to hear that the issue was solved. With respect to the other question, could you please start a new github issue?

hemingwang commented 3 years ago

Thanks for your patience in helping me!

jfertaj commented 3 years ago

Hi David,

Sorry for reopening this issue. I ran my data using the example time course experiment template from the MSstats manual and it ran without any warnings. I wonder whether the issue is that my data is a time course experiment, with the same sample measured at two different times, and that this caused artMS to fail. I don't know how to translate the annotation file required by MSstats into a keys file for artMS, but I have attached the file here in case you want to have a look.

Thanks,
Juan

annotation2.txt

biodavidjm commented 3 years ago

Hi Juan,

It looks like you have 6 different conditions (Time1_N, Time1_P, etc), with 15 bioreplicates each? (Sample_10N, Sample_11N, etc). Is this correct?

If that is the case, you are not following the naming rules explained above and in the documentation.

This would be very easy to solve: call your bioreplicates Time1_N-1, Time1_N-2, Time1_N-3, ..., Time1_N-15, etc.
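A renaming like this can be generated rather than typed by hand. The sketch below numbers the rows within each condition using base R; the small `annotation` table is made up for illustration (only the condition and sample naming style comes from this thread), and the column names are assumptions about the attached file.

```r
# Made-up excerpt of an annotation table; the real file has more rows and columns.
annotation <- data.frame(
  Condition = c("Time1_N", "Time1_N", "Time1_N", "Time1_P", "Time1_P"),
  Sample    = c("Sample_10N", "Sample_11N", "Sample_12N", "Sample_10P", "Sample_11P"),
  stringsAsFactors = FALSE
)

# Running index within each condition: 1, 2, 3 for Time1_N, then 1, 2 for Time1_P.
idx <- ave(seq_along(annotation$Condition), annotation$Condition, FUN = seq_along)

# BioReplicate names of the form <Condition>-<number>, e.g. Time1_N-1.
annotation$BioReplicate <- paste0(annotation$Condition, "-", idx)
```

Keeping the original `Sample` column next to the new `BioReplicate` column preserves the mapping back to the physical samples.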