ardigen / nasqq

NASQQ: end-to-end Nextflow pipeline designed for automated analysis of 1D 1H NMR proton magnetic resonance spectra.
MIT License

Command error: Error in Fid_info[1, "DECIM"] : subscript out of bounds #7

Closed kkazi1980 closed 6 months ago

kkazi1980 commented 6 months ago

I am getting a Group Delay Correction error when trying to process the MTBLS9131 dataset (MetaboLights):

Command error: Error in Fid_info[1, "DECIM"] : subscript out of bounds Calls: Execution halted

I have DECIM defined as 24 in all data folders.
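One way to double-check this is to print the DECIM entry of every acquisition parameter file. This is only a sketch: it assumes Bruker-style data folders in which each experiment holds an acqus file with a line such as "##$DECIM= 24", and list_decim is a made-up helper name, not part of NASQQ.

```shell
# Print "<path><TAB><DECIM value>" for every acqus file below a directory.
# Assumption: Bruker parameter lines look like "##$DECIM= 24".
list_decim() {
  root=${1:-.}
  find "$root" -type f -name acqus | sort | while IFS= read -r f; do
    # Take the first DECIM entry; report MISSING if the file has none.
    val=$(sed -n 's/^##\$DECIM= *//p' "$f" | head -n 1 | tr -d '\r')
    printf '%s\t%s\n' "$f" "${val:-MISSING}"
  done
}

# Usage (path is a placeholder): list_decim ./testthat/data
```

A folder that shows MISSING, or no output at all, would suggest that the files were not found where the pipeline expects them.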

lukpru commented 6 months ago

Are there any outputs from the previous steps, namely load_fids and raw_fids_visualisation? What do manifest.csv and the metadata look like? The error most likely indicates that there are no data after loading the raw FIDs.

kkazi1980 commented 6 months ago

You can try. I simply replaced the folders in your testthat/data/ directory with different datasets (from MTBLS9131) and even named them the same way (500 etc.), keeping your metadata. Of course, this does not make any sense scientifically, but I wanted to test the workflow on different signals.

MTBLS9131.zip

lukpru commented 6 months ago

I downloaded the raw data directly via FTP using the command:

wget -r -np -nH --cut-dirs=5 ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS9131/

After making a few small modifications to metadata.csv, manifest.csv, and params.yml based on the information in UPEC_metadata.xlsx, everything seems to work just fine. It is crucial that only two groups are kept for comparison in the metadata. If the pipeline is executed with three groups, it will run successfully until the univariate analysis, where an error will indicate that there are more than two groups to compare. I am attaching all input files as a .zip archive (inputs only, as the full results are still being generated):

NASQQ_MTBLS9131.zip
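The two-group requirement can be checked before launching the pipeline. A minimal sketch, assuming a plain comma-separated metadata.csv without quoted commas; count_groups is a hypothetical helper, and the column name in the usage line is a placeholder for whichever column holds the comparison groups.

```shell
# Count the distinct non-empty values in one column of a simple CSV file.
#   $1 = CSV file, $2 = header name of the grouping column
count_groups() {
  file=$1
  column=$2
  awk -F',' -v col="$column" '
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { missing = 1; exit }
      next
    }
    $idx != "" { seen[$idx] = 1 }
    END {
      if (missing) { print "column not found: " col > "/dev/stderr"; exit 2 }
      n = 0
      for (g in seen) n++
      print n
    }
  ' "$file"
}

# Usage (column name is a placeholder): count_groups metadata.csv Group
```

Any result other than 2 would mean the metadata needs trimming before the univariate step.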

lukpru commented 6 months ago

Here are the results from the spectral processing. Aside from three samples that likely needed manual flipping after FT and some adjustment of baseline correction parameters, everything looks fine. The folder containing all results exceeds 600MB, so I am unable to upload it here.

Spectrum_data_N_stacked.pdf

lukpru commented 6 months ago

Screenshot from 2024-05-25 16-56-45

It took 2 hours, but the results have been fully generated. Additionally, the treated samples are clearly distinguished from the negative controls, suggesting that everything proceeded as expected (see the PCA screenshot in the next comment).

lukpru commented 6 months ago

pca_matrix_SampleType

lukpru commented 6 months ago

One outlier was detected in the univariate module. This sample exhibited a strong water signal during the spectral processing stage, resulting in poor FT and phasing. The outlier is also evident in the PC1 vs PC2 plot and on the heatmap (sample 30), where distinct clusters for the treatment and control groups are clearly visible as well.

correlation_clustermap_patients

kkazi1980 commented 6 months ago

Hi,

I downloaded your files and ran the program. It crashes on Solvent Suppression:

Nextflow 24.04.1 is available - Please consider updating your version to it
N E X T F L O W ~ version 23.10.1
Launching ./main.nf [gigantic_carson] DSL2 - revision: eee0efc519

        .--:--:---.\
      {}  : {}   :                           __  _  __   __  ____
      ||__"_||   :                          |  || ||  \ /  || __ \
     /        \  `={}_                      | N \ A| S   Q || Q_| \
    |   NASQQ  |  (   )                     |_| \_||_| V |_||_| \_|
    |  v1.0.0  |  (   )                     
    |    ____  |  (   )    Nextflow Automatization and Standarization for Qualitative and Quantitative
    |   |    | |  (   )             1H 1D NMR metabolomics data preparation and analysis
    |___|____|_|  (   )                   =======================================
    |          |  (   )                   input from     : manifest.csv
   /|    ||    |\ (   )                   output to      : output/outdir_review
  | |    ||    | |(   )                   ------
  | |____||____| |(   )                   run as         : nextflow run ./main.nf -c ./nextflow.config -profile standard -params-file params.yml
  |   _________  |(   )                   started at     : 2024-05-27T09:09:47.060396+02:00
  |  |   | |   | |(   )                   launchdir at   : /workspace1/Dropbox/Krzysztof/revisions/GIGAScience/nasqq-main
  |__|   |_|   |_|(___)

executor > local (5)
[ff/701c1d] process > SPECTRAL_PREPROCESSING:LOAD... [100%] 1 of 1 ✔
[42/1372e0] process > SPECTRAL_PREPROCESSING:RAW_... [100%] 1 of 1 ✔
[7a/38d866] process > SPECTRAL_PREPROCESSING:GROU... [100%] 1 of 1 ✔
[78/152663] process > SPECTRAL_PREPROCESSING:SOLV... [100%] 2 of 2, failed: 2...
[- ] process > SPECTRAL_PREPROCESSING:APOD... -
[- ] process > SPECTRAL_PREPROCESSING:ZERO... -
[- ] process > SPECTRAL_PREPROCESSING:FOUR... -
[- ] process > SPECTRAL_PREPROCESSING:ZERO... -
[- ] process > SPECTRAL_PREPROCESSING:INTE... -
[- ] process > SPECTRAL_PREPROCESSING:BASE... -
[- ] process > SPECTRAL_PREPROCESSING:NEGA... -
[- ] process > SPECTRAL_PREPROCESSING:WARPING -
[- ] process > SPECTRAL_PREPROCESSING:WIND... -
[- ] process > SPECTRAL_PREPROCESSING:BUCK... -
[- ] process > SPECTRAL_PREPROCESSING:NORM... -
[- ] process > METABOLITES_QUANTIFICATION -
[- ] process > ADD_METADATA -
[- ] process > COMBINE_DATASET_BATCHES -
[- ] process > DATA_ANALYSIS:FEATURES_PROC... -
[- ] process > DATA_ANALYSIS:EXPLORATORY_D... -
[- ] process > DATA_ANALYSIS:UNIVARIATE_AN... -
[- ] process > DATA_ANALYSIS:MULTIVARIATE_... -
[- ] process > PATHWAY_ANALYSIS_MULTIVARIATE -
[- ] process > PATHWAY_ANALYSIS_UNIVARIATE -
ERROR ~ Error executing process > 'SPECTRAL_PREPROCESSING:SOLVENT_SUPPRESSION (review)'

Caused by: Process SPECTRAL_PREPROCESSING:SOLVENT_SUPPRESSION (review) terminated with an error exit status (1)

Command executed:

solvent_suppresion.R --id "review" --fid_gdc "review_grouped_FIDdata_GDC.rds" --raw_rds "review_selected_fid_list.rds"

Command exit status: 1

Command output: (empty)

Command error:
  Error in difsm(y = FidRe, lambda = lambda.ss) :
    NA/NaN/Inf in foreign function call (arg 2)
  Calls: -> difsm
  Execution halted

Work dir: /workspace1/Dropbox/Krzysztof/revisions/GIGAScience/nasqq-main/output/workdir_review/78/152663b306185410eeb3b2107a352a

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

lukpru commented 6 months ago

Hi. Firstly, I edited the uploaded archive on Saturday because params.yml had a typo and the metadata lacked a batch column. Do you have the most recent version?

Secondly, apart from the errors from Nextflow, please provide:

  1. The .nextflow.log.{number of execution} file (each time you execute the pipeline in the same directory, a subsequent number is added, so it might be .nextflow.log.1).
  2. The outcomes of the last two processes:
    • For the process that crashed, I need a listing of the files in its working directory along with the .command.out and .command.err files.
    • For the previous, successfully executed process, I need the .rds files so that I can load them and see what is going on.
  3. I need to have a look at the input files (manifest.csv, params.yml, metadata.csv); maybe there is a typo somewhere.
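
The log files and the contents of the failed task directory could be gathered with a small helper like the one below. It is only a sketch: collect_debug is a hypothetical name, and it also picks up .command.sh and .command.log, which Nextflow writes next to .command.out and .command.err but which were not requested explicitly. The .rds files of the previous process would still need to be added by hand.

```shell
# Bundle Nextflow logs and the failed task's command files into one archive.
#   $1 = directory the pipeline was launched from (holds .nextflow.log*)
#   $2 = work directory of the failed task
#   $3 = archive to create (an absolute path is safest)
collect_debug() {
  launch_dir=$1
  task_dir=$2
  archive=$3
  stage=$(mktemp -d)
  mkdir -p "$stage/logs" "$stage/task"
  # Current log plus the numbered logs of earlier executions.
  for f in "$launch_dir"/.nextflow.log*; do
    if [ -f "$f" ]; then cp "$f" "$stage/logs/"; fi
  done
  # Listing of everything in the task directory, including hidden files.
  ls -la "$task_dir" > "$stage/task/listing.txt"
  for name in .command.sh .command.out .command.err .command.log; do
    if [ -f "$task_dir/$name" ]; then cp "$task_dir/$name" "$stage/task/"; fi
  done
  tar -czf "$archive" -C "$stage" logs task
  rm -rf "$stage"
}

# Usage (paths are placeholders):
#   collect_debug "$PWD" "$PWD/output/workdir_review/78/152663..." "$PWD/debug.tar.gz"
```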

kkazi1980 commented 6 months ago

Hi,

I took the input files from your zip linked in this thread (https://github.com/ardigen/nasqq/files/15443973/NASQQ_MTBLS9131.zip). But the params.yml in the zip file was last modified on Friday at 17:11:38. Is it the right one?

lukpru commented 6 months ago

Oh, I edited params.yml on Friday and metadata.csv on Saturday, my bad. Those files seem to be fine. Please provide the items from points 1-3 above.

kkazi1980 commented 6 months ago

These are the files generated/modified today: files.zip

lukpru commented 6 months ago

Okay, those files seem to be fine, provided that /workspace1/Dropbox/Krzysztof/revisions/GIGAScience/nasqq-main is an absolute path. If not, I recommend using either absolute paths or relative paths starting from the directory where all input files are located, e.g. ./metadata.csv, ./MTBLS9131/FILES/RAW_FILES. Now, the most crucial files I need are review_grouped_FIDdata_GDC.rds from the 7a/38d866 workdir and review_selected_fid_list.rds from the ff/701c1d workdir (these are shortened workdir names; the full names are much longer, but Nextflow shows only this prefix during execution).
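Both points can be checked from the shell. The helpers below are a sketch with made-up names (is_absolute, find_task_dir); the second one simply expands the abbreviated hash printed by Nextflow into the full directory name under the work directory.

```shell
# Succeeds if the given path is absolute (starts with "/").
is_absolute() {
  case $1 in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Expand an abbreviated task hash such as "7a/38d866" to the full work directory.
#   $1 = Nextflow work directory, $2 = abbreviated hash as shown in the console
find_task_dir() {
  work_root=$1
  short=$2
  for d in "$work_root"/"$short"*; do
    if [ -d "$d" ]; then printf '%s\n' "$d"; fi
  done
}

# Usage, with the work directory taken from the error message in this thread:
#   find_task_dir output/workdir_review 7a/38d866
#   is_absolute ./metadata.csv || echo "relative path"
```

If find_task_dir prints nothing, the task with that hash was most likely run from a different work directory.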

lukpru commented 6 months ago

Anyway, once these issues are resolved, adding a "Debugging" section to the README might be a good idea to speed up the resolution of potential bugs, as it is not so straightforward, especially for somebody without experience with Nextflow pipelines.

kkazi1980 commented 6 months ago

These files are in different locations, not in 7a and ff. Here is the full workdir:

https://www.dropbox.com/scl/fi/n6t5j1ycmtchicfcvebor/workdir_review.zip?rlkey=p2z8w1eiz7yfzzkyioy5r3yqe&dl=0

lukpru commented 6 months ago

Can you please remove the entire workdir, the output, and all .nextflow.log files, and run the pipeline one more time? I see that it crashed on two samples, 4 and 33, and I am trying to figure out why.
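A possible way to do the clean-up in one go is sketched below. clean_previous_run is a hypothetical helper; the directory names passed to it are whatever your params.yml and config point to (in the run shown above, both the work and output directories sit under output/).

```shell
# Remove leftovers of earlier executions.
#   $1 = launch directory; remaining arguments = directories to delete,
#        given relative to the launch directory.
clean_previous_run() {
  launch_dir=$1
  # Refuse to run without a launch directory, so "rm -rf" never targets "/...".
  if [ -z "$launch_dir" ]; then return 1; fi
  shift
  for target in "$@"; do
    rm -rf -- "$launch_dir/$target"
  done
  # Current and numbered Nextflow logs.
  rm -f -- "$launch_dir"/.nextflow.log*
}

# Usage: clean_previous_run "$PWD" output
```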

kkazi1980 commented 6 months ago

Same result: it crashes on Solvent Suppression.

https://www.dropbox.com/scl/fo/smq6itb1q55cv7ypxt7qk/AF9cipZEKKWQiQxZoUpj954?rlkey=kzyrxu0votlc415qbv375aar5&dl=0

lukpru commented 6 months ago

So far, it seems that the FIDs you have in your directory are completely different from mine, whether processed by the pipeline or manually with a function in R. Below are a few examples:

3_raw_plot

lukpru commented 6 months ago

And this is one from your output:

3_raw_plot

lukpru commented 6 months ago

Can you please start from scratch? Here are the steps:

  1. Clone the repository of the nasqq pipeline to a new location. Remove all remnants of previous executions or change the execution location.
  2. Create a new folder inside the nasqq directory, for example, review.
  3. Navigate to this new review folder.
  4. Download the raw files from Metabolights via FTP.
  5. Copy manifest.csv, params.yml, run.sh, and metadata.csv into the review directory.
  6. Adjust the paths in these files to absolute paths.
  7. Run run.sh.
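
Steps 1 to 5 could be scripted roughly as follows. This is a sketch, not a tested recipe: the clone directory name, the inputs folder, and the run and fresh_review_setup helpers are all hypothetical, and the clone URL is assumed from the repository name. By default it only prints the commands; set DRY_RUN=0 to actually execute them. Steps 6 and 7 are left manual because the paths have to be edited by hand.

```shell
# Print each command; execute it only when DRY_RUN=0 (dry run by default).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Steps 1-5 from the list above.
#   $1 = absolute path to a folder holding manifest.csv, params.yml, run.sh
#        and metadata.csv (hypothetical location)
#   $2 = name of the fresh clone (hypothetical, defaults to nasqq-fresh)
fresh_review_setup() {
  inputs_dir=$1
  clone_dir=${2:-nasqq-fresh}
  run git clone https://github.com/ardigen/nasqq.git "$clone_dir" || return 1
  run mkdir -p "$clone_dir/review" || return 1
  run cd "$clone_dir/review" || return 1
  # Same download command as earlier in this thread.
  run wget -r -np -nH --cut-dirs=5 \
    ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS9131/
  for f in manifest.csv params.yml run.sh metadata.csv; do
    run cp "$inputs_dir/$f" .
  done
  echo "Next: switch the paths in these files to absolute paths, then start run.sh."
}

# Usage: fresh_review_setup /absolute/path/to/inputs
```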

lukpru commented 6 months ago

That's super strange, I did it one more time as I described above and it's running correctly...

Screenshot from 2024-05-27 11-38-05

kkazi1980 commented 6 months ago

It seems to be OK now, but it crashed on warping because I did not adjust the number of CPUs... I will let you know soon.

kkazi1980 commented 6 months ago

OK, completed successfully, thanks!