Closed RemiMaglione closed 4 years ago
Other minor changes
1- Lots of errors from step 2. I got past them by commenting out some lines (with //) in MAFIN/main.nf, because the values were already declared:
line 432: // include './modules/sourmashgetdatabase'
line 433: // sourmash_download_db()
line 453: // include './modules/checkmsetupDB'
line 454: // include './modules/checkmgetdatabases'
line 456: // checkm_setup_db(checkm_download_db(), untar)
2- The --nanopore option suggested in the Installation section for the nanopore file path doesn't work. Only --ont has worked for me so far.
[More going on]
Hello, thanks for writing an issue. can you tell us also the full command that you used? we will look into that :)
eggnog_download_db failed because download_eggnog_data.py was not found.
I don't know if it's an issue on my side, but I downloaded and installed it in the metawrap-env with:
conda install -c bioconda eggnog-mapper
I'm fairly new to conda environments. Even though the eggnog install went well and download_eggnog_data.py is now on my env path, main.nf keeps failing at that step. I had to download the eggnog db manually by running:
nextflow run /path/to/my/MAFIN/modules/eggnog_get_databases.nf
[Which worked, showing that download_eggnog_data.py is accessible to this module but not to main.nf (???)]
Hello, thanks for writing an issue. can you tell us also the full command that you used? we will look into that :)
So far nextflow run /path/to/my/MAFIN/main.nf --output /path/to/my/output --assembler metaspades --illumina /path/to/my/illumina.fastq --ont /path/to/my/ont.fastq --core 60 --memory 500g -profile conda
[I'll continue to post every issue/solution I run into]
yes, thank you very much. conda is always tricky to set up and work perfectly sadly.
The checkm_setup_db step appears unusually long. I checked the code of both checkmgetdatabases.nf and checkmsetupDB.nf, and it looks like a download and installation of the checkm database: it should not take more than 2 hours, or should it?
So I killed the main command and manually launched checkmgetdatabases with nextflow run checkmgetdatabases.nf, and now the pipeline moves on to the fastp step.
[Question]: in the checkm.nf code I found the parameter ${task.cpus}, and compared with the main installation instructions this is confusing. How should we pass the number of threads in the main command: with --core (as suggested in the Usage section) or with --cpus (as suggested in the Complete help and options section)?
The pipeline failed during the spades step, but it looks like the issue comes from checkm_setup_db, which failed to create a conda env:
executor > local (28)
[f3/f00282] process > sourmash_download_db [100%] 1 of 1 ✔
[23/8a24f1] process > checkm_download_db [100%] 1 of 1 ✔
[- ] process > checkm_setup_db -
[09/567fe2] process > discard_short (22) [100%] 22 of 22 ✔
[e4/f0da69] process > merge (1) [100%] 1 of 1 ✔
[0b/fef8ba] process > fastp (1) [100%] 1 of 1 ✔
[bf/42661e] process > spades (1) [100%] 1 of 1, failed: 1
[- ] process > minimap2 -
[- ] process > bwa -
[- ] process > metabat2 -
[- ] process > maxbin2 -
[- ] process > concoct -
[- ] process > refine3 -
[- ] process > checkm -
[- ] process > sourmash_bins -
[- ] process > sourmash_checkm_parser -
[24/db07e2] process > eggnog_download_db [100%] 1 of 1 ✔
[- ] process > eggnog_bin -
[- ] process > parser_bin -
Oops .. something went wrong
WARN: Killing pending tasks (1)
Error executing process > 'checkm_setup_db'
Caused by:
Failed to create Conda environment
command: conda create --mkdir --yes --quiet --prefix /path/to/my/output/nextflow-autodownload-databases/checkm/db/work/conda/env-f158ef0f26abfac27f08a061ab129d86 bioconda::checkm-genome
status : 120
message:
This is as far as I have gotten.
Hello, first of all, could you tell me which version you are using: the master branch or the legacy 0.1?
Second, for the first 2 questions:
Third, for the other minor changes:
Indeed, a redundancy was left in: you can use only the second step without the first (which already loads the DBs).
Fixed the readme.
For the Eggnog download database issue, please open a new issue. It does seem that the main script can't access the file and download the database. I will work on that as soon as possible.
In the command you use (nextflow run /path/to/my/MAFIN/main.nf --output /path/to/my/output --assembler metaspades --illumina /path/to/my/illumina.fastq --ont /path/to/my/ont.fastq --core 60 --memory 500g -profile conda) you should not specify the files themselves (illumina.fastq and ont.fastq) but the directory containing them. The files should also have the same "basename" (e.g. SR002_R1.fastq, SR002_R2.fastq for illumina and SR002.fastq for nanopore). This avoids further issues in the analysis (and it is probably one source of the spades error).
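The naming convention described above can be checked before launching the pipeline. The following is a minimal sketch, assuming the _R1/_R2 suffixes and the .fastq extension used in the example; the function name check_basenames and its two directory arguments are hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: for every nanopore file <name>.fastq in the ONT directory,
# check that <name>_R1.fastq and <name>_R2.fastq exist in the Illumina directory.
check_basenames() {
    local illumina_dir=$1 ont_dir=$2 status=0 ont name mate
    for ont in "$ont_dir"/*.fastq; do
        # If the glob matched nothing, $ont is the literal pattern and does not exist.
        [ -e "$ont" ] || { echo "no .fastq files in $ont_dir" >&2; return 1; }
        name=$(basename "$ont" .fastq)
        for mate in R1 R2; do
            if [ ! -e "$illumina_dir/${name}_${mate}.fastq" ]; then
                echo "missing: $illumina_dir/${name}_${mate}.fastq" >&2
                status=1
            fi
        done
    done
    return $status
}
```

Calling check_basenames /path/to/illumina_dir /path/to/ont_dir returns a non-zero status and prints the missing files if a nanopore sample has no matching Illumina pair.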
Checkm indeed shouldn't take too long, I invite you to open another issue for checkm only to keep everything clear. Do you know which of the 2 processes (download and setup) was the one taking so long?
The correct option is --cpus; --cores is, like --nanopore, a leftover from a development phase.
For the checkm/spades error, it is possible that both checkm setup and spades got an error; in that case nextflow only reports one graphically. Could you upload the .nextflow.log file (it's in the directory where you executed your nextflow command) as well as the .command.err, .command.sh and .command.log files of the checkm and spades processes? The .command files are in the respective working directories of setupcheckm and spades, /work/??/??????!!!!!!!!!/.command.err, where the ? represent the process ID (here spades is bf/42661e) and the ! are the continuation of the directory name (a simple tab press should save you from typing it).
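Locating a task directory from the short hash in the progress output can be sketched as follows. This is only an illustration: the function name show_task_files is hypothetical, and it assumes the short hash (e.g. bf/42661e) is a prefix of the full directory name under the work directory, as described above.

```shell
# Hypothetical helper: list Nextflow's hidden .command.* files for a task,
# given the work directory and the short hash shown in the progress output.
show_task_files() {
    local work_dir=$1 short_hash=$2 dir
    # The short hash (e.g. bf/42661e) is expanded to the full directory name by the glob.
    for dir in "$work_dir"/"$short_hash"*/; do
        [ -d "$dir" ] || { echo "no task directory matches $short_hash" >&2; return 1; }
        echo "$dir"
        # -A includes dotfiles such as .command.err, .command.sh and .command.log.
        ls -A "$dir" | grep '^\.command\.' || echo "(no .command.* files)"
    done
}
```

For the run above, show_task_files work bf/42661e would print the spades task directory followed by any .command.* files it contains.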
Besides the 2 bigger issues (eggnog and checkm/spades) all the others are either answered here or pushed on the latest version of master. If you need more detailed answers or find new issues feel free to post and we'll answer as soon as possible.
Hello,
First of all, could you tell me which version are you using? the master branch, the legacy 0.1 ?
I think yes (I downloaded MAFIN last week with git clone https://github.com/RVanDamme/MAFIN.git)
For the Eggnog download database issue, please open a new issue.
Done
in the command you use (nextflow run /path/to/my/MAFIN...
Sorry, my mistake: I actually provided only the directory path, not the path plus the file as written in this issue.
Checkm indeed shouldn't take too long, I invite you to open another issue for checkm only to keep everything clear
Done
Do you know which of the 2 processes (download and setup) was the one taking so long?
Unfortunately not; I got past this problem quickly since I ran it manually right away.
Could you upload the .nextflow.log file (it's in the directory where you executed your nextflow command) as well as the .command.err .command.sh .command.log of the checkm and spades process?
- A strange thing is that the 'work' folder I had in my output folder was from a previous attempt. After digging a bit in the subfolders, I found that the actual work folder we are interested in falls under
/path/to/my/output/nextflow-autodownload-databases/checkm/db/work/
In that folder I did find the Spades "process" folder but not the .command.err, .command.sh and .command.log files. I have 3 symbolic links pointing to clean.fastq files (probably produced by the fastp step) and a spades_output folder (containing: configs corrected dataset.info input_dataset.yaml K21 K33 K55 misc params.txt pipeline_state run_spades.sh run_spades.yaml spades.log tmp).
In that work folder, the checkm '23' process folder was empty, and I went through all the other work subfolders and found nothing that looks like a .command.* checkm log... Sorry
Thank you for all your answers
1. A strange thing is that the 'work' folder I had on my output folder was from a previous attempt. After digging a bit in the subfolder, I found that the actual work folder we are interested in fall under `/path/to/my/output/nextflow-autodownload-databases/checkm/db/work/`
Nextflow creates the 'work' directory where you run the command. In this case, you probably ran your command with /path/to/my/output/nextflow-autodownload-databases/checkm/db/work/ as the current directory.
2. On that folder I did find the Spades "process" folder but not the .command.err .command.sh .command.log files. I have 3 symbolic link pointing clean.fastq files (probably yielded by the fastp step) and a spades_output folder (containing:` configs corrected dataset.info input_dataset.yaml K21 K33 K55 misc params.txt pipeline_state run_spades.sh run_spades.yaml spades.log tmp`)
Did you run ls -a in the folder? If not, you should run it to find the files. If you did, it probably means that nextflow passed the files needed to run spades but the process didn't start or was killed before starting.
3. On that work folder, the checkm '23' folder process was empty and I go over all the other work subfolder and nothing look like .command.* checkm log... Sorry
The .command.* files are not in the checkm subfolder but in the folder that contains the whole process (e.g. work/??/????????/.command.err). If you are in that directory (or you put the path in the command), just run ls -a or ls -ltra and you should see the files. If the files are still not there, it is a weird issue linked to nextflow behavior.
I finally found the files: spades_step_commandX.zip
Another thing: I tried to run Spades by myself. It crashed at the error-correction step. Now I wonder if the Spades issue comes from my side: I work with rather huge files produced by NovaSeq sequencing (100M reads per sample minimum). I'm running everything on a server that has 64 cores and 500 GB of RAM. I'll continue to debug that step on my side, but the crash may have occurred due to lack of RAM.
@RemiMaglione can you give us the .nextflow.log file too? It's directly in the working dir where you executed nextflow. It saves the last 10 runs in 10 files (e.g. .nextflow.log.1): the newest is .nextflow.log, the oldest .nextflow.log.9.
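To pick the right file, the logs in the launch directory can be listed from newest to oldest run. This is a minimal sketch based only on the naming described above (.nextflow.log, then .nextflow.log.1 up to .nextflow.log.9); the function name list_nextflow_logs is hypothetical.

```shell
# Hypothetical helper: print the Nextflow log files found in a launch directory,
# newest run first (.nextflow.log, then .nextflow.log.1 up to .nextflow.log.9).
list_nextflow_logs() {
    local launch_dir=${1:-.} n
    [ -f "$launch_dir/.nextflow.log" ] && echo "$launch_dir/.nextflow.log"
    for n in 1 2 3 4 5 6 7 8 9; do
        [ -f "$launch_dir/.nextflow.log.$n" ] && echo "$launch_dir/.nextflow.log.$n"
    done
    return 0
}
```

Run without an argument, it looks in the current directory.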
Regarding the RAM error: it's usually exit code 146 or so if it's a RAM issue, at least when you are using nextflow.
can you give us the .nextflow.log file too?
Sure: .nextflow.log
@RVanDamme i think i found the error:
conda.createTimeout = '1 h'
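One way to apply this setting without editing the pipeline is a separate config file passed with Nextflow's -c option. The sketch below is an assumption about how it could be used here, not the fix that was committed; the file name conda_timeout.config is an arbitrary choice.

```shell
# Write a small Nextflow config file (the file name is an arbitrary choice)
# that sets the timeout for creating Conda environments to one hour.
cat > conda_timeout.config <<'EOF'
conda.createTimeout = '1 h'
EOF

# The file could then be passed to the run with Nextflow's -c option, e.g.:
#   nextflow run /path/to/my/MAFIN/main.nf -c conda_timeout.config -profile conda
# (followed by the same pipeline options as in the original command)
```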
Thanks for the log, it helped to figure out the error.
Hello @RemiMaglione, version 1.0.0 of MUFFIN just got released. This should solve most of the issues you faced while trying the pre-release. If you face issues identical to the ones reported here, or new ones, please feel free to open new issues and report them to us. I will close this issue for now.
Hi RVanDamme team,
1- Question: does the Unicycler pipeline launch automatically with the --assembler metaspades option, or does it have its own optional parameter (the hybrid reassembly is marked as optional in the workflow)?
2- Minor change: in the installation guide, at the create env step, the 'create' is missing in conda create -y -p /path/to/install/metawrap-env python=2.7
Thanks for that pipeline. Best