RVanDamme / MUFFIN

hybrid assembly and differential binning workflow for metagenomics, transcriptomics and pathway analysis
https://rvandamme.github.io/MUFFIN_Documentation/#introduction
GNU General Public License v3.0
65 stars, 11 forks

trying to run own data after test run OK #34

Closed: drelo closed this issue 1 month ago.

drelo commented 2 years ago

I am trying to understand how the pipeline works so I can scale up and run it on 16 metagenomes for which we have Illumina and Nanopore data. I have two questions: one about starting a run with my own data, and a second about how to reuse the images downloaded via Singularity.

The test run went fine. Then I renamed the results folder and tried to run the pipeline again with my own data:

```
nextflow run RVanDamme/MUFFIN -profile local,singularity --illumina /illumina/ --ont /nanopore/ --assembler metaspades --cpus 20 --memory 200g --modular assemb-class --name 014 --output P014
```

The command finished within a few seconds, and the output folder contains none of the results that the test run produced.

The Illumina files are `P014_R1.fastq` and `P014_R2.fastq`; the Nanopore file is `P014.fastq`.
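In Nextflow, a process whose input channel is empty never runs, so a workflow that finds no matching input files can finish almost immediately without producing results. One way to narrow this down is to check, before launching, which sample names can be matched across the two read directories. The sketch below assumes (this is not taken from MUFFIN's source) that Illumina pairs are named `<sample>_R1.fastq` / `<sample>_R2.fastq`, Nanopore reads `<sample>.fastq` (optionally gzipped), and that samples are matched by that shared prefix; `find_samples` is a hypothetical helper. Note also that the command above passes `/illumina/` and `/nanopore/`, which are absolute paths at the filesystem root, whereas the later commands in this thread use `./illumina/` and `./nanopore/`.

```python
from pathlib import Path
import re

# Assumed naming conventions (hypothetical, not read from MUFFIN's source):
#   Illumina: <sample>_R1.fastq[.gz] and <sample>_R2.fastq[.gz]
#   Nanopore: <sample>.fastq[.gz]
ILLUMINA_RE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq(\.gz)?$")
ONT_RE = re.compile(r"^(?P<sample>.+)\.fastq(\.gz)?$")


def find_samples(illumina_dir, ont_dir):
    """Return (complete, incomplete).

    complete   -- sorted sample names that have R1, R2 and Nanopore reads
    incomplete -- dict mapping each other sample name to its missing pieces
    """
    # Collect which mates (1 and/or 2) exist for every Illumina sample.
    mates = {}
    for path in Path(illumina_dir).iterdir():
        m = ILLUMINA_RE.match(path.name)
        if m:
            mates.setdefault(m["sample"], set()).add(m["mate"])

    # Collect the Nanopore sample names.
    ont = set()
    for path in Path(ont_dir).iterdir():
        m = ONT_RE.match(path.name)
        if m:
            ont.add(m["sample"])

    complete, incomplete = [], {}
    for sample in sorted(set(mates) | ont):
        missing = [f"R{n}" for n in "12" if n not in mates.get(sample, set())]
        if sample not in ont:
            missing.append("ONT")
        if missing:
            incomplete[sample] = missing
        else:
            complete.append(sample)
    return complete, incomplete
```

With the three files listed above, `find_samples("./illumina", "./nanopore")` would return `(['P014'], {})`; an empty `complete` list would suggest the directory paths or the naming pattern do not match.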

Here is the log shown on screen.

I am trying to understand how MUFFIN works. The only way I could get the pipeline working again was to remove all the folders and run the test again:

```
nextflow run RVanDamme/MUFFIN -profile local,singularity,test --cpus 25 --memory 300g
```

After this second test run I have two folders that I thought I could keep. In future runs I could point to the `nextflow-autodownload-databases` folder with the paths in a .yml file, but how can I point to the `work/singularity` folder? Can I reuse the images downloaded in `work/singularity` so that I don't have to download them again?

How can I run MUFFIN with my own data? I wonder if I set something up wrong. I checked issues #33 and #30, which sounded similar, but I don't know where to start diagnosing this. Let me know which log files I can provide or which tests I can run to fix this. Any help would be appreciated. Cheers,

Andrés

drelo commented 2 years ago

Sorry for the double post; I have updated the post above with all the details of the issue. Thanks for your help.

replikation commented 2 years ago

Hi,

I updated the Singularity profile. You can now use, e.g., `--cachedir dir/` to specify a location in which to store and reuse the Singularity images; the default location is `./singularity_images`. The changes are in the current master. If it works on your end, I can update the help and create a release. I currently don't have time to test it properly, as I don't have a Singularity environment available.
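To confirm that a later run will reuse the cached images rather than download them again, you can list what is already in the cache directory. A minimal sketch, assuming the images are stored as `.img` or `.sif` files directly inside the directory given to `--cachedir` (the reply only states the default `./singularity_images`; the file extensions are an assumption). `cached_images` is a hypothetical helper, not part of MUFFIN.

```python
from pathlib import Path

# Default location named in the reply above; pass the --cachedir value instead if you set one.
DEFAULT_CACHEDIR = "./singularity_images"
# Assumed image file extensions (not confirmed by MUFFIN's documentation).
IMAGE_SUFFIXES = {".img", ".sif"}


def cached_images(cachedir=DEFAULT_CACHEDIR):
    """Return the sorted file names of Singularity images found in cachedir.

    An empty list suggests the next run will pull the images again."""
    root = Path(cachedir)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.suffix in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    images = cached_images()
    print(f"{len(images)} cached image(s) in {DEFAULT_CACHEDIR}")
    for name in images:
        print("  ", name)
```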

drelo commented 2 years ago

Dear Christian,

Thanks for your time on this; the update worked fine. I ran the test again with the new version [939ff8ca71], and now (after renaming the results folder) the run skipped downloading the databases and images and went directly to processing the samples:

```
nextflow run RVanDamme/MUFFIN --output results_dir --cpus 30 --memory 200g -profile local,singularity,test --cachedir ./singularity_images/ --sourmash_db ./nextflow_autodownload-databases/sourmash/genbank-k31.lca.json.gz --eggnog_db nextflow-autodownload-databases/eggnog/eggnog-db/eggnog.db
```
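Before launching, it can help to confirm that every path passed on the command line exists, especially since the two database paths in the command above are spelled differently (`./nextflow_autodownload-databases/...` with an underscore, `nextflow-autodownload-databases/...` with a hyphen). A minimal sketch; the dictionary simply copies the flags from that command, and `missing_paths` is a hypothetical helper, not part of MUFFIN or Nextflow.

```python
from pathlib import Path

# Paths copied from the command above; adjust them to your own layout.
LAUNCH_PATHS = {
    "--cachedir": "./singularity_images/",
    "--sourmash_db": "./nextflow_autodownload-databases/sourmash/genbank-k31.lca.json.gz",
    "--eggnog_db": "nextflow-autodownload-databases/eggnog/eggnog-db/eggnog.db",
}


def missing_paths(paths, base="."):
    """Return {flag: path} for every path that does not exist.

    Relative paths are resolved against base (the launch directory);
    absolute paths are checked as-is, because Path / absolute = absolute."""
    root = Path(base)
    return {flag: p for flag, p in paths.items() if not (root / p).exists()}


if __name__ == "__main__":
    for flag, p in missing_paths(LAUNCH_PATHS).items():
        print(f"{flag}: {p} does not exist")
```

Run it from the directory you launch Nextflow in; no output means all three paths were found.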

```
N E X T F L O W  ~  version 21.04.1
Launching RVanDamme/MUFFIN [maniac_newton] - revision: 939ff8ca71 [master]
[-        ] process > test          [  0%] 0 of 1
[-        ] process > discard_short -
[-        ] process > merge         -
[-        ] process > fastp         -
[-        ] process > spades        -
[-        ] process > minimap2      -
executor >  local (2)
[a0/9bba7a] process > test          [  0%] 0 of 1
```

Then I tried to run my own data, but the run ends almost immediately:

```
nextflow -log muf.log run RVanDamme/MUFFIN --output hibrido --cpus 30 --memory 200g -profile local,singularity --cachedir ./singularity_images/ --sourmash_db ./nextflow_autodownload-databases/sourmash/genbank-k31.lca.json.gz --eggnog_db nextflow-autodownload-databases/eggnog/eggnog-db/eggnog.db --modular assemb --illumina ./illumina/ --ont ./nanopore/
```

```
N E X T F L O W  ~  version 21.04.1
Launching RVanDamme/MUFFIN [hungry_morse] - revision: 939ff8ca71 [master]
executor >  local (1)
[3f/54840a] process > readme_output [100%] 1 of 1 ✔

Start running MUFFIN
MUFFIN is a hybrid assembly and differential binning workflow for metagenomics, transcriptomics and pathway analysis.

If you use MUFFIN for your research pleace cite:

https://www.biorxiv.org/content/10.1101/2020.02.08.939843v1

or

Van Damme R., Hölzer M., Viehweger A., Müller B., Bongcam-Rudloff E., Brandt C., 2020 "Metagenomics workflow for hybrid assembly, differential coverage binning, transcriptomics and pathway analysis (MUFFIN)", doi: https://doi.org/10.1101/2020.02.08.939843

Done! Results are stored here --> hibrido
The Readme file in hibrido describe the structure of the results directories.
```

Could you help me understand what is wrong so I can run this? In the meantime, I will try the `local,conda` profile. Thanks in advance. Best,

Andrés

The Illumina files are `P014_R1.fastq` and `P014_R2.fastq`; the Nanopore file is `P014.fastq`.

Here is the log. Here is the execution report.

replikation commented 2 years ago
drelo commented 2 years ago

Hi again, thanks for your help.

Illumina: `P014_R1.fastq`, `P014_R2.fastq`; Nanopore: `P014.fastq`. I was using `--illumina ./illumina/ --ont ./nanopore/`.

I just tried providing the full paths, and also giving the paths in a .yml file, but neither worked:

```
nextflow run RVanDamme/MUFFIN -profile local,singularity --cachedir ./singularity_images/ --sourmash_db ./nextflow_autodownload-databases/sourmash/genbank-k31.lca.json.gz --eggnog_db nextflow-autodownload-databases/eggnog/eggnog-db/eggnog.db -params-file PAR.yml
```

`PAR.yml`:

```yaml
assembler : "metaspades"
ouptut : "/mnt/cive/andres/muffin/muffins"
illumina : "/mnt/cive/andres/muffin/illumina"
ont : "/mnt/cive/andres/muffin/nanopore"
cpus : 30
memory : "200g"
modular : "assemb-class"
```
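The params file above spells `output` as `ouptut`; Nextflow maps each key to `params.<key>`, so that value sets an unused `params.ouptut` and `--output` keeps its default. One way to catch such slips is to generate the params file from a script that rejects unexpected keys. A minimal sketch, assuming the accepted keys are just the flags used in this thread (MUFFIN accepts more parameters than these) and writing JSON, which `-params-file` accepts as well as YAML. `write_params` and `KNOWN_KEYS` are hypothetical names.

```python
import json

# Hypothetical whitelist: only the flags that appear in this thread.
# MUFFIN accepts more parameters than these; extend as needed.
KNOWN_KEYS = {
    "assembler", "output", "illumina", "ont", "cpus", "memory",
    "modular", "name", "cachedir", "sourmash_db", "eggnog_db",
}


def write_params(params, path):
    """Write params as JSON for `nextflow run ... -params-file path`.

    Raises ValueError if any key is not in KNOWN_KEYS (e.g. 'ouptut')."""
    unknown = sorted(set(params) - KNOWN_KEYS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    with open(path, "w") as fh:
        json.dump(params, fh, indent=2)
```

For example, calling `write_params({"assembler": "metaspades", "output": "/mnt/cive/andres/muffin/muffins", ...}, "PAR.json")` and then passing `-params-file PAR.json` to `nextflow run` would fail early if a key were misspelled.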

Best