LiaOb21 / colora

A Snakemake workflow for de novo genome assembly. It produces chromosome-scale primary/phased assemblies complete with organelles (mitochondrion and/or chloroplast), and automatically assesses the quality of the results.

ValueError in file #3

SolayMane opened this issue 1 month ago

SolayMane commented 1 month ago

I set up my config file and ran snakemake --cores all --use-conda --configfile config_argania.yaml. Here is the error:

Config file config/config.yaml is extended by additional config specified via the command line.
ValueError in file https://raw.githubusercontent.com/LiaOb21/colora/Colora_v1.1.0/workflow/Snakefile, line 11:
not enough values to unpack (expected 1, got 0)
  File "https://raw.githubusercontent.com/LiaOb21/colora/Colora_v1.1.0/workflow/Snakefile", line 11, in <module>

LiaOb21 commented 1 month ago

Hi Solay,

I saw that error many times when some files are not found in the right place (i.e., where Snakemake expects them to be).

Where are your hifi_reads.fastq.gz and hic_reads.fastq.gz? Are these paths correctly set up in the config file? Are they in fastq.gz format?
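
For example, a quick check from the command line (the directories below are placeholders; use the hifi_path and hic_path values from your config):

ls /path/to/hifi_dir/*.fastq.gz   # should list your HiFi reads
ls /path/to/hic_dir/*.fastq.gz    # should list your Hi-C read pairs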

LiaOb21 commented 1 month ago

Also, it would be useful to test the workflow before running it on real data, as explained here: https://github.com/LiaOb21/colora?tab=readme-ov-file#test-the-pipeline

Let me know! :blush:

SolayMane commented 1 month ago

here is my config file:

# config.yaml for real data

# Set memory and threads for high demanding rules
high:
  mem_mb: 409600 # memory in MB
  t: 50 # number of threads

# Set memory and threads for medium demanding rules
medium:
  mem_mb: 204800 # memory in MB
  t: 20 # number of threads

# Set memory and threads for low demanding rules
low:
  mem_mb: 51200 # memory in MB
  t: 8 # number of threads

# Path to hifi reads
hifi_path: "/sanhome2/Argania_assembly/ChAssembly24/rawdata/PB/"

# Path to hic reads
hic_path: "/sanhome2/Argania_assembly/ChAssembly24/rawdata/Hic/BMK240627-CC766-ZX01-0101/BMK_DATA_20240913145637_1/Data/"

# Customisable parameters for kmc
kmc:
  k: 27 # kmer size, it will be the same used for genomescope2
  ci: 1 # exclude k-mers occurring less than <value> times (default: 2)
  cs: 1000000 #maximal value of a counter (default: 255)

# Customisable parameters for kmc_tools transform
kmc_tools:
  cx: 1000000 # exclude k-mers occurring more than <value> times

# Customisable parameters for genomescope2
genomescope2:
  optional_params:
    "-p": "2"
    "-l": ""

# Customisable parameters for oatk
oatk:
  k: 1001 # kmer size [1001]
  c: 150 #  minimum kmer coverage [3]
  m: "resources/oatkDB/embryophyta_mito.fam" # mitochondria gene annotation HMM profile database [NULL]
  optional_params: 
    "-p": "resources/oatkDB/embryophyta_pltd.fam" # to use for species that have a plastid db

# Customisable parameters for fastp
fastp:
  optional_params:
    "--cut_front": False # set to True for Arima Hi-C library prep kit generated data
    "--cut_front_window_size": "" # set to 5 for Arima Hi-C library prep kit generated data

# Customisable parameters for hifiasm
hifiasm:
  phased_assembly: False # set to true if you want to obtain a phased assembly
  optional_params: 
    "-f": "" # used for small datasets
    "-l": "" # purge level. 0: no purging; 1: light; 2/3: aggressive [0 for trio; 3 for unzip]
    "--ul": "" # use this if you have also ont data you want to integrate in your assembly

# Set this to False if you want to skip the fcsgx step:
include_fcsgx: False # include this rule only if you have previously downloaded the database (recommended to run fcsgx only on an HPC; it requires around 500 GB of disk space and a large amount of RAM)

# Customisable parameters for fcsgx
#fcsgx:
#  ncbi_tax_id: 4513
#  path_to_gx_db: "path/to/fcsgx/gxdb"

# Set this to False if you want to skip purge_dups steps:
include_purge_dups: True

# Customisable parameters for arima mapping pipeline:
arima:
  MAPQ_FILTER: 10

# Customisable parameters for yahs
yahs:
  optional_params: 
    "-e": "A/AGCTT" # you can specify the restriction enzyme(s) used by the Hi-C experiment

# Customisable parameters for quast
quast:
  optional_params: 
    "--fragmented": ""
    "--large": ""
#    "-r": "resources/reference_genomes/yeast/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa" #reference genome (fasta)
#    "-g": "resources/reference_genomes/yeast/Saccharomyces_cerevisiae.R64-1-1.101.gff3" # reference features (gff)

# Customisable parameters for busco
busco:
  lineage: "resources/busco_db/embryophyta_odb10.2024-01-08.tar.gz" # lineage to be used for busco analysis
  optional_params: 
    "--metaeuk": "" # this can be set to True if needed. The default is miniprot 

Here are my Hi-C and PB files:

PB: merged_cells.fq.gz
Hi-C: Unknown_CC766-004H0001_good_1.fq.gz, Unknown_CC766-004H0001_good_2.fq.gz

Do I need to rename them?

LiaOb21 commented 1 month ago

Yes, please, use .fastq.gz rather than .fq.gz
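
For example, something along these lines from the directories holding the reads (the filenames are the ones you listed above):

mv merged_cells.fq.gz merged_cells.fastq.gz
mv Unknown_CC766-004H0001_good_1.fq.gz Unknown_CC766-004H0001_good_1.fastq.gz
mv Unknown_CC766-004H0001_good_2.fq.gz Unknown_CC766-004H0001_good_2.fastq.gz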

LiaOb21 commented 1 month ago

Also, for -e in YaHS parameters, I think you should use , and not / to list the enzymes

SolayMane commented 1 month ago

Here is the output of running snakemake --cores all --use-conda --configfile config_argania.yaml:

Config file config/config.yaml is extended by additional config specified via the command line.
Assuming unrestricted shared filesystem usage.
host: inra
Building DAG of jobs...
/bin/bash: conda: command not found
Traceback (most recent call last):

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/cli.py", line 2095, in args_to_api
    dag_api.execute_workflow(

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 595, in execute_workflow
    workflow.execute(

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1164, in execute
    self.dag.create_conda_envs()

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 454, in create_conda_envs
    env.create(self.workflow.dryrun)

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/deployment/conda.py", line 384, in create
    if self.pin_file:
       ^^^^^^^^^^^^^

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake_interface_common/utils.py", line 33, in __get__
    value = self.method(instance)
            ^^^^^^^^^^^^^^^^^^^^^

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/deployment/conda.py", line 102, in pin_file
    f".{self.conda.platform}.pin.txt"
        ^^^^^^^^^^

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake_interface_common/utils.py", line 33, in __get__
    value = self.method(instance)
            ^^^^^^^^^^^^^^^^^^^^^

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/deployment/conda.py", line 95, in conda
    return Conda(
           ^^^^^^

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/deployment/conda.py", line 654, in __init__
    shell.check_output(self._get_cmd("conda info --json"), text=True)

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/site-packages/snakemake/shell.py", line 64, in check_output
    return sp.check_output(cmd, shell=True, executable=executable, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/inra/miniconda3/envs/snakemake/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,

subprocess.CalledProcessError: Command 'conda info --json' returned non-zero exit status 127.
LiaOb21 commented 1 month ago

@SolayMane you should install mamba (preferably) or conda and Snakemake first, as all the environments are created through conda. /bin/bash: conda: command not found suggests to me that you are not in a conda environment. If you installed mamba or conda, maybe you didn't run init.

Please read the Usage section carefully: https://github.com/LiaOb21/colora?tab=readme-ov-file#usage

And, again, if you can, you should test the pipeline first.
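
As a rough sketch of what that setup usually looks like (the miniconda3 path and the environment name here are assumptions based on your traceback, so adjust them to your system):

# initialise conda for your shell if "conda" is not found, then reopen the terminal
~/miniconda3/bin/conda init bash
# create (if needed) and activate an environment containing Snakemake
mamba create -n snakemake -c conda-forge -c bioconda snakemake   # or conda create with the same arguments
conda activate snakemake
# re-launch the workflow from the colora directory
snakemake --cores all --use-conda --configfile config_argania.yaml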

SolayMane commented 1 month ago

Here is the log of the recent error:

host: inra
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /bin/bash
Provided cores: 56
Rules claiming more threads will be scaled down.
Job stats:
job                      count
---------------------  -------
all                          1
bandage_pltd                 1
busco                        1
bwa_index                    1
bwa_mem                      1
fiter_five_end               1
gfastats_pltd                1
hifiasm                      1
nanoplot                     1
oatk_pltd                    1
picard                       1
purge_dups                   1
purge_dups_alt               1
quast                        1
two_read_bam_combiner        1
yahs                         1
total                       16

Select jobs to execute...
Execute 1 jobs...

[Wed Oct  9 09:18:20 2024]
localrule hifiasm:
    input: results/reads/hifi/hifi.fastq.gz
    output: results/hifiasm/asm.primary.gfa, results/hifiasm/asm.alternate.gfa, results/hifiasm/asm.primary.fa, results/hifiasm/asm.alternate.fa, results/assemblies/asm_primary.fa
    log: logs/hifiasm.log
    jobid: 6
    reason: Missing output files: results/assemblies/asm_primary.fa, results/hifiasm/asm.alternate.fa, results/hifiasm/asm.primary.fa
    threads: 50
    resources: tmpdir=/tmp, mem_mb=409600, mem_mib=390625

Activating conda environment: .snakemake/conda/ae9a949f10685e275e7788b0e2db316a_
[Thu Oct 10 05:52:11 2024]
Error in rule hifiasm:
    jobid: 6
    input: results/reads/hifi/hifi.fastq.gz
    output: results/hifiasm/asm.primary.gfa, results/hifiasm/asm.alternate.gfa, results/hifiasm/asm.primary.fa, results/hifiasm/asm.alternate.fa, results/assemblies/asm_primary.fa
    log: logs/hifiasm.log (check log file(s) for error details)
    conda-env: /sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/ae9a949f10685e275e7788b0e2db316a_
    shell:

        hifiasm results/reads/hifi/hifi.fastq.gz -t 50 -o results/hifiasm/asm --primary  >> logs/hifiasm.log 2>&1
        mv results/hifiasm/asm.p_ctg.gfa results/hifiasm/asm.primary.gfa
        mv results/hifiasm/asm.a_ctg.gfa results/hifiasm/asm.alternate.gfa       
        awk -f scripts/gfa_to_fasta.awk < results/hifiasm/asm.primary.gfa > results/hifiasm/asm.primary.fa
        awk -f scripts/gfa_to_fasta.awk < results/hifiasm/asm.alternate.gfa > results/hifiasm/asm.alternate.fa

        # all the assemblies produced by the workflow will be symlinked to results/assemblies

        ln -srn results/hifiasm/asm.primary.fa results/assemblies/asm_primary.fa

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job hifiasm since they might be corrupted:
results/hifiasm/asm.primary.gfa, results/hifiasm/asm.alternate.gfa, results/hifiasm/asm.primary.fa
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-09T091814.115222.snakemake.log
WorkflowError:
At least one job did not complete successfully.
LiaOb21 commented 1 month ago

Hi @SolayMane,

Can you please check the hifiasm.log file? It should be in the logs directory.

SolayMane commented 1 month ago

Here is the tail -n 100 of the file:

[M::ha_pt_gen::] counting in normal mode
[M::yak_count] collected 2077326591 minimizers
[M::ha_pt_gen::66872.637*45.27] ==> indexed 2074465458 positions, counted 23694655 distinct minimizer k-mers
[M::ha_assemble::70078.091*45.48@145.364GB] ==> found overlaps for the final round
[M::ha_print_ovlp_stat] # overlaps: 675197376
[M::ha_print_ovlp_stat] # strong overlaps: 486200620
[M::ha_print_ovlp_stat] # weak overlaps: 188996756
[M::ha_print_ovlp_stat] # exact overlaps: 660989699
[M::ha_print_ovlp_stat] # inexact overlaps: 14207677
[M::ha_print_ovlp_stat] # overlaps without large indels: 674209617
[M::ha_print_ovlp_stat] # reverse overlaps: 393805665
[M::ha_opt_update_cov_min] updated max_n_chain to 645
Writing reads to disk...
Reads has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
bin files have been written.
[M::purge_dups] homozygous read coverage threshold: 128
[M::purge_dups] purge duplication coverage threshold: 161
[M::ug_ext_gfa::] # tips::338
Writing raw unitig GFA to disk...
[M::ug_ext_gfa::] # tips::1
Writing processed unitig GFA to disk...
[M::purge_dups] homozygous read coverage threshold: 128
[M::purge_dups] purge duplication coverage threshold: 161
[M::mc_solve:: # edges: 1094]
[M::mc_solve_core_adv::0.194] ==> Partition
[M::adjust_utg_by_primary] primary contig coverage range: [108, infinity]
Writing primary contig GFA to disk...
Writing alternate contig GFA to disk...
Inconsistency threshold for low-quality regions in BED files: 70%
[M::main] Version: 0.19.9-r616
[M::main] CMD: hifiasm -t 50 -o results/hifiasm/asm --primary results/reads/hifi/hifi.fastq.gz
[M::main] Real time: 74019.555 sec; CPU: 3194245.652 sec; Peak RSS: 145.364 GB
LiaOb21 commented 1 month ago

That's strange, from the log file it seems that hifiasm completed successfully. Can I see your config file? Did you try the test workflow? Did it complete successfully?

SolayMane commented 1 month ago

I didn't try the test. In the log file there is a line with awk -f scripts/gfa_to_fasta.awk: where should I have the scripts folder? Here is the config:

# config.yaml for real data

# Set memory and threads for high demanding rules
high:
  mem_mb: 409600 # memory in MB
  t: 50 # number of threads

# Set memory and threads for medium demanding rules
medium:
  mem_mb: 204800 # memory in MB
  t: 20 # number of threads

# Set memory and threads for low demanding rules
low:
  mem_mb: 51200 # memory in MB
  t: 8 # number of threads

# Path to hifi reads
hifi_path: "/sanhome2/Argania_assembly/ChAssembly24/rawdata/PB/"

# Path to hic reads
hic_path: "/sanhome2/Argania_assembly/ChAssembly24/rawdata/Hic/BMK240627-CC766-ZX01-0101/BMK_DATA_20240913145637_1/Data/"

# Customisable parameters for kmc
kmc:
  k: 27 # kmer size, it will be the same used for genomescope2
  ci: 1 # exclude k-mers occurring less than <value> times (default: 2)
  cs: 1000000 #maximal value of a counter (default: 255)

# Customisable parameters for kmc_tools transform
kmc_tools:
  cx: 1000000 # exclude k-mers occurring more than <value> times

# Customisable parameters for genomescope2
genomescope2:
  optional_params:
    "-p": "2"
    "-l": ""

# Customisable parameters for oatk
oatk:
  k: 1001 # kmer size [1001]
  c: 150 #  minimum kmer coverage [3]
  m: "resources/oatkDB/embryophyta_mito.fam" # mitochondria gene annotation HMM profile database [NULL]
  optional_params: 
    "-p": "resources/oatkDB/embryophyta_pltd.fam" # to use for species that have a plastid db

# Customisable parameters for fastp
fastp:
  optional_params:
    "--cut_front": False # set to True for Arima Hi-C library prep kit generated data
    "--cut_front_window_size": "" # set to 5 for Arima Hi-C library prep kit generated data

# Customisable parameters for hifiasm
hifiasm:
  phased_assembly: False # set to true if you want to obtain a phased assembly
  optional_params: 
    "-f": "" # used for small datasets
    "-l": "" # purge level. 0: no purging; 1: light; 2/3: aggressive [0 for trio; 3 for unzip]
    "--ul": "" # use this if you have also ont data you want to integrate in your assembly

# Set this to False if you want to skip the fcsgx step:
include_fcsgx: False # include this rule only if you have previously downloaded the database (recommended to run fcsgx only on an HPC; it requires around 500 GB of disk space and a large amount of RAM)

# Customisable parameters for fcsgx
#fcsgx:
#  ncbi_tax_id: 4513
#  path_to_gx_db: "path/to/fcsgx/gxdb"

# Set this to False if you want to skip purge_dups steps:
include_purge_dups: True

# Customisable parameters for arima mapping pipeline:
arima:
  MAPQ_FILTER: 10

# Customisable parameters for yahs
yahs:
  optional_params: 
    "-e": "A/AGCTT" # you can specify the restriction enzyme(s) used by the Hi-C experiment

# Customisable parameters for quast
quast:
  optional_params: 
    "--fragmented": ""
    "--large": ""
#    "-r": "resources/reference_genomes/yeast/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa" #reference genome (fasta)
#    "-g": "resources/reference_genomes/yeast/Saccharomyces_cerevisiae.R64-1-1.101.gff3" # reference features (gff)

# Customisable parameters for busco
busco:
  lineage: "resources/busco_db/embryophyta_odb10.2024-01-08.tar.gz" # lineage to be used for busco analysis
  optional_params: 
    "--metaeuk": "" # this can be set to True if needed. The default is miniprot 
LiaOb21 commented 1 month ago

Your config.yaml looks okay, apart from the restriction enzymes in YaHS, as said previously.

The script directory is in colora/scripts. Do you see that directory?

If it can help you, there is a tutorial available on YouTube now: https://youtu.be/-xWgvj_PmZo?si=tGMy0ZyNOJRSQmVs

If you could test the workflow, we can understand if it is a pipeline-related issue or if there is something else.

LiaOb21 commented 1 month ago

Okay, that's why then. Can you go into the directory where you downloaded colora and type tree ., please?

SolayMane commented 4 weeks ago

I ran the test and I got several errors... actually I'm on this error:

host: inra
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /bin/bash
Provided cores: 56
Rules claiming more threads will be scaled down.
Job stats:
job         count
--------  -------
all             1
busco           1
nanoplot        1
quast           1
yahs            2
total           6

Select jobs to execute...
Execute 3 jobs...

[Mon Oct 14 10:21:17 2024]
localrule yahs:
    input: results/bwa_index_hap2/asm.fa, results/arima_mapping_pipeline_hap2/REP_DIR/paired_mark_dups_final.bam
    output: results/yahs_hap2/asm_yahs_scaffolds_final.fa, results/assemblies/yahs_hap2.fa
    log: logs/yahs_hap2.log
    jobid: 14
    reason: Missing output files: results/assemblies/yahs_hap2.fa, results/yahs_hap2/asm_yahs_scaffolds_final.fa
    wildcards: hap=hap2
    resources: tmpdir=/tmp, mem_mb=8000, mem_mib=7630

Activating conda environment: .snakemake/conda/87af61d5b13247599818da2121007351_

[Mon Oct 14 10:21:17 2024]
localrule yahs:
    input: results/bwa_index_hap1/asm.fa, results/arima_mapping_pipeline_hap1/REP_DIR/paired_mark_dups_final.bam
    output: results/yahs_hap1/asm_yahs_scaffolds_final.fa, results/assemblies/yahs_hap1.fa
    log: logs/yahs_hap1.log
    jobid: 8
    reason: Missing output files: results/assemblies/yahs_hap1.fa, results/yahs_hap1/asm_yahs_scaffolds_final.fa
    wildcards: hap=hap1
    resources: tmpdir=/tmp, mem_mb=8000, mem_mib=7630

Activating conda environment: .snakemake/conda/87af61d5b13247599818da2121007351_

[Mon Oct 14 10:21:17 2024]
localrule nanoplot:
    input: results/reads/hifi/hifi.fastq.gz
    output: results/nanoplot/NanoPlot-report.html
    log: logs/nanoplot.log
    jobid: 1
    reason: Missing output files: results/nanoplot/NanoPlot-report.html
    threads: 4
    resources: tmpdir=/tmp, mem_mb=8000, mem_mib=7630

Activating conda environment: .snakemake/conda/5d695729f424d26c27968862726d6580_
[Mon Oct 14 10:21:18 2024]
Finished job 14.
1 of 6 steps (17%) done
[Mon Oct 14 10:21:18 2024]
Finished job 8.
2 of 6 steps (33%) done
Select jobs to execute...
Execute 2 jobs...

[Mon Oct 14 10:21:18 2024]
localrule busco:
    input: results/assemblies/asm_hap1.fa, results/assemblies/yahs_hap1.fa, results/assemblies/asm_hap2.fa, results/assemblies/yahs_hap2.fa
    output: results/busco
    log: logs/busco.log
    jobid: 20
    reason: Missing output files: results/busco; Input files updated by another job: results/assemblies/yahs_hap1.fa, results/assemblies/yahs_hap2.fa
    threads: 4
    resources: tmpdir=/tmp, mem_mb=32000, mem_mib=30518

Activating conda environment: .snakemake/conda/76be1be47196d1e717e47c9f575fe602_

[Mon Oct 14 10:21:18 2024]
localrule quast:
    input: results/assemblies/asm_hap1.fa, results/assemblies/yahs_hap1.fa, results/assemblies/asm_hap2.fa, results/assemblies/yahs_hap2.fa
    output: results/quast
    log: logs/quast.log
    jobid: 5
    reason: Missing output files: results/quast; Input files updated by another job: results/assemblies/yahs_hap1.fa, results/assemblies/yahs_hap2.fa
    threads: 4
    resources: tmpdir=/tmp, mem_mb=8000, mem_mib=7630

Activating conda environment: .snakemake/conda/3cb2bb5b7d165a8674c17e132df58bb9_
[Mon Oct 14 10:21:27 2024]
Finished job 5.
3 of 6 steps (50%) done
[Mon Oct 14 10:21:39 2024]
Error in rule nanoplot:
    jobid: 1
    input: results/reads/hifi/hifi.fastq.gz
    output: results/nanoplot/NanoPlot-report.html
    log: logs/nanoplot.log (check log file(s) for error details)
    conda-env: /sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_
    shell:

        NanoPlot -t 4 --fastq results/reads/hifi/hifi.fastq.gz --loglength -o results/nanoplot --plots dot --verbose >> logs/nanoplot.log 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job nanoplot since they might be corrupted:
results/nanoplot/NanoPlot-report.html
[Mon Oct 14 10:26:05 2024]
Finished job 20.
4 of 6 steps (67%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-14T102110.537983.snakemake.log
WorkflowError:
At least one job did not complete successfully.

The command NanoPlot -t 4 --fastq results/reads/hifi/hifi.fastq.gz --loglength -o results/nanoplot --plots dot --verbose >> logs/nanoplot.log 2>&1 works fine when I run it manually, but the rule still raises an error.

LiaOb21 commented 4 weeks ago

So, the first thing that you could try is to set conda config --set channel_priority strict from the conda environment. I never got an error with NanoPlot on any of the machines on which I have run the workflow, so there might be some issue with how that specific environment is created.

Can you please go to /sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/ and type tree .? I want to check the directory structure.

One more thing, could you check the log of NanoPlot in the logs directory to see what exactly the error is?

However, if only NanoPlot is giving errors, you should have your assembly completed. NanoPlot is used to QC long reads.

SolayMane commented 4 weeks ago

Here is the log of NanoPlot:

2024-10-14 10:44:37,453 NanoPlot 1.43.0 started with arguments Namespace(threads=20, verbose=True, store=False, raw=False, huge=False, outdir='results/nanoplot', no_static=False, prefix='', tsv_stats=False, only_report=False, info_in_report=False, maxlength=None, minlength=None, drop_outliers=False, downsample=None, loglength=True, percentqual=False, alength=False, minqual=None, runtime_until=None, readtype='1D', barcoded=False, no_supplementary=False, color='#4CB391', colormap='Greens', format=['png'], plots=['dot'], legacy=None, listcolors=False, listcolormaps=False, no_N50=False, N50=False, title=None, font_scale=1, dpi=100, hide_stats=False, fastq=['results/reads/hifi/hifi.fastq.gz'], fasta=None, fastq_rich=None, fastq_minimal=None, summary=None, bam=None, ubam=None, cram=None, pickle=None, feather=None, path='results/nanoplot/')
2024-10-14 10:44:37,453 Python version is: 3.12.5 | packaged by conda-forge | (main, Aug  8 2024, 18:36:51) [GCC 12.4.0]
2024-10-14 10:44:37,467 Nanoget: Starting to collect statistics from plain fastq file.
2024-10-14 10:44:37,467 Nanoget: Decompressing gzipped fastq results/reads/hifi/hifi.fastq.gz
2024-10-14 12:09:30,768 Reduced DataFrame memory usage from 72.1270637512207Mb to 72.1270637512207Mb
2024-10-14 12:09:31,070 Nanoget: Gathered all metrics of 4726911 reads
2024-10-14 12:09:35,157 Calculated statistics
2024-10-14 12:09:35,161 Using sequenced read lengths for plotting.
2024-10-14 12:09:35,399 Using log10 scaled read lengths.
2024-10-14 12:09:35,937 NanoPlot:  Valid color #4CB391.
2024-10-14 12:09:35,937 NanoPlot:  Valid colormap Greens.
2024-10-14 12:09:36,278 NanoPlot:  Creating length plots for Read length.
2024-10-14 12:09:36,283 NanoPlot:  Using 4726911 reads maximum of 68733bp.
2024-10-14 12:09:37,561 Saved results/nanoplot/WeightedHistogramReadlength  as png (or png for --legacy)
2024-10-14 12:09:39,475 Saved results/nanoplot/WeightedLogTransformed_HistogramReadlength  as png (or png for --legacy)
2024-10-14 12:09:40,736 Saved results/nanoplot/Non_weightedHistogramReadlength  as png (or png for --legacy)
2024-10-14 12:09:42,438 Saved results/nanoplot/Non_weightedLogTransformed_HistogramReadlength  as png (or png for --legacy)
2024-10-14 12:09:42,670 A global iterator flag was passed as a per-operand flag to the iterator constructor
Traceback (most recent call last):
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/nanoplot/NanoPlot.py", line 110, in main
    plots = make_plots(datadf, settings)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/nanoplot/NanoPlot.py", line 166, in make_plots
    nanoplotter.length_plots(
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/nanoplotter/nanoplotter_main.py", line 510, in length_plots
    yield_by_minimal_length_plot(
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/nanoplotter/nanoplotter_main.py", line 559, in yield_by_minimal_length_plot
    df["cumyield_gb"] = df["lengths"].cumsum() / 10**9
                        ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
           ^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/arraylike.py", line 210, in __truediv__
    return self._arith_method(other, operator.truediv)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/series.py", line 6135, in _arith_method
    return base.IndexOpsMixin._arith_method(self, other, op)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/base.py", line 1382, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/ops/array_ops.py", line 283, in arithmetic_op
    res_values = _na_arithmetic_op(left, right, op)  # type: ignore[arg-type]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/ops/array_ops.py", line 218, in _na_arithmetic_op
    result = func(left, right)
             ^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/computation/expressions.py", line 242, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/computation/expressions.py", line 73, in _evaluate_standard
    return op(a, b)
           ^^^^^^^^
ValueError: A global iterator flag was passed as a per-operand flag to the iterator constructor

If you read this then NanoPlot 1.43.0 has crashed :-(
Please try updating NanoPlot and see if that helps...

If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues
If you could include the log file that would be really helpful.
Thanks!

Traceback (most recent call last):
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/bin/NanoPlot", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/nanoplot/NanoPlot.py", line 110, in main
    plots = make_plots(datadf, settings)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/nanoplot/NanoPlot.py", line 166, in make_plots
    nanoplotter.length_plots(
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/nanoplotter/nanoplotter_main.py", line 510, in length_plots
    yield_by_minimal_length_plot(
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/nanoplotter/nanoplotter_main.py", line 559, in yield_by_minimal_length_plot
    df["cumyield_gb"] = df["lengths"].cumsum() / 10**9
                        ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
           ^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/arraylike.py", line 210, in __truediv__
    return self._arith_method(other, operator.truediv)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/series.py", line 6135, in _arith_method
    return base.IndexOpsMixin._arith_method(self, other, op)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/base.py", line 1382, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/ops/array_ops.py", line 283, in arithmetic_op
    res_values = _na_arithmetic_op(left, right, op)  # type: ignore[arg-type]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/ops/array_ops.py", line 218, in _na_arithmetic_op
    result = func(left, right)
             ^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/computation/expressions.py", line 242, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_/lib/python3.12/site-packages/pandas/core/computation/expressions.py", line 73, in _evaluate_standard
    return op(a, b)
           ^^^^^^^^
ValueError: A global iterator flag was passed as a per-operand flag to the iterator constructor
LiaOb21 commented 4 weeks ago

Yeah, it could be a problem with the environment!

You can try this:

rm -r /sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/.snakemake/conda/5d695729f424d26c27968862726d6580_*

conda config --set channel_priority strict

Then, to resume the workflow, go back to /sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/ and use the same snakemake command that you used to start the workflow originally. It should download NanoPlot again and create a new environment for that package. Let me know if you manage to complete it this way.
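
Putting those steps together, a minimal sketch (the directory, environment hash, and config file below are taken from your logs above):

cd /sanhome2/Argania_assembly/ChAssembly24/Assembly_Colora/
rm -r .snakemake/conda/5d695729f424d26c27968862726d6580_*    # remove the broken NanoPlot environment
conda config --set channel_priority strict
snakemake --cores all --use-conda --configfile config_argania.yaml   # recreates the environment and resumes the workflow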