data-fun / 3d-genome-builder

3DGB is a workflow to build 3D models of genomes from HiC data
BSD 3-Clause "New" or "Revised" License
13 stars, 3 forks

pandas.errors.UndefinedVariableError: name 'nan' is not defined #13

Open · allaigle opened this issue 1 month ago

allaigle commented 1 month ago

Hi,

Thank you so much for creating such a nice workflow!

Depending on the species I run it on, I get the following error, sometimes at every resolution and sometimes at only one:

Job 49: Fix inverted contigs (if needed) in the 3D structure at resolution 10000
Reason: Missing output files: structure/10000/structure_verified_contigs.pdb; Input files updated by another job: structure/10000/structure_with_chr.pdb

python ../scripts/verify_inverted_contigs.py --run True --pdb structure/10000/structure_with_chr.pdb --fasta genome.fasta --resolution 10000 --output-pdb structure/10000/structure_verified_contigs.pdb --output-fasta sequence/10000/genome_verified_contigs.fasta >logs/verify_inverted_contigs_10000.log 2>&1 
Activating conda environment: .snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_
[Sat Oct 19 06:10:13 2024]
Error in rule verify_inverted_contigs:
    jobid: 49
    input: structure/10000/structure_with_chr.pdb, genome.fasta
    output: structure/10000/structure_verified_contigs.pdb, sequence/10000/genome_verified_contigs.fasta
    log: logs/verify_inverted_contigs_10000.log (check log file(s) for error details)
    conda-env: /PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_
    shell:
        python ../scripts/verify_inverted_contigs.py --run True --pdb structure/10000/structure_with_chr.pdb --fasta genome.fasta --resolution 10000 --output-pdb structure/10000/structure_verified_contigs.pdb --output-fasta sequence/10000/genome_verified_contigs.fasta >logs/verify_inverted_contigs_10000.log 2>&1 
[...]
Looking for inverted contigs into chromosome nan
Traceback (most recent call last):
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/scope.py", line 231, in resolve
    return self.resolvers[key]
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/collections/__init__.py", line 941, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/collections/__init__.py", line 933, in __missing__
    raise KeyError(key)
KeyError: 'nan'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/scope.py", line 242, in resolve
    return self.temps[key]
KeyError: 'nan'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/PATH/WORKDIR/../scripts/verify_inverted_contigs.py", line 377, in <module>
    INVERTED_CONTIGS = find_inverted_contigs(
  File "/PATH/WORKDIR/../scripts/verify_inverted_contigs.py", line 186, in find_inverted_contigs
    chromosome_df = structure_df.query(
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/frame.py", line 4823, in query
    res = self.eval(expr, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/frame.py", line 4949, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/eval.py", line 336, in eval
    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 809, in __init__
    self.terms = self.parse()
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 828, in parse
    return self._visitor.visit(self.expr)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 418, in visit_Module
    return self.visit(expr, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 421, in visit_Expr
    return self.visit(node.value, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 719, in visit_Compare
    return self.visit(binop)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 532, in visit_BinOp
    op, op_class, left, right = self._maybe_transform_eq_ne(node)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 454, in _maybe_transform_eq_ne
    right = self.visit(node.right, side="right")
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 545, in visit_Name
    return self.term_type(node.id, self.env, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/ops.py", line 91, in __init__
    self._value = self._resolve_name()
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/ops.py", line 115, in _resolve_name
    res = self.env.resolve(local_name, is_local=is_local)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/scope.py", line 244, in resolve
    raise UndefinedVariableError(key, is_local) from err
pandas.errors.UndefinedVariableError: name 'nan' is not defined
================================================================================

Exiting because a job execution failed. Look above for error message
Complete log: ../.snakemake/log/2024-10-18T101701.614686.snakemake.log

Note that when I ran python ../scripts/verify_inverted_contigs.py --run True --pdb structure/10000/structure_with_chr.pdb --fasta genome.fasta --resolution 10000 --output-pdb structure/10000/structure_verified_contigs.pdb --output-fasta sequence/10000/genome_verified_contigs.fasta >logs/verify_inverted_contigs_10000.log 2>&1 by hand, I did not get this error.

I also tried

cd $3DGB
vim ./scripts/verify_inverted_contigs.py 
from numpy import nan # right after "import numpy as np"

However, when I reran the pipeline, I got the same error.

Is there an "easy" fix, such as adding an import statement?
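Here is a minimal sketch of what is probably happening. It rests on some assumptions. The traceback shows structure_df.query( being called in find_inverted_contigs, and the log prints "chromosome nan". That suggests the chromosome id is formatted into the query string. When the id is NaN, the expression ends in "== nan". DataFrame.query resolves bare names against the DataFrame's columns and index rather than the script's globals, which would explain why adding from numpy import nan has no effect. The toy table and the "chrom" column name below are hypothetical, and the real expression in verify_inverted_contigs.py may differ.

```python
import numpy as np
import pandas as pd
from pandas.errors import UndefinedVariableError

# Toy bead table; the "chrom" column name is hypothetical, not taken from 3DGB.
beads = pd.DataFrame({"chrom": [1.0, 1.0, 2.0, np.nan], "bead": [0, 1, 2, 3]})


def select_by_fstring(df: pd.DataFrame, chrom_id: float) -> pd.DataFrame:
    # Interpolating the value turns NaN into the bare token "nan", which
    # query() tries to resolve as a column/index name and fails.
    return df.query(f"chrom == {chrom_id}")


def select_by_reference(df: pd.DataFrame, chrom_id: float) -> pd.DataFrame:
    # "@chrom_id" refers to the local Python variable, so nothing is
    # interpolated; NaN never compares equal, so no rows match.
    return df.query("chrom == @chrom_id")


def unique_chromosomes(df: pd.DataFrame) -> list:
    # Skipping NaN ids avoids ever building a query for an unassigned bead.
    return df["chrom"].dropna().unique().tolist()


if __name__ == "__main__":
    try:
        select_by_fstring(beads, np.nan)
    except UndefinedVariableError as err:
        print("f-string query failed:", err)
    print(len(select_by_reference(beads, np.nan)))  # 0
    print(unique_chromosomes(beads))  # [1.0, 2.0]
```

Either the "@" reference or skipping NaN ids up front would avoid the crash. It would still be worth checking why some beads end up with no chromosome assigned.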

Thanks!

allaigle commented 1 month ago

Also, when I don't hit pandas.errors.UndefinedVariableError: name 'nan' is not defined, I sometimes get the following error instead:

Number of beads read from structure: 1847
Number of beads deduced from sequence and HiC resolution: 1847
Traceback (most recent call last):
  File "/PATH/WORKDIR/../scripts/assign_chromosomes.py", line 196, in <module>
    assign_chromosome_number(ARGS.pdb, CHROMOSOME_LENGTH, ARGS.resolution, ARGS.output)
  File "/PATH/WORKDIR/../scripts/assign_chromosomes.py", line 185, in assign_chromosome_number
    coordinates.to_pdb(path=pdb_name_out, records=None, gz=False, append_newline=True)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/biopandas/pdb/pandas_pdb.py", line 719, in to_pdb
    dfs[r][col["id"]] = dfs[r][col["id"]].apply(col["strf"])
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/series.py", line 4917, in apply
    return SeriesApply(
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/apply.py", line 1427, in apply
    return self.apply_standard()
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/apply.py", line 1507, in apply_standard
    mapped = obj._map_values(
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/biopandas/pdb/engines.py", line 161, in <lambda>
    if len(str(int(x))) < 3
ValueError: cannot convert float NaN to integer

Could something like .dropna() be added somewhere in scripts/assign_chromosomes.py?
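The final ValueError comes from calling int() on a NaN value while biopandas formats a numeric column. This suggests that some rows of the coordinates table contain missing values. The sketch below assumes the table is available as a pandas DataFrame; for a biopandas PandasPdb object, that would presumably be coordinates.df["ATOM"]. It lists the offending rows before anything is written, instead of dropping them blindly. Calling .dropna() would remove beads, which could make the structure disagree with the bead count deduced from the sequence. The toy columns and values are made up for illustration.

```python
import numpy as np
import pandas as pd


def rows_with_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows containing at least one missing value."""
    return df[df.isna().any(axis=1)]


def drop_rows_with_nan(df: pd.DataFrame) -> pd.DataFrame:
    """The .dropna() workaround: drop incomplete rows and report how many went away."""
    cleaned = df.dropna()
    removed = len(df) - len(cleaned)
    if removed:
        print(f"warning: dropped {removed} bead(s) with missing values")
    return cleaned


# Toy table; column names mimic PDB fields, but the values are invented.
atoms = pd.DataFrame(
    {
        "residue_number": [1, 2, np.nan],
        "chain_id": ["A", "A", None],
        "x_coord": [0.0, 1.5, 3.0],
    }
)

if __name__ == "__main__":
    try:
        int(float("nan"))
    except ValueError as err:
        print(err)  # cannot convert float NaN to integer
    print(rows_with_nan(atoms))
    print(len(drop_rows_with_nan(atoms)))  # 2
```

Printing the NaN rows first would show whether they all belong to the short scaffolds.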

Thank you for your help!

allaigle commented 4 weeks ago

In case it's useful to someone: the issues arise only for species whose genome.fasta contains scaffolds. Otherwise, everything works fine.

gaellelelandais commented 4 weeks ago

Thank you for your feedback and comments @allaigle. @pierrepo will take a look at your message as soon as possible. Thank you for your understanding.

pierrepo commented 3 weeks ago

Hey @allaigle

These errors could be related to your genome definition (genome.fasta). If it's an already published genome, would you mind sharing a link so we can have a look? If it's not yet published, would you mind characterizing it a little: number of chromosomes/contigs, number of bases per chromosome/contig...?
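A quick summary of genome.fasta (number of sequences and length of each) can be produced with a few lines of standard-library Python. This minimal sketch assumes a plain, uncompressed FASTA file. The function names are illustrative and are not part of 3DGB.

```python
from typing import Dict


def sequence_lengths(fasta_text: str) -> Dict[str, int]:
    """Map each record id (first word of its header) to its length in bases."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the record id.
            current = line[1:].split()[0] if len(line) > 1 else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def summarize(lengths: Dict[str, int]) -> str:
    """A total count line, then one line per sequence, longest first."""
    rows = sorted(lengths.items(), key=lambda item: item[1], reverse=True)
    out = [f"{len(rows)} sequences, {sum(lengths.values())} bases in total"]
    out += [f"{name}\t{length}" for name, length in rows]
    return "\n".join(out)


if __name__ == "__main__":
    demo = ">chr1 description\nACGTACGT\nACGT\n>scaffold_7\nACG\n"
    print(summarize(sequence_lengths(demo)))
```

To run it on the real file, pass its contents, e.g. summarize(sequence_lengths(open("genome.fasta").read())).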

Also, could you try running the workflow with the verify_contigs option set to False in your config file? (Although I guess you might be very interested in having this option activated.)

allaigle commented 2 weeks ago

Hi @pierrepo,

Thank you for your answer! Following this issue, I checked the genome quality using QUAST and decided to remove all contigs/scaffolds shorter than 500 kb, which did not change the N50 much. I ran everything with this setting, and almost all my species worked fine except one (for now): genome GCA_006506795.2 (Hericium erinaceus).
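The filtering step described here can be sketched in standard-library Python: drop every sequence shorter than 500 kb, then compare N50 (the contiguity statistic QUAST reports) before and after. This is an illustrative sketch, not the tool actually used. Only the 500,000 bp threshold comes from the comment; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)
MIN_LENGTH = 500_000  # threshold from the comment above (500 kb)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, keeping full header text."""
    records: List[Record] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def filter_short(records: List[Record], min_length: int = MIN_LENGTH) -> List[Record]:
    """Keep only sequences at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records back to FASTA, wrapping sequences at `width` columns."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = ">a\n" + "A" * 10 + "\n>b\n" + "C" * 8 + "\n>c\n" + "G" * 5 + "\n"
    records = read_fasta(demo)
    kept = filter_short(records, min_length=6)  # tiny threshold for the demo
    print(n50(len(s) for _, s in records), n50(len(s) for _, s in kept))  # 8 10
```

Dropping short sequences can only raise the N50 or leave it unchanged, so a small change is expected when the removed scaffolds make up little of the total length.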

I ran all of them with the verify_contigs option set to True, and no issue showed up after trimming.

Thank you for your help, much appreciated!