Open allaigle opened 1 month ago
Also, when I do not hit pandas.errors.UndefinedVariableError: name 'nan' is not defined, I sometimes get the following error:
Number of beads read from structure: 1847
Number of beads deduced from sequence and HiC resolution: 1847
Traceback (most recent call last):
File "/PATH/WORKDIR/../scripts/assign_chromosomes.py", line 196, in <module>
assign_chromosome_number(ARGS.pdb, CHROMOSOME_LENGTH, ARGS.resolution, ARGS.output)
File "/PATH/WORKDIR/../scripts/assign_chromosomes.py", line 185, in assign_chromosome_number
coordinates.to_pdb(path=pdb_name_out, records=None, gz=False, append_newline=True)
File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/biopandas/pdb/pandas_pdb.py", line 719, in to_pdb
dfs[r][col["id"]] = dfs[r][col["id"]].apply(col["strf"])
File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/series.py", line 4917, in apply
return SeriesApply(
File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/apply.py", line 1427, in apply
return self.apply_standard()
File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/apply.py", line 1507, in apply_standard
mapped = obj._map_values(
File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/base.py", line 921, in _map_values
return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/algorithms.py", line 1743, in map_array
return lib.map_infer(values, mapper, convert=convert)
File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/biopandas/pdb/engines.py", line 161, in <lambda>
if len(str(int(x))) < 3
ValueError: cannot convert float NaN to integer
Is there no way to add something like .dropna() somewhere in scripts/assign_chromosomes.py?
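A minimal sketch of what such a guard could look like, assuming the bead table is an ordinary pandas DataFrame with PDB-style column names. The table contents and the helper `drop_incomplete_beads` are hypothetical and are not part of scripts/assign_chromosomes.py. It separates rows with missing values instead of dropping them silently, so the removed beads can be inspected:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ATOM table written by to_pdb().
# Column names follow the PDB convention; the values are made up.
atoms = pd.DataFrame({
    "atom_number": [1, 2, 3, 4],
    "residue_number": [1.0, 2.0, np.nan, 4.0],  # NaN breaks int(x) on export
    "x_coord": [0.0, 1.5, 3.0, 4.5],
})


def drop_incomplete_beads(df, columns):
    """Return (clean, dropped): rows without NaN in `columns`, and the rows removed."""
    has_nan = df[columns].isna().any(axis=1)
    return df.loc[~has_nan].copy(), df.loc[has_nan].copy()


clean, dropped = drop_incomplete_beads(atoms, ["atom_number", "residue_number"])

# Once the NaN rows are gone, the column can be cast back to integers.
clean["residue_number"] = clean["residue_number"].astype(int)

print(f"Dropped {len(dropped)} bead(s) with missing values")
```

In the script, the equivalent table would presumably be `coordinates.df["ATOM"]`, filtered just before the `to_pdb` call. Dropping beads changes the structure, so it is worth checking where the NaN values come from before relying on this.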
Thank you for your help !
Might be useful for someone: the issues arise only for species whose genome.fasta contains scaffolds. Otherwise, it works fine.
Thank you for your feedback and comments @allaigle. @pierrepo will take a look at your message as soon as possible. Thank you for your understanding.
Hey @allaigle
These errors could be related to your genome definition (genome.fasta). If it is an already published genome, would you mind sharing a link so we can have a look at it?
If it is not yet published, would you mind characterizing it a little bit: number of chromosomes/contigs, number of bases per chromosome/contig... ?
Also, could you try running the workflow with the verify_contigs option set to False in your config file (although I guess you might be very interested in having this option activated)?
Hi @pierrepo,
Thank you for your answer! After this issue, I checked the genome quality using QUAST and decided to remove all contigs/scaffolds shorter than 500 kb, which did not change the NC50 much. I then ran everything with this setting, and almost all my species worked fine, except one (for now): genome GCA_006506795.2 (Hericium erinaceus).
Indeed, I ran all of them with the verify_contigs option set to True, but it did not show any issue after trimming.
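For anyone wanting to reproduce the length filtering, here is a minimal sketch using only the Python standard library. The function names `read_fasta` and `filter_fasta` are hypothetical, and this is not necessarily the tool used above. It keeps only records with at least a given number of bases:

```python
import io

MIN_LENGTH = 500_000  # 500 kb threshold mentioned above


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(src, dst, min_length=MIN_LENGTH, width=60):
    """Write records with at least `min_length` bases to dst; return kept headers."""
    kept = []
    for header, seq in read_fasta(src):
        if len(seq) < min_length:
            continue  # skip short contigs/scaffolds
        dst.write(f">{header}\n")
        for i in range(0, len(seq), width):
            dst.write(seq[i:i + width] + "\n")
        kept.append(header)
    return kept


# Tiny in-memory demo with a 10-base threshold instead of 500 kb.
demo = io.StringIO(">chr1\nACGTACGTACGT\n>scaffold_7\nACGT\n")
out = io.StringIO()
kept = filter_fasta(demo, out, min_length=10)
print(kept)
```

On real data, the two `io.StringIO` objects would be replaced by `open("genome.fasta")` and an output file opened for writing, with the default threshold left in place.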
Thank you for your help, much appreciated!
Hi,
Thank you so much for having created such a nice workflow!!
Depending on the species I am running it on, I get this error, sometimes for every resolution, sometimes only for one:
Note that when I tested
python ../scripts/verify_inverted_contigs.py --run True --pdb structure/10000/structure_with_chr.pdb --fasta genome.fasta --resolution 10000 --output-pdb structure/10000/structure_verified_contigs.pdb --output-fasta sequence/10000/genome_verified_contigs.fasta >logs/verify_inverted_contigs_10000.log 2>&1
, I got no such error. I also tried
but when rerunning the pipeline, I got the same issue.
Is there any "easy" solution to fix it like adding an import command?
Thanks!