conchoecia / odp

oxford dot plots
GNU General Public License v3.0

Length mismatch: Expected axis has 4 elements, new values have 5 elements #50

Closed alexvasilikop closed 7 months ago

alexvasilikop commented 1 year ago

Hello again,

I tried running the pipeline with a subset of genomes (2) by deactivating the legality check for duplicate sequences (I don't know if this is related though).

I get the following error, which is probably related to plotting:

snakemake --snakefile odp/scripts/odp --cores 10
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Job stats:
job                               count    min threads    max threads
------------------------------  -------  -------------  -------------
all                                   1              1              1
analysis_D_and_FET                    1              1              1
diamond_blast                         2              9              9
filtered_D_FET_rbh                    1              1              1
get_chromsize_of_analysis_pair        1              1              1
n_ways_reciprocal_best                1              1              1
plot_synteny_nocolor                  1              1              1
reciprocal_best_hits                  1              1              1
total                                 9              1              9

Select jobs to execute...

[Thu Jul 27 14:43:05 2023]
rule diamond_blast:
    input: odp/db/Adinetavaga_prots.pep, odp/db/Hymenolepismicrostoma_prots.pep, odp/db/Hymenolepismicrostoma_prots.pep.phr, odp/db/Hymenolepismicrostoma_prots.pep.pin, odp/db/Hymenolepismicrostoma_prots.pep.psq, odp/db/dmnd/Hymenolepismicrostoma_prots.dmnd
    output: odp/step0-blastp_results/Adinetavaga_against_Hymenolepismicrostoma.blastp
    jobid: 3
    reason: Missing output files: odp/step0-blastp_results/Adinetavaga_against_Hymenolepismicrostoma.blastp
    wildcards: sample1=Adinetavaga, sample2=Hymenolepismicrostoma
    priority: 1
    threads: 9
    resources: tmpdir=/tmp

[Thu Jul 27 14:43:05 2023]
rule get_chromsize_of_analysis_pair:
    input: /mnt/sda1/Alex/14.COMPARATIVE_GENOMICS/02.ODP/DATASETS/assembly/Adinetavaga.fasta, /mnt/sda1/Alex/14.COMPARATIVE_GENOMICS/02.ODP/DATASETS/assembly/Hymenolepismicrostoma.fasta, odp/db/input_check/Adinetavaga_pass.txt, odp/db/input_check/Hymenolepismicrostoma_pass.txt
    output: odp/step0-chromsize/analyses/Adinetavaga_Hymenolepismicrostoma.chromsize
    jobid: 13
    reason: Missing output files: odp/step0-chromsize/analyses/Adinetavaga_Hymenolepismicrostoma.chromsize
    wildcards: analysis=Adinetavaga_Hymenolepismicrostoma
    resources: tmpdir=/tmp

diamond v2.1.7.161 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 9
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: odp/step0-blastp_results
#Target sequences to report alignments for: 25
Opening the database...  [0.011s]
Database: odp/db/dmnd/Hymenolepismicrostoma_prots.dmnd (type: Diamond database, sequences: 10139, letters: 5512046)
Block size = 2000000000
Algorithm: Double-indexed
Building query histograms...  [0.326s]
Seeking in database...  [0s]
Loading reference sequences...  [0.031s]
Masking reference...  [0.137s]
Initializing temporary storage...  [0s]
Building reference histograms...  [0.139s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array...  [0.049s]
Building query seed array...  [0.1s]
Computing hash join...  [0.042s]
Masking low complexity seeds...  [0.027s]
Searching alignments...  [0.025s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
Building reference seed array...  [0.056s]
Building query seed array...  [0.113s]
Computing hash join...  [0.038s]
Masking low complexity seeds...  [0.015s]
Searching alignments...  [0.039s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4.
Building reference seed array...  [0.093s]
Building query seed array...  [0.165s]
Computing hash join...  [0.027s]
Masking low complexity seeds...  [0.016s]
Searching alignments...  [0.025s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4.
Building reference seed array...  [0.054s]
Building query seed array...  [0.098s]
Computing hash join...  [0.044s]
Masking low complexity seeds...  [0.015s]
Searching alignments...  [0.039s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4.
Building reference seed array...  [0.056s]
Building query seed array...  [0.097s]
Computing hash join...  [0.041s]
Masking low complexity seeds...  [0.015s]
Searching alignments...  [0.035s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4.
Building reference seed array...  [0.058s]
Building query seed array...  [0.105s]
Computing hash join...  [0.043s]
Masking low complexity seeds...  [0.015s]
Searching alignments...  [0.031s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4.
Building reference seed array...  [0.063s]
Building query seed array...  [0.116s]
Computing hash join...  [0.044s]
Masking low complexity seeds...  [0.012s]
Searching alignments...  [0.038s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4.
Building reference seed array...  [0.056s]
Building query seed array...  [0.098s]
Computing hash join...  [0.024s]
Masking low complexity seeds...  [0.015s]
Searching alignments...  [0.035s]
Deallocating memory...  [0s]
Deallocating buffers...  [0.003s]
Clearing query masking...  [0.002s]
Computing alignments... Loading trace points...  [0.395s]
Sorting trace points...  [0.025s]
Computing alignments... Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Select jobs to execute...
 [3.918s]
Deallocating buffers...  [0.012s]
Loading trace points...  [0s]
 [4.353s]
Deallocating reference...  [0s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0.001s]
Total time = 7.9s
Reported 35139 pairwise alignments, 35139 HSPs.
8515 queries aligned.
[Thu Jul 27 14:43:13 2023]
Finished job 3.
1 of 9 steps (11%) done
Select jobs to execute...

[Thu Jul 27 14:43:13 2023]
rule diamond_blast:
    input: odp/db/Hymenolepismicrostoma_prots.pep, odp/db/Adinetavaga_prots.pep, odp/db/Adinetavaga_prots.pep.phr, odp/db/Adinetavaga_prots.pep.pin, odp/db/Adinetavaga_prots.pep.psq, odp/db/dmnd/Adinetavaga_prots.dmnd
    output: odp/step0-blastp_results/Hymenolepismicrostoma_against_Adinetavaga.blastp
    jobid: 9
    reason: Missing output files: odp/step0-blastp_results/Hymenolepismicrostoma_against_Adinetavaga.blastp
    wildcards: sample1=Hymenolepismicrostoma, sample2=Adinetavaga
    priority: 1
    threads: 9
    resources: tmpdir=/tmp

diamond v2.1.7.161 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 9
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: odp/step0-blastp_results
#Target sequences to report alignments for: 25
Opening the database...  [0.027s]
Database: odp/db/dmnd/Adinetavaga_prots.dmnd (type: Diamond database, sequences: 31335, letters: 14194087)
Block size = 2000000000
Algorithm: Double-indexed
Building query histograms...  [0.165s]
Seeking in database...  [0s]
Loading reference sequences...  [0.082s]
Masking reference...  [0.331s]
Initializing temporary storage...  [0s]
Building reference histograms... [Thu Jul 27 14:43:14 2023]
Finished job 13.
2 of 9 steps (22%) done
 [0.34s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array...  [0.118s]
Building query seed array...  [0.057s]
Computing hash join...  [0.024s]
Masking low complexity seeds...  [0.013s]
Searching alignments...  [0.042s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
Building reference seed array...  [0.122s]
Building query seed array...  [0.056s]
Computing hash join...  [0.029s]
Masking low complexity seeds...  [0.025s]
Searching alignments...  [0.039s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4.
Building reference seed array...  [0.126s]
Building query seed array...  [0.059s]
Computing hash join...  [0.024s]
Masking low complexity seeds...  [0.021s]
Searching alignments...  [0.039s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4.
Building reference seed array...  [0.11s]
Building query seed array...  [0.054s]
Computing hash join...  [0.026s]
Masking low complexity seeds...  [0.015s]
Searching alignments...  [0.024s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4.
Building reference seed array...  [0.098s]
Building query seed array...  [0.045s]
Computing hash join...  [0.038s]
Masking low complexity seeds...  [0.023s]
Searching alignments...  [0.039s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4.
Building reference seed array...  [0.113s]
Building query seed array...  [0.057s]
Computing hash join...  [0.026s]
Masking low complexity seeds...  [0.016s]
Searching alignments...  [0.039s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4.
Building reference seed array...  [0.121s]
Building query seed array...  [0.057s]
Computing hash join...  [0.039s]
Masking low complexity seeds...  [0.019s]
Searching alignments...  [0.041s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4.
Building reference seed array...  [0.099s]
Building query seed array...  [0.055s]
Computing hash join...  [0.024s]
Masking low complexity seeds...  [0.013s]
Searching alignments...  [0.023s]
Deallocating memory...  [0s]
Deallocating buffers...  [0.005s]
Clearing query masking...  [0.001s]
Computing alignments... Loading trace points...  [0.395s]
Sorting trace points...  [0.025s]
Computing alignments...  [3.472s]
Deallocating buffers...  [0s]
Loading trace points...  [0s]
 [3.895s]
Deallocating reference...  [0.001s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0s]
Total time = 7.468s
Reported 29934 pairwise alignments, 29934 HSPs.
4828 queries aligned.
[Thu Jul 27 14:43:20 2023]
Finished job 9.
3 of 9 steps (33%) done
Select jobs to execute...

[Thu Jul 27 14:43:20 2023]
rule reciprocal_best_hits:
    input: odp/step0-blastp_results/Adinetavaga_against_Hymenolepismicrostoma.blastp, odp/step0-blastp_results/Hymenolepismicrostoma_against_Adinetavaga.blastp
    output: odp/step0-blastp_results/reciprocal_best/Adinetavaga_and_Hymenolepismicrostoma_recip.temp.blastp, odp/step0-blastp_results/reciprocal_best/Hymenolepismicrostoma_and_Adinetavaga_recip.temp.blastp
    jobid: 2
    reason: Missing output files: odp/step0-blastp_results/reciprocal_best/Adinetavaga_and_Hymenolepismicrostoma_recip.temp.blastp; Input files updated by another job: odp/step0-blastp_results/Hymenolepismicrostoma_against_Adinetavaga.blastp, odp/step0-blastp_results/Adinetavaga_against_Hymenolepismicrostoma.blastp
    wildcards: sample1=Adinetavaga, sample2=Hymenolepismicrostoma
    resources: tmpdir=/tmp

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Thu Jul 27 14:43:39 2023]
Finished job 2.
4 of 9 steps (44%) done
Select jobs to execute...

[Thu Jul 27 14:43:39 2023]
rule n_ways_reciprocal_best:
    input: odp/step0-blastp_results/reciprocal_best/Adinetavaga_and_Hymenolepismicrostoma_recip.temp.blastp, /mnt/sda1/Alex/14.COMPARATIVE_GENOMICS/02.ODP/DATASETS/chrom_new/Adinetavaga.chrom, /mnt/sda1/Alex/14.COMPARATIVE_GENOMICS/02.ODP/DATASETS/chrom_new/Hymenolepismicrostoma.chrom
    output: odp/step0-blastp_results/reciprocal_best/Adinetavaga_Hymenolepismicrostoma_acceptable_prots.txt, odp/step0-blastp_results/reciprocal_best/Adinetavaga_Hymenolepismicrostoma_edges.txt, odp/step1-rbh/Adinetavaga_Hymenolepismicrostoma_reciprocal_best_hits.rbh
    jobid: 1
    reason: Missing output files: odp/step1-rbh/Adinetavaga_Hymenolepismicrostoma_reciprocal_best_hits.rbh; Input files updated by another job: odp/step0-blastp_results/reciprocal_best/Adinetavaga_and_Hymenolepismicrostoma_recip.temp.blastp
    wildcards: analysis=Adinetavaga_Hymenolepismicrostoma
    resources: tmpdir=/tmp

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Thu Jul 27 14:43:43 2023]
Error in rule n_ways_reciprocal_best:
    jobid: 0
    input: odp/step0-blastp_results/reciprocal_best/Adinetavaga_and_Hymenolepismicrostoma_recip.temp.blastp, /mnt/sda1/Alex/14.COMPARATIVE_GENOMICS/02.ODP/DATASETS/chrom_new/Hymenolepismicrostoma.chrom, /mnt/sda1/Alex/14.COMPARATIVE_GENOMICS/02.ODP/DATASETS/chrom_new/Adinetavaga.chrom
    output: odp/step0-blastp_results/reciprocal_best/Adinetavaga_Hymenolepismicrostoma_acceptable_prots.txt, odp/step0-blastp_results/reciprocal_best/Adinetavaga_Hymenolepismicrostoma_edges.txt, odp/step1-rbh/Adinetavaga_Hymenolepismicrostoma_reciprocal_best_hits.rbh

RuleException:
ValueError in file /mnt/sda1/Alex/14.COMPARATIVE_GENOMICS/02.ODP/odp/scripts/odp, line 422:
Length mismatch: Expected axis has 4 elements, new values have 5 elements
  File "/mnt/sda1/Alex/14.COMPARATIVE_GENOMICS/02.ODP/odp/scripts/odp", line 422, in __rule_n_ways_reciprocal_best
  File "/home/lege/anaconda3/envs/odp/lib/python3.11/site-packages/pandas/core/generic.py", line 6002, in __setattr__
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/home/lege/anaconda3/envs/odp/lib/python3.11/site-packages/pandas/core/generic.py", line 730, in _set_axis
  File "/home/lege/anaconda3/envs/odp/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 225, in set_axis
  File "/home/lege/anaconda3/envs/odp/lib/python3.11/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
  File "/home/lege/anaconda3/envs/odp/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-07-27T144301.284040.snakemake.log

Any help would be appreciated.

Thanks,
Alex

conchoecia commented 1 year ago

Hi Alex, I have no clue for this one, to be honest! I have never seen this error.

It looks like something caused by the function that builds a graph and finds the reciprocal best blast hits. Did you dig into this at all? I think this would require going through the code and debugging it carefully to find the source of the error.

conchoecia commented 7 months ago

This should be fixed with the recent commit https://github.com/conchoecia/odp/commit/ba6c45375c2bf5e75a9ff0779e5d75b5509209db

Please run git pull then try again. If you experience the same issue please reopen this one, or submit a new issue for a different problem.

Wangyf1234 commented 6 months ago

This happens because the .chrom file's format is incorrect. I just ran into the same problem and checked my .chrom file: it had extra columns.
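For anyone else hitting this, the pandas message itself can be reproduced in isolation. The snippet below is only a minimal sketch, not odp's actual code: the four-column input and the five column names are made up for illustration. It shows that pandas raises this `ValueError` whenever the number of names assigned to `df.columns` differs from the number of columns it parsed from the file.

```python
import io

import pandas as pd

# Hypothetical tab-separated input with four columns per row,
# one fewer than the five names assigned below.
bad_chrom = "prot1\tchr1\t+\t100\nprot2\tchr1\t-\t900\n"
df = pd.read_csv(io.StringIO(bad_chrom), sep="\t", header=None)

# Hypothetical column names (assumed five-column layout, not taken from odp).
names = ["protein", "scaffold", "strand", "start", "stop"]

try:
    # Assigning five names to a four-column frame triggers the error.
    df.columns = names
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

So whenever the parsed column count and the expected column count disagree, this is the error you will see.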

conchoecia commented 6 months ago

@Wangyf1234 Thank you for the update. I may add a check to my function that verifies that each input .chrom file has the correct number of columns and the correct data type in each column.
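A check along those lines might look roughly like the sketch below. This is not the code in odp: `check_chrom_lines` and `EXPECTED_COLUMNS` are hypothetical names, and the layout is an assumption (five tab-separated columns: protein, scaffold, strand, start, stop, with strand being `+` or `-` and integer coordinates).

```python
EXPECTED_COLUMNS = 5  # assumption: protein, scaffold, strand, start, stop


def check_chrom_lines(lines):
    """Return a list of (line_number, problem) tuples for malformed rows."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != EXPECTED_COLUMNS:
            problems.append(
                (lineno, f"expected {EXPECTED_COLUMNS} columns, found {len(fields)}")
            )
            continue
        strand, start, stop = fields[2], fields[3], fields[4]
        if strand not in ("+", "-"):
            # Assumed strand encoding; adjust if the format allows other values.
            problems.append((lineno, f"unexpected strand value {strand!r}"))
        elif not (start.isdigit() and stop.isdigit()):
            problems.append((lineno, "start and stop must be non-negative integers"))
    return problems


if __name__ == "__main__":
    example = [
        "prot1\tchr1\t+\t100\t400\n",         # well-formed row
        "prot2\tchr1\t-\t900\t1200\textra\n",  # one column too many
    ]
    for lineno, problem in check_chrom_lines(example):
        print(f"line {lineno}: {problem}")
```

Reporting the line number with each problem should make it easy to find the offending rows before the pipeline starts.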