conchoecia / odp

oxford dot plots
GNU General Public License v3.0

Not supported between instances of 'str' and 'int' #60

Closed: 1538501175 closed this issue 8 months ago

1538501175 commented 10 months ago

Hello, I tried to make ribbon diagrams from the ODP results, but something went wrong. Could the problem be that the scaffold names are of type 'str'? Here is the log:

bash odp_ribbon.sh
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 28
Rules claiming more threads will be scaled down.
Job stats:
job          count
all          1
make_plot    1
total        2

Select jobs to execute...

[Thu Nov 30 13:21:15 2023]
rule make_plot:
    input: /gpfs/home/zxl/hdx/Primate/Comparative_genomics/Collinearity/odp/v2/odp_original_version/step2-figures/synteny_coloredby_BCnS_LGs/MacacaMulatta_MacacaSinica_xy_reciprocal_best_hits.coloredby_BCnS_LGs.plotted.rbh, sp_to_chr_to_size.tsv
    output: output.pdf
    jobid: 1
    reason: Missing output files: output.pdf
    resources: tmpdir=/tmp

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 28
Rules claiming more threads will be scaled down.
Select jobs to execute...
/gpfs/home/zxl/Software/odp-main/scripts/odp_rbh_to_ribbon:242: DtypeWarning: Columns (4) have mixed types. Specify dtype option on import or set low_memory=False.
  for thisrbh in rbh_df_list:
[Thu Nov 30 13:21:26 2023]
Error in rule make_plot:
    jobid: 0
    input: /gpfs/home/zxl/hdx/Primate/Comparative_genomics/Collinearity/odp/v2/odp_original_version/step2-figures/synteny_coloredby_BCnS_LGs/MacacaMulatta_MacacaSinica_xy_reciprocal_best_hits.coloredby_BCnS_LGs.plotted.rbh, sp_to_chr_to_size.tsv
    output: output.pdf

RuleException:
TypeError in file /gpfs/home/zxl/Software/odp-main/scripts/odp_rbh_to_ribbon, line 283:
'<' not supported between instances of 'str' and 'int'
  File "/gpfs/home/zxl/Software/odp-main/scripts/odp_rbh_to_ribbon", line 683, in __rule_make_plot
  File "/gpfs/home/zxl/Software/odp-main/scripts/odp_rbh_to_ribbon", line 283, in ribbon_plot
  File "/gpfs/home/zxl/Software/miniconda3/envs/python-3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-11-30T132100.187987.snakemake.log

##############################################

The sp_to_chr_to_size.tsv file looks like this:

MacacaMulatta  1            223616942
MacacaMulatta  2            196197964
MacacaMulatta  3            185288947
MacacaMulatta  4            169963040
MacacaMulatta  5            187317192
MacacaMulatta  6            179085566
MacacaMulatta  7            169868564
MacacaMulatta  8            145679320
MacacaMulatta  9            134124166
MacacaMulatta  10           99517758
MacacaMulatta  11           133066086
MacacaMulatta  12           130043856
MacacaMulatta  13           108737130
MacacaMulatta  14           128056306
MacacaMulatta  15           113283604
MacacaMulatta  16           79627064
MacacaMulatta  17           95433459
MacacaMulatta  18           74474043
MacacaMulatta  19           58315233
MacacaMulatta  20           77137495
MacacaMulatta  X            153388924
MacacaMulatta  Y            11753682
MacacaSinica   scf00000000  101663551
MacacaSinica   scf00000001  135944474
MacacaSinica   scf00000002  145055576
MacacaSinica   scf00000003  116991941
MacacaSinica   scf00000004  130259291
MacacaSinica   scf00000005  122760163
MacacaSinica   scf00000006  92374030
MacacaSinica   scf00000007  100430957
MacacaSinica   scf00000008  84247821
MacacaSinica   scf00000009  65251504
MacacaSinica   scf00000010  239733392
MacacaSinica   scf00000011  90492510
MacacaSinica   scf00000012  206052853
MacacaSinica   scf00000013  204046225
MacacaSinica   scf00000014  181208071
MacacaSinica   scf00000015  193624151
MacacaSinica   scf00000016  186096401
MacacaSinica   scf00000017  178686605
MacacaSinica   scf00000018  150681066
MacacaSinica   scf00000019  144977953
MacacaSinica   scf00000020  159893060
MacacaSinica   scf00000021  14307032

conchoecia commented 8 months ago

Hello @1538501175 - I believe this was fixed in the recent commit: https://github.com/conchoecia/odp/commit/ba6c45375c2bf5e75a9ff0779e5d75b5509209db

I found that when chromosomes or scaffolds were named using only digits [0-9], with no alphabetical characters [A-Za-z], pandas would cast the column as type int. This caused downstream problems. Forcing the column type to be string fixed these issues.

Please reopen this issue if you face the same problem, or open a new issue if something else comes up.
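
For readers hitting the same error, here is a minimal sketch of the behavior described in this thread. It is not the code from the linked commit. The column names (`species`, `chrom`, `size`) and the two-row sample are assumptions made for illustration; only `pandas.read_csv` and its `dtype` parameter are real API. It shows pandas inferring an integer column from digit-only chromosome names, the string column you get when the dtype is forced, and the kind of comparison that fails when int and str labels are mixed.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt in the shape of sp_to_chr_to_size.tsv
# (tab-separated, no header). Column names below are assumptions.
tsv = "MacacaMulatta\t1\t223616942\nMacacaMulatta\t2\t196197964\n"
cols = ["species", "chrom", "size"]

# Default behavior: pandas infers an integer dtype for the digit-only column.
inferred = pd.read_csv(io.StringIO(tsv), sep="\t", header=None, names=cols)
inferred_is_int = pd.api.types.is_integer_dtype(inferred["chrom"])

# Forcing the column to string keeps chromosome names as text.
forced = pd.read_csv(
    io.StringIO(tsv), sep="\t", header=None, names=cols, dtype={"chrom": str}
)
forced_values = forced["chrom"].tolist()

# Illustration of the error class only (not the script's line 283):
# ordering a mix of int and str labels raises a TypeError.
try:
    sorted([1, "X"])
    mixed_sort_error = None
except TypeError as err:
    mixed_sort_error = str(err)

print(inferred_is_int, forced_values, mixed_sort_error)
```

In the table posted above, the MacacaMulatta names mix digits with X and Y, which would be consistent with the DtypeWarning about mixed types in the log. Passing an explicit `dtype` for name columns avoids relying on type inference.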