AdmiralenOla / Scoary

Pan-genome wide association studies
GNU General Public License v3.0

IndexError: list index out of range #90

Open hollygene opened 3 years ago

hollygene commented 3 years ago

Hi, I'm new to using scoary and am running into an issue. Here is the full error that scoary gives me:

Traceback (most recent call last):
  File "/home/hcm59/miniconda3/envs/scoary/bin/scoary", line 8, in <module>
    sys.exit(main())
  File "/home/hcm59/miniconda3/envs/scoary/lib/python3.9/site-packages/scoary/methods.py", line 278, in main
    RES_and_GTC = Setup_results(genedic, traitsdic, args.collapse)
  File "/home/hcm59/miniconda3/envs/scoary/lib/python3.9/site-packages/scoary/methods.py", line 914, in Setup_results
    bh_c_p_v[s_p_v[len(s_p_v)-1][0]] = last_bh = s_p_v[len(s_p_v)-1][1]
IndexError: list index out of range

Scoary seems to run fine up to this point, but then it stops and doesn't write any output files. I looked in the methods.py script but couldn't find anything obviously wrong. My inputs are the gene presence/absence output from Roary and a phenotype file (both comma-delimited), plus a Newick tree file from IQTree.
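For anyone reading the traceback: the failing line indexes the last element of `s_p_v`, and in Python that pattern raises exactly this `IndexError` when the list is empty. The snippet below is a stand-alone illustration of that behaviour, not Scoary's code. `last_pair` is a made-up helper name, and the traceback alone doesn't show why the list would end up empty.

```python
def last_pair(sorted_p_values):
    """Return the (name, p-value) pair stored last in the list.

    Uses the same indexing pattern as the failing line in the traceback
    (list[len(list) - 1]); hypothetical helper for illustration only.
    """
    last = sorted_p_values[len(sorted_p_values) - 1]
    return last[0], last[1]


# With at least one entry the lookup works.
print(last_pair([("geneA", 0.01)]))

# With no entries, len(...) - 1 is -1 and indexing an empty list fails.
try:
    last_pair([])
except IndexError as err:
    print("empty list:", err)
```

So a reasonable first thing to check is whether some trait ends up with nothing to test at all.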

I found a similar previous issue (https://github.com/AdmiralenOla/Scoary/issues/23), but their problem was that their Roary file was delimited with semicolons, and I'm 99% sure mine uses commas.
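A quick way to turn that 99% into certainty is to count candidate delimiters in the header line of each input file. This is only a rough sketch: `count_delimiters` is a hypothetical helper, not part of Scoary, and it assumes the header line itself has no delimiter characters inside quoted fields.

```python
def count_delimiters(path, candidates=(",", ";", "\t")):
    """Count how often each candidate delimiter occurs in the first line."""
    with open(path, newline="") as handle:
        header = handle.readline()
    return {delim: header.count(delim) for delim in candidates}


def likely_delimiter(path, candidates=(",", ";", "\t")):
    """Return the candidate that occurs most often in the header line."""
    counts = count_delimiters(path, candidates)
    return max(counts, key=counts.get)
```

Running `likely_delimiter` on both the gene presence/absence file and the phenotype file should give the same character that is passed to `--delimiter`.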

Any help is appreciated! I can send example files too.

Here's the script I used:

scoary -t /path/dog_verified_host_PhenoForScoary.csv \
  -g /path/gene_presence_absence_roary.csv \
  -o /path \
  -n /path/core_gene_alignment.aln-gb.nw \
  --delimiter , \
  --permute 1000 --threads 10

I'm using scoary in a conda environment that I built on a Linux server. Here are some specifications:

# packages in environment at /home/hcm59/miniconda3/envs/scoary:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
argparse                  1.4.0                    pypi_0    pypi
ca-certificates           2021.4.13            h06a4308_1  
certifi                   2020.12.5        py39h06a4308_0  
ete3                      3.1.2                    pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
ncurses                   6.2                  he6710b0_1  
numpy                     1.20.2                   pypi_0    pypi
openssl                   1.1.1k               h27cfd23_0  
pip                       21.0.1           py39h06a4308_0  
python                    3.9.2                hdb3f193_0  
readline                  8.1                  h27cfd23_0  
scipy                     1.6.2                    pypi_0    pypi
scoary                    1.6.16                   pypi_0    pypi
setuptools                52.0.0           py39h06a4308_0  
six                       1.15.0           py39h06a4308_0  
sqlite                    3.35.4               hdfb4753_0  
tk                        8.6.10               hbc83047_0  
tzdata                    2020f                h52ac0ba_0  
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  

Thanks!! -Holly

Update: just found out we had used Panaroo, not Roary, so I will be looking into this and seeing if I can find a solution!!

hollygene commented 2 years ago

Answering my own question as almost a year later I ran into the same error and found my own question (ha!)

Basically the issue is that we only had one value for a particular trait, so Scoary was like "I can't correct for multiple tests since there's only one"

Word to the wise: remove any traits that have fewer than 2 (I guess? TBD) values.
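To spot such traits before running Scoary, something like the sketch below can tally the values in each trait column. It assumes the usual traits layout (first column is the isolate name, every other column is one trait coded as 0/1). The helper names and the `min_count=2` threshold are my own guesses, matching the "fewer than 2" rule of thumb above rather than anything Scoary documents.

```python
import csv
from collections import Counter


def trait_value_counts(traits_path, delimiter=","):
    """Return {trait_name: Counter of values} for a traits table.

    Assumes the first column holds isolate names and each remaining
    column is one trait.
    """
    with open(traits_path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        traits = header[1:]
        counts = {name: Counter() for name in traits}
        for row in reader:
            for name, value in zip(traits, row[1:]):
                counts[name][value.strip()] += 1
    return counts


def sparse_traits(counts, min_count=2, classes=("0", "1")):
    """Names of traits where either class has fewer than min_count isolates."""
    return [
        name
        for name, tally in counts.items()
        if any(tally[label] < min_count for label in classes)
    ]
```

Any trait name returned by `sparse_traits(trait_value_counts("traits.csv"))` is a candidate for removal (the file name here is just a placeholder).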

sydelstan commented 6 months ago

I get the same error as @hollygene when using the -n option, but I'm not sure why that is the case. @mgalardini @AdmiralenOla

arunprasanna83 commented 6 months ago

I have the same issue. Is there any solution?

Traceback (most recent call last):
  File "/ibex/scratch/projects/c2078/conda/mambaforge/envs/scoary/bin/scoary", line 8, in <module>
    sys.exit(main())
  File "/ibex/scratch/projects/c2078/conda/mambaforge/envs/scoary/lib/python3.6/site-packages/scoary/methods.py", line 301, in main
    delimiter=args.delimiter)
  File "/ibex/scratch/projects/c2078/conda/mambaforge/envs/scoary/lib/python3.6/site-packages/scoary/methods.py", line 1001, in StoreResults
    extracolstoprint, firstcolnames, time, delimiter)
  File "/ibex/scratch/projects/c2078/conda/mambaforge/envs/scoary/lib/python3.6/site-packages/scoary/methods.py", line 1070, in StoreTraitResult
    upgmatree = PruneForMissing(upgmatree, Prunedic[Traitname])
  File "/ibex/scratch/projects/c2078/conda/mambaforge/envs/scoary/lib/python3.6/site-packages/scoary/methods.py", line 723, in PruneForMissing
    tree[0] = PruneForMissing(tree[0], Prunedic)
  File "/ibex/scratch/projects/c2078/conda/mambaforge/envs/scoary/lib/python3.6/site-packages/scoary/methods.py", line 725, in PruneForMissing
    if isinstance(tree[1], list):
IndexError: list index out of range

hollygene commented 6 months ago

@sydelstan and @arunprasanna83

Does your data have any phenotype that contains only 1 value? See my comment above about multiple test correction. I think if you ensure that each phenotype category has more than 1 data point, it should be okay. If there are still issues, it might be something else. I'd maybe try running a dummy file with only phenotypes that contain 5+ data points and see if that one works.
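If it helps, here is a rough sketch of how such a dummy file could be generated. It assumes a traits table whose first column is the isolate name and whose other columns are 0/1 traits, and it keeps only traits with at least `min_count` isolates in each class. `write_filtered_traits` is a hypothetical helper, not something Scoary provides, and the threshold of 5 is just the number suggested above.

```python
import csv


def write_filtered_traits(src, dst, min_count=5, delimiter=","):
    """Copy a traits table, dropping traits with too few 0s or 1s.

    Returns the names of the traits that were kept.
    """
    with open(src, newline="") as handle:
        rows = list(csv.reader(handle, delimiter=delimiter))
    header, body = rows[0], rows[1:]

    keep = [0]  # always keep the isolate-name column
    for col in range(1, len(header)):
        values = [row[col].strip() for row in body if col < len(row)]
        if values.count("1") >= min_count and values.count("0") >= min_count:
            keep.append(col)

    with open(dst, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        for row in rows:
            writer.writerow([row[c] for c in keep if c < len(row)])

    return [header[c] for c in keep[1:]]
```

Pointing `-t` at the filtered file and rerunning the same command should show whether the sparse traits were the cause.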

Hope this helps/makes sense!