AdmiralenOla / Scoary

Pan-genome wide association studies
GNU General Public License v3.0

Cryptic Python error #54

abremges closed this issue 7 years ago

abremges commented 7 years ago

Hi,

I want to run Scoary on 384 genomes, for which I have 5 antibiotic resistance phenotypes (A,B,C,D,E) and – obviously – the Roary results. However, the following error occurs (macOS 10.12.2, Python 2.7.13, Scoary 1.6.10 installed via pip):

==== Scoary started ====
Reading gene presence absence file
Creating Hamming distance matrix based on gene presence/absence
Building UPGMA tree from distance matrix
Reading traits file
WARNING: Some isolates have missing values for trait C. Missing-value isolates will not be counted in association analysis towards this trait.
ERROR: Some isolates in your gene presence absence file were not represented in your traits file. These will count as MISSING data and will not be included.
Finished loading files into memory.

==== Performing statistics ====
-- Filtration options --
Individual (Naive):    0.05
Collapse genes:    False

Tallying genes and performing statistical analyses
Gene-wise counting and Fisher's exact tests for trait: C
0.00%Traceback (most recent call last):
  File "/usr/local/bin/scoary", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/scoary/methods.py", line 253, in main
    RES_and_GTC = Setup_results(genedic, traitsdic, args.collapse)
  File "/usr/local/lib/python2.7/site-packages/scoary/methods.py", line 715, in Setup_results
    stats = Perform_statistics(traitsdic[trait], genedic[gene])
  File "/usr/local/lib/python2.7/site-packages/scoary/methods.py", line 863, in Perform_statistics
    if int(traits[t]) == 1 and genes[t] == 1:
ValueError: invalid literal for int() with base 10: ''

The log file ends at the step "Gene-wise counting and Fisher's exact tests"; no further information is given. I am aware of the WARNING (missing data for some traits) and the ERROR (more isolates than phenotyping results, for now).


My traits.csv file looks like this:

,A,B,C,D,E
CH2500,1,0,0,0,1
CH2502,NA,0,0,NA,1
...

Splitting traits.csv into 5 individual files produced results for 2 of the 5 runs; the other 3 Scoary runs still fail with the same error message. Any idea what's going on and how to proceed?

Thank you.

AdmiralenOla commented 7 years ago

Hi! (Apologies for the wait - I have been away for a while)

Do you have any empty cells in your trait file? It looks like Scoary is running into blank (i.e. not "NA" or "." or "-") values.
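
One way to check for such blank cells is a small standalone script like the one below. This is only a sketch and not part of Scoary: the function name `find_blank_cells` is made up for illustration, it is written for Python 3, and the sample data is the excerpt from above with one cell blanked out on purpose.

```python
import csv
import io


def find_blank_cells(handle):
    """Return (isolate, trait) pairs whose cell is empty or whitespace-only."""
    reader = csv.reader(handle)
    header = next(reader)
    traits = header[1:]  # first column holds the isolate names
    blanks = []
    for row in reader:
        if not row:
            continue  # skip completely empty lines
        isolate = row[0]
        # Pad short rows so that trailing missing cells are reported too.
        cells = row[1:] + [""] * (len(traits) - len(row[1:]))
        for trait, cell in zip(traits, cells):
            if cell.strip() == "":
                blanks.append((isolate, trait))
    return blanks


# Illustrative data only: trait C of CH2502 has been left blank deliberately.
sample = io.StringIO(",A,B,C,D,E\nCH2500,1,0,0,0,1\nCH2502,NA,0,,NA,1\n")
print(find_blank_cells(sample))
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample, e.g. `open("traits.csv", newline="")`.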

AdmiralenOla commented 7 years ago

Looks like a case where I forgot to implement an exception for what is presumably a relatively common input error... :-o
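
For illustration, a guard of that kind could look roughly like the sketch below. It is not the actual change made in Scoary; `parse_trait_value` and `MISSING_MARKERS` are hypothetical names, and treating a blank cell the same as "NA", "." or "-" is an assumption about the desired behaviour.

```python
# Hypothetical set of values treated as missing data; "" covers blank cells.
MISSING_MARKERS = {"", "NA", ".", "-"}


def parse_trait_value(raw):
    """Return None for missing data, otherwise the cell as an int.

    Anything that is neither a missing marker nor an integer still
    raises ValueError, so genuinely malformed input is not hidden.
    """
    value = raw.strip()
    if value in MISSING_MARKERS:
        return None
    return int(value)


print([parse_trait_value(v) for v in ["1", "0", "", "NA"]])
```

Calling plain `int("")` instead raises the `ValueError` shown in the traceback above.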

AdmiralenOla commented 7 years ago

Fixed in 1.6.11