StuntsPT / Structure_threader

A wrapper program to parallelize and automate runs of "Structure", "fastStructure" and "MavericK".
GNU General Public License v3.0

ERROR running STRUCTURE in command line #86

Closed. Opened by cahuparo; closed 2 years ago.

cahuparo commented 4 years ago


I am trying to run STRUCTURE from the command line using Structure_threader. This is the error I get:

STRUCTURE by Pritchard, Stephens and Donnelly (2000)
     and Falush, Stephens and Pritchard (2003)
       Code by Pritchard, Falush and Hubisz
             Version 2.3.4 (Jul 2012)

Reading file "/gpfs_common/share03/lmquesad/chparada/structure/mainparams".
datafile is
Reading file "/gpfs_common/share03/lmquesad/chparada/structure/extraparams".
Reading file "/gpfs_common/share03/lmquesad/chparada/structure/input_file_NOvarroa.txt".

Data file "/gpfs_common/share03/lmquesad/chparada/structure/input_file_NOvarroa.txt" (truncated) --

Ind:   Label Genotype_data . . . .
  1: KA133 168  -9 108 162  -9 102 136 213  . . . .  92
  1: KA133  -9  -9  -9 182  -9 108 140 222  . . . .  98
  2: KA143  -9  -9 108 162 176 102 136 213  . . . .  92
  2: KA143  -9  -9  -9  -9  -9 108 140 222  . . . .  98
  3: KA144 162 152 108 162 176 102 136 213  . . . .  92
  3: KA144 174  -9  -9 176  -9 108 140 222  . . . .  -9
  4: KA145 162 144 106 162 176  99 136 216  . . . .  92
  4: KA145 174  -9 108 176  -9 102 140 222  . . . .  98


140: BI424 168 146 108 162 176  99 140 213  . . . .  86
140: BI424  -9 152  -9  -9 178 102 149 220  . . . .  92
141: BI425 174 132 106 162 176  91 140 213  . . . .  90
141: BI425 186 162  -9 170  -9 102 149 220  . . . .  92

Number of alleles per locus: min= 4; ave=7.1; max=15
individual KA133 has negative location!  locations should be >= 0

Exiting the program due to error(s) listed above.

Here are a few data lines from the input file:

    A107    A29 AP273   AC306   AP55    A24 A88 B124    AP43    AP81    A113    AP66
KA133   168 -9  108 162 -9  102 136 213 131 126 211 092
KA133   -9  -9  -9  182 -9  108 140 222 140 -9  217 098
KA143   -9  -9  108 162 176 102 136 213 131 126 211 092
KA143   -9  -9  -9  -9  -9  108 140 222 140 134 217 098
KA144   162 152 108 162 176 102 136 213 131 126 205 092
KA144   174 -9  -9  176 -9  108 140 222 140 134 217 -9
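To rule out a layout problem before looking at the parameters, a quick consistency check of the data file against the mainparams settings can help. The sketch below assumes the layout used in this issue (MARKERNAMES=1, LABEL=1, ONEROWPERIND=0, PLOIDY=2, no POPDATA/POPFLAG/LOCDATA/PHENOTYPE/EXTRACOLS columns). The function name `check_structure_rows` is hypothetical, and treating any negative value other than MISSING as suspicious is this sketch's own assumption, not a documented STRUCTURE rule.

```python
from typing import List, Tuple


def check_structure_rows(lines: List[str], numloci: int, ploidy: int = 2,
                         missing: int = -9) -> Tuple[int, List[str]]:
    """Check a STRUCTURE data file laid out like the excerpt above.

    Hypothetical helper. Assumes a marker-name row, a label column,
    ONEROWPERIND=0 and no extra columns before the genotypes.
    Returns (number_of_individuals, list_of_problems).
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return 0, ["no rows found"]
    header, data = rows[0], rows[1:]  # MARKERNAMES=1: first row holds locus names
    problems: List[str] = []
    if len(header) != numloci:
        problems.append(f"marker row has {len(header)} names, expected NUMLOCI={numloci}")
    if len(data) % ploidy:
        problems.append(f"{len(data)} data rows is not a multiple of PLOIDY={ploidy}")
    complete = len(data) - len(data) % ploidy
    for start in range(0, complete, ploidy):
        block = data[start:start + ploidy]  # ONEROWPERIND=0: one row per allele copy
        labels = sorted({row[0] for row in block})  # LABEL=1: first column is the label
        if len(labels) != 1:
            problems.append(f"data rows {start + 1}-{start + ploidy} mix labels {labels}")
        for offset, row in enumerate(block):
            row_no = start + offset + 1  # 1-based position among data rows
            if len(row) != 1 + numloci:
                problems.append(f"data row {row_no}: {len(row) - 1} genotype columns, "
                                f"expected {numloci}")
                continue
            for token in row[1:]:
                try:
                    allele = int(token)  # "092" parses as 92
                except ValueError:
                    problems.append(f"data row {row_no}: non-integer allele {token!r}")
                    continue
                if allele < 0 and allele != missing:
                    problems.append(f"data row {row_no}: negative allele {allele}")
    return len(data) // ploidy, problems
```

Run on the whole input file, the returned individual count can also be compared with NUMINDS (141 here) and the marker-row length with NUMLOCI (12 here).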

This is my mainparams file:

FILE extraparams.

"(int)" means that this takes an integer value.
"(B)"   means that this variable is Boolean 
        (ie insert 1 for True, and 0 for False)
"(str)" means that this is a string (but not enclosed in quotes!) 

Basic Program Parameters

#define MAXPOPS    10      // (int) number of populations assumed
#define BURNIN    100000   // (int) length of burnin period
#define NUMREPS   1000000   // (int) number of MCMC reps after burnin

Input/Output files

#define INFILE   infile   // (str) name of input data file
#define OUTFILE  outfile  //(str) name of output data file

Data file format

#define NUMINDS    141    // (int) number of diploid individuals in data file
#define NUMLOCI    12    // (int) number of loci in data file
#define PLOIDY       2    // (int) ploidy of data
#define MISSING     -9    // (int) value given to missing genotype data
#define ONEROWPERIND 0    // (B) store data for individuals in a single line

#define LABEL     1     // (B) Input file contains individual labels
#define POPDATA   0     // (B) Input file contains a population identifier
#define POPFLAG   0     // (B) Input file contains a flag which says 
                              whether to use popinfo when USEPOPINFO==1
#define LOCDATA   0     // (B) Input file contains a location identifier

#define PHENOTYPE 0     // (B) Input file contains phenotype information
#define EXTRACOLS 0     // (int) Number of additional columns of data 
                             before the genotype data start.

#define MARKERNAMES      1  // (B) data file contains row of marker names
#define RECESSIVEALLELES 0  // (B) data file contains dominant markers (eg AFLPs)
                            // and a row to indicate which alleles are recessive
#define MAPDISTANCES     0  // (B) data file contains row of map distances 
                            // between loci

Advanced data file options

#define PHASED           0 // (B) Data are in correct phase (relevant for linkage model only)
#define PHASEINFO        0 // (B) the data for each individual contains a line
                                  indicating phase (linkage model)
#define MARKOVPHASE      0 // (B) the phase info follows a Markov model.
#define NOTAMBIGUOUS  -999 // (int) for use in some analyses of polyploid data

Command line options:

-m mainparams
-e extraparams
-s stratparams
-i input file
-o output file
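Since the error is about a "negative location" while LOCDATA is 0 here, it may help to list every location-related switch STRUCTURE actually receives from mainparams and extraparams together. The sketch below reads `#define NAME VALUE // comment` lines. LOCDATA comes from the mainparams above; LOCPRIOR and LOCISPOP are assumed to be the location-related extraparams switches in STRUCTURE 2.3.x, so check them against your own extraparams template. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

# LOCDATA is in the mainparams above; LOCPRIOR and LOCISPOP are ASSUMED to be
# the extraparams switches that make STRUCTURE use location information.
LOCATION_PARAMS = ("LOCDATA", "LOCPRIOR", "LOCISPOP")


def read_structure_params(lines: Iterable[str]) -> Dict[str, str]:
    """Collect NAME -> VALUE from '#define NAME VALUE // comment' lines.

    Lines that do not start with #define (free text, wrapped comments)
    are ignored. If a name repeats, the last definition wins.
    """
    params: Dict[str, str] = {}
    for line in lines:
        code = line.split("//", 1)[0].split()  # drop the trailing comment
        if len(code) >= 3 and code[0] == "#define":
            params[code[1]] = code[2]
    return params


def location_switches(params: Dict[str, str]) -> List[str]:
    """Return the location-related parameters set to anything other than 0."""
    return [name for name in LOCATION_PARAMS if params.get(name, "0") != "0"]
```

A typical use is to read both files with `with open(...)`, merge the extraparams dictionary into the mainparams one with `dict.update`, and print `location_switches(...)`. An empty list suggests neither file asks for location data, which would point the search towards the command line STRUCTURE is actually given.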

This is the command I use to run it:

structure_threader run -K 10 -R 3 -i /gpfs_common/share03/lmquesad/chparada/structure/input_file_NOvarroa.txt -o /gpfs_common/share03/lmquesad/chparada/structure/ -t 16 -st /usr/local/usrapps/lmquesad/env_structure_threader/bin/structure

Any suggestion would be appreciated.



StuntsPT commented 4 years ago

Hi @cahuparo. Sorry about the delay. I only saw the notification today. It seems like STRUCTURE is trying to use location data. Can you please post the text Structure_threader outputs before the error? Specifically I'm interested in the lines that contain each STRUCTURE command, since your mainparams file seems to be just fine.
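Another way to see what STRUCTURE receives is to run a single replicate by hand, outside the wrapper. The sketch below builds one STRUCTURE argv list using `-m`, `-e`, `-i` and `-o` (listed at the end of the mainparams file above) plus `-K`, which sets MAXPOPS on the command line. The function name and the output file name `test_K1_rep1` are hypothetical; this is not necessarily the exact command Structure_threader issues.

```python
import shlex
from pathlib import Path
from typing import List


def structure_command(structure_bin: str, k: int, infile: str, outfile: str,
                      mainparams: str = "mainparams",
                      extraparams: str = "extraparams") -> List[str]:
    """Build one STRUCTURE invocation as an argv list (hypothetical helper)."""
    return [structure_bin, "-K", str(k), "-m", mainparams, "-e", extraparams,
            "-i", infile, "-o", outfile]


if __name__ == "__main__":
    base = Path("/gpfs_common/share03/lmquesad/chparada/structure")
    cmd = structure_command(
        "/usr/local/usrapps/lmquesad/env_structure_threader/bin/structure",
        k=1,
        infile=str(base / "input_file_NOvarroa.txt"),
        outfile=str(base / "test_K1_rep1"),  # hypothetical output name
        mainparams=str(base / "mainparams"),
        extraparams=str(base / "extraparams"),
    )
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(cmd))
```

If the hand-run command fails with the same "negative location" message, the problem lies in the files STRUCTURE reads rather than in Structure_threader; the list can also be passed directly to `subprocess.run(cmd)`.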

StuntsPT commented 2 years ago

It's been almost 2 and a half years. Closing due to inactivity.