StuntsPT / Structure_threader

A wrapper program to parallelize and automate runs of "Structure", "fastStructure" and "MavericK".
GNU General Public License v3.0

plink .bed .bim .fam formatting problem #95

Closed alleneli closed 7 months ago

alleneli commented 7 months ago

Hi,

I have SNP data from a bwa-gatk-plink pipeline in plink's .fam/.bim/.bed format. The generated logs lead me to believe that either my plink files are not formatted correctly or I am calling them improperly on the command line. I'm guessing (and hoping) the latter is true, as the logs change drastically depending on whether I pass the .bim, .bed, or .fam file. Any assistance is greatly appreciated!

Here is the code I am using:

structure_threader run -K 5 -R 5 -i plink.bim -o output -t 5 -st /big-disk/home/eja56/.local/bin/structure

The logs all look like this:


----------------------------------------------------
STRUCTURE by Pritchard, Stephens and Donnelly (2000)
     and Falush, Stephens and Pritchard (2003)
       Code by Pritchard, Falush and Hubisz
             Version 2.3.4 (Jul 2012)
----------------------------------------------------

Reading file "mainparams".
datafile is
infile
Reading file "extraparams".
Reading file "/big-disk/home/eja56/Documents/Vargas_lab_2023/Rapid_reads_combined/Abronia_all_reads/STRUCTURE_TEST/plink.bim".

WARNING! Probable error in the input file.  
Individual 1, locus 582;  encountered the following data 
"5328.A.vi" when expecting an integer

WARNING! Probable error in the input file.  
Individual 1, locus 1;  encountered the following data 
"5328.A.vi" when expecting an integer

....
....
....
....

readlociEOF

WARNING:  Unexpected end of input file.  The details of the
input file are set in mainparams.  I ran out of data while reading
the data for individual 3.

----------------------------------
There were errors in the input file (listed above). According to 
"mainparams" the input file should contain one row of markernames with 581 entries,
 96 rows with 583 entries .

There are 581 rows of data in the input file, with an average of 6.00
entries per line.  The following shows the number of entries in each
line of the input file:

# Entries:   Line numbers
        6:   1--581
----------------------------------

Exiting the program due to error(s) listed above.
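
For readers who hit the same error, the mismatch STRUCTURE reports in the summary above can be checked before launching a run. The following is a minimal sketch and not part of Structure_threader or STRUCTURE: `entries_per_line` and `has_six_columns` are hypothetical helper names introduced here for illustration. The first tallies whitespace-separated entries per line, much like the "# Entries: Line numbers" table in the log. The second flags files in which every line has exactly six entries, which is the layout of PLINK's .bim and .fam text files and not the rows of 583 entries that "mainparams" asks for in this run.

```python
from collections import Counter


def entries_per_line(path):
    """Map each entry count to the number of non-blank lines with that count.

    Hypothetical helper: entries are split on whitespace, similar to the
    per-line summary that STRUCTURE prints when the input looks wrong.
    """
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                counts[len(line.split())] += 1
    return counts


def has_six_columns(path):
    """Heuristic only: True if every non-blank line has exactly six entries.

    PLINK .bim and .fam files both have six columns per line, so a True
    result suggests the file is PLINK text rather than STRUCTURE input.
    """
    counts = entries_per_line(path)
    return bool(counts) and set(counts) == {6}
```

For the plink.bim file in the log above, `entries_per_line` should report 581 lines of 6 entries each, consistent with what STRUCTURE printed, and `has_six_columns` would return True.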

StuntsPT commented 7 months ago

Hi @alleneli,

There seems to be some confusion here. You are trying to use STRUCTURE, wrapped via Structure_threader, with .bim / .bed / .fam input files. STRUCTURE, however, is not compatible with this format. There are two ways you can solve the issue:

Hope this helps.

Francisco

alleneli commented 7 months ago

Hi Francisco,

Ah, silly mistake! I need to be more careful about reading the docs. Thank you for the clarification and prompt response. I've been struggling with installing fastStructure for the past week and a half, and the ease of structure_threader's installation has been a breath of fresh air.

I'm just a master's student and I've spent the last semester learning how to do SNP variant calling on my own. After your advice I've finally been able to plot my data! Programs like yours are immensely helpful to those of us who are still learning bioinformatics. I cannot thank you enough. I will be recommending your program to everyone and, of course, citing it and fastStructure.

Thank you for making my research possible, Eli

StuntsPT commented 7 months ago

I am truly happy I could be of assistance. Enjoy your research!

Best, Francisco