GELOG / adam-ibs

Ports the IBS/MDS/IBD functionality of Plink to Spark / ADAM
Apache License 2.0

List of available command-line options for the features implemented in MGL804 #36

Closed ghost closed 9 years ago

ghost commented 9 years ago
iki-v commented 9 years ago

Here is the beginning of the long list of user commands, to get started:
* --file followed by filename: retrieves the String filename
* --bfile followed by filename: retrieves the String filename
* --out followed by filename for the output: retrieves the String filename
* --make-bed: executes a function, whose name will be specified later, with the parameters coming from --file, --bfile..... ...
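The parsing described above can be sketched in plain Scala. This is a minimal illustration under stated assumptions: `Config` and `ArgParser` are hypothetical names and are not taken from the adam-ibs code base. A real implementation would more likely use an argument-parsing library.

```scala
// Hypothetical holder for the flags listed above (names are illustrative only).
final case class Config(
    file: Option[String] = None,  // --file {prefix}
    bfile: Option[String] = None, // --bfile {prefix}
    out: String = "plink",        // --out [prefix], PLINK's default prefix
    makeBed: Boolean = false      // --make-bed
)

object ArgParser {
  // Walks the argument list, consuming one extra token for flags that take a filename.
  // A following token that starts with "--" is treated as a missing filename.
  def parse(args: List[String], acc: Config = Config()): Either[String, Config] =
    args match {
      case Nil => Right(acc)
      case "--file" :: name :: rest if !name.startsWith("--") =>
        parse(rest, acc.copy(file = Some(name)))
      case "--bfile" :: name :: rest if !name.startsWith("--") =>
        parse(rest, acc.copy(bfile = Some(name)))
      case "--out" :: name :: rest if !name.startsWith("--") =>
        parse(rest, acc.copy(out = name))
      case "--make-bed" :: rest => parse(rest, acc.copy(makeBed = true))
      case flag :: _            => Left(s"Unknown or incomplete option: $flag")
    }
}
```

For example, `ArgParser.parse(List("--bfile", "data", "--make-bed"))` yields a `Config` with `bfile = Some("data")` and `makeBed = true`. A bare `--file` with no filename is reported as an error.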

ghost commented 9 years ago

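The full set of commands used for our project (note that the commands placed in the "low priority" tasks have not been taken into account yet). In the command line flag definitions that follow, [square brackets] denote a required parameter, {curly braces} denote an optional parameter, and <angle brackets> denote optional modifiers ('|' separates mutually exclusive ones).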

Most PLINK runs require exactly one main input fileset. The following flags are available for defining its form and location:

--file {prefix} : Specify .ped + .map filename prefix (default 'plink').
--ped [filename] : Specify full name of .ped file.
--map [filename] : Specify full name of .map file.

--bfile {prefix} : Specify .bed + .bim + .fam prefix (default 'plink').
--bed [filename] : Specify full name of .bed file.
--bim [filename] : Specify full name of .bim file.
--fam [filename] : Specify full name of .fam file.

Output files have names of the form 'plink.{extension}' by default. You can change the 'plink' prefix with

--out [prefix] : Specify prefix for output files.
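The prefix rules above can be resolved into concrete paths in a few lines. This sketch assumes that an explicit --bed/--bim/--fam (or --ped/--map) name overrides the corresponding file derived from the prefix. The `Filesets` helper and its case classes are hypothetical.

```scala
// Hypothetical resolved filesets (not part of PLINK or adam-ibs).
final case class BinaryFileset(bed: String, bim: String, fam: String)
final case class TextFileset(ped: String, mapFile: String)

object Filesets {
  val DefaultPrefix = "plink"

  // --bfile {prefix}; an explicit --bed/--bim/--fam name overrides the derived one.
  def binary(prefix: Option[String], bed: Option[String] = None,
             bim: Option[String] = None, fam: Option[String] = None): BinaryFileset = {
    val p = prefix.getOrElse(DefaultPrefix)
    BinaryFileset(bed.getOrElse(s"$p.bed"), bim.getOrElse(s"$p.bim"), fam.getOrElse(s"$p.fam"))
  }

  // --file {prefix}; an explicit --ped/--map name overrides the derived one.
  def text(prefix: Option[String], ped: Option[String] = None,
           mapFile: Option[String] = None): TextFileset = {
    val p = prefix.getOrElse(DefaultPrefix)
    TextFileset(ped.getOrElse(s"$p.ped"), mapFile.getOrElse(s"$p.map"))
  }

  // --out [prefix]: output files are named {prefix}.{extension}.
  def output(outPrefix: Option[String], extension: String): String =
    s"${outPrefix.getOrElse(DefaultPrefix)}.$extension"
}
```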

Most runs also require at least one of the following commands:

--make-bed : Create a new binary fileset.
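The binary fileset written by --make-bed stores genotypes in the .bed file. This sketch encodes genotypes following our understanding of the PLINK 1 .bed layout:
* three header bytes `0x6C 0x1B 0x01` (the last marks SNP-major order);
* then one block per variant, with four samples packed per byte and the first sample in the lowest two bits;
* codes `00` (homozygous for the first .bim allele), `01` (missing), `10` (heterozygous) and `11` (homozygous for the second allele).

`BedWriter` is a hypothetical name. Genotypes are assumed to be given as counts of the second allele.

```scala
object BedWriter {
  // Magic number of a PLINK 1 .bed file, followed by the SNP-major mode byte.
  val Header: Array[Byte] = Array(0x6c, 0x1b, 0x01).map(_.toByte)

  // Two-bit code for a genotype given as the number of copies of the second
  // .bim allele (None = missing call).
  def code(genotype: Option[Int]): Int = genotype match {
    case Some(0) => 0x0 // 00: homozygous for the first allele
    case None    => 0x1 // 01: missing
    case Some(1) => 0x2 // 10: heterozygous
    case Some(2) => 0x3 // 11: homozygous for the second allele
    case Some(g) => throw new IllegalArgumentException(s"Invalid genotype: $g")
  }

  // Packs one variant's genotypes (one per sample, in .fam order) into bytes,
  // four samples per byte, first sample in the lowest-order bits; unused
  // trailing bits of the last byte stay zero.
  def packVariant(genotypes: Seq[Option[Int]]): Array[Byte] =
    genotypes.grouped(4).map { group =>
      group.zipWithIndex.foldLeft(0) { case (byte, (g, i)) =>
        byte | (code(g) << (2 * i))
      }.toByte
    }.toArray

  // Complete file contents: header followed by each variant's packed block.
  def bedBytes(variants: Seq[Seq[Option[Int]]]): Array[Byte] =
    Header ++ variants.flatMap(v => packVariant(v).toSeq)
}
```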

IBS stratification / clustering

--genome : Calculate IBS distances between all individuals.
--genome -full (I tried it, but it doesn't work)
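The per-pair quantity behind --genome can be sketched as follows. For each pair of samples, every variant called in both samples is classified as IBS0, IBS1 or IBS2. The IBS distance reported as DST in PLINK's .genome output is (IBS2 + 0.5 * IBS1) / N, where N counts the variants that were compared. This is plain single-machine Scala, not the Spark/ADAM implementation. Genotypes are assumed to be biallelic allele counts, and `Ibs` is a hypothetical name.

```scala
object Ibs {
  // Counts of variants sharing 0, 1 or 2 alleles identical-by-state.
  final case class PairIbs(ibs0: Int, ibs1: Int, ibs2: Int) {
    def n: Int = ibs0 + ibs1 + ibs2
    // DST = (IBS2 + 0.5 * IBS1) / N; undefined when no variant was compared.
    def dst: Double = if (n == 0) Double.NaN else (ibs2 + 0.5 * ibs1) / n
  }

  // Genotypes are allele counts (0, 1, 2); None marks a missing call.
  def compare(a: Seq[Option[Int]], b: Seq[Option[Int]]): PairIbs = {
    require(a.length == b.length, "both samples must cover the same variants")
    a.zip(b).foldLeft(PairIbs(0, 0, 0)) {
      case (acc, (Some(x), Some(y))) =>
        // For a biallelic site, alleles shared IBS = 2 - |x - y|.
        (2 - math.abs(x - y)) match {
          case 0 => acc.copy(ibs0 = acc.ibs0 + 1)
          case 1 => acc.copy(ibs1 = acc.ibs1 + 1)
          case _ => acc.copy(ibs2 = acc.ibs2 + 1)
        }
      case (acc, _) => acc // skip variants missing in either sample
    }
  }

  // DST for every unordered pair of samples (i < j).
  def allPairs(samples: IndexedSeq[Seq[Option[Int]]]): Seq[((Int, Int), Double)] =
    for {
      i <- samples.indices
      j <- (i + 1) until samples.length
    } yield (i, j) -> compare(samples(i), samples(j)).dst
}
```

The all-pairs loop is quadratic in the number of samples. That cost is what motivates distributing the computation with Spark.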

The cluster options can be combined:

--cluster : Perform clustering.

--cluster <cc> <group-avg | old-tiebreaks> <missing> <only2> : Cluster samples using a pairwise similarity statistic (normally IBS).
* The 'cc' modifier forces every cluster to have at least one case and one control.
* The 'group-avg' modifier causes clusters to be joined based on average instead of minimum pairwise similarity.
* The 'missing' modifier causes clustering to be based on identity-by-missingness instead of identity-by-state, and writes a space-delimited identity-by-missingness matrix to disk.
* The 'only2' modifier causes only a .cluster2 file (which is valid input for --within) to be written; otherwise, 2 other files are produced as well.
* By default, IBS ties are not broken in the same manner as PLINK 1.07, so final cluster solutions tend to differ. This is generally harmless. However, to simplify testing, you can use the 'old-tiebreaks' modifier to force emulation of the old algorithm.
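The difference between the default linkage and 'group-avg' can be illustrated with a greedy agglomerative clustering sketch. It works on a precomputed similarity matrix and repeatedly merges the most similar pair of clusters. Cluster similarity is either the minimum cross-pair similarity (default) or the average (group-avg). Merging stops at `k` clusters (cf. --K) or when no merge respects the maximum size (cf. --mc).

This is an assumption-laden simplification. It ignores PPC tests, 'cc', --mcc and PLINK's tie-breaking, so it is not PLINK's actual implementation. `IbsClustering` is a hypothetical name.

```scala
object IbsClustering {
  sealed trait Linkage
  case object MinSimilarity extends Linkage // default: least similar cross pair
  case object GroupAverage extends Linkage  // 'group-avg' modifier

  // Similarity between two clusters under the chosen linkage.
  private def linkage(sim: Array[Array[Double]], a: Vector[Int], b: Vector[Int],
                      mode: Linkage): Double = {
    val cross = for (i <- a; j <- b) yield sim(i)(j)
    mode match {
      case MinSimilarity => cross.min
      case GroupAverage  => cross.sum / cross.size
    }
  }

  // Greedily merges the most similar pair of clusters until `k` clusters remain
  // or no remaining merge keeps the cluster size within `maxSize`.
  def cluster(sim: Array[Array[Double]], k: Int, maxSize: Int = Int.MaxValue,
              mode: Linkage = MinSimilarity): Vector[Vector[Int]] = {
    var clusters: Vector[Vector[Int]] = sim.indices.map(Vector(_)).toVector
    var merging = true
    while (merging && clusters.size > k) {
      val candidates = for {
        x <- clusters.indices
        y <- (x + 1) until clusters.size
        if clusters(x).size + clusters(y).size <= maxSize
      } yield (x, y, linkage(sim, clusters(x), clusters(y), mode))
      if (candidates.isEmpty) merging = false
      else {
        val (x, y, _) = candidates.maxBy(_._3)
        val merged = clusters(x) ++ clusters(y)
        clusters = clusters.zipWithIndex.collect { case (c, i) if i != x && i != y => c } :+ merged
      }
    }
    clusters
  }
}
```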

--ppc [p-val] : Specify minimum PPC test p-value within a cluster.
--mc [max size] : Specify maximum cluster size.
--mcc [c1] [c2] : Specify maximum case and control counts per cluster.
--K [min count] : Specify minimum cluster count.
--ibm [val] : Specify minimum identity-by-missingness.
--match [f] {mv} : Use covariate values to restrict clustering. Without --match-type, two samples can only be in the same cluster if all covariates match. The optional second parameter specifies a covariate value to treat as missing.
--match-type [f] : Refine interpretation of --match file. The --match-type file is expected to be a single line with as many entries as the --match file has covariates; '0' entries specify 'negative matches' (i.e. samples with equal covariate values cannot be in the same cluster), '1' entries specify 'positive matches' (default), and '-1' causes the corresponding covariate to be ignored.
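The --match / --match-type rules above amount to a pairwise compatibility check, sketched below with hypothetical names.

How the {mv} missing value is handled is an assumption here: a covariate equal to it in either sample is taken to impose no constraint. Without --match-type, every covariate is treated as a positive match, so callers would pass `Vector.fill(n)(Positive)`.

```scala
object MatchConstraints {
  // One entry per covariate, mirroring the --match-type codes.
  sealed trait MatchType
  case object Positive extends MatchType // '1': values must be equal (default)
  case object Negative extends MatchType // '0': values must differ
  case object Ignore extends MatchType   // '-1': covariate ignored

  // Parses the single line of a --match-type file.
  def parseTypes(line: String): Vector[MatchType] =
    line.trim.split("\\s+").toVector.map {
      case "1"   => Positive
      case "0"   => Negative
      case "-1"  => Ignore
      case other => throw new IllegalArgumentException(s"Bad --match-type entry: $other")
    }

  // True if two samples may share a cluster. ASSUMPTION: a covariate equal to
  // `missing` in either sample places no constraint on the pair.
  def compatible(a: Vector[String], b: Vector[String], types: Vector[MatchType],
                 missing: Option[String] = None): Boolean = {
    require(a.length == b.length && a.length == types.length, "covariate count mismatch")
    a.indices.forall { i =>
      val isMissing = missing.exists(m => a(i) == m || b(i) == m)
      isMissing || (types(i) match {
        case Positive => a(i) == b(i)
        case Negative => a(i) != b(i)
        case Ignore   => true
      })
    }
  }
}
```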

--ppc-gap [val] : Minimum number of base pairs, in thousands, between informative pairs of markers used in --genome PPC test. 500 if unspecified.

The --read-genome command can read a compressed file directly.
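Reading a .genome report, compressed or not, can be sketched with the JDK's `GZIPInputStream`. The sketch assumes:
* compression is detected from a ".gz" suffix;
* the file has a whitespace-delimited header naming the columns (FID1, IID1, FID2, IID2, ..., DST, ...);
* only the sample identifiers and DST are needed.

`GenomeReader` and `GenomeRow` are hypothetical names.

```scala
import java.io.{BufferedReader, FileInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream

object GenomeReader {
  final case class GenomeRow(fid1: String, iid1: String, fid2: String, iid2: String, dst: Double)

  // Opens plain or gzip-compressed files; choosing by the ".gz" suffix is an assumption.
  private def open(path: String): InputStream = {
    val raw = new FileInputStream(path)
    if (path.endsWith(".gz")) new GZIPInputStream(raw) else raw
  }

  // Reads a whitespace-delimited .genome report, locating columns by header name.
  def read(path: String): Vector[GenomeRow] = {
    val reader = new BufferedReader(new InputStreamReader(open(path), StandardCharsets.UTF_8))
    try {
      val lines = Iterator.continually(reader.readLine()).takeWhile(_ != null)
        .map(_.trim).filter(_.nonEmpty)
      require(lines.hasNext, "empty .genome file")
      val header = lines.next().split("\\s+")
      def col(name: String): Int = header.indexOf(name)
      val (f1, i1, f2, i2, d) = (col("FID1"), col("IID1"), col("FID2"), col("IID2"), col("DST"))
      require(Seq(f1, i1, f2, i2, d).forall(_ >= 0), "missing expected .genome column")
      lines.map { line =>
        val t = line.split("\\s+")
        GenomeRow(t(f1), t(i1), t(f2), t(i2), t(d).toDouble)
      }.toVector // force evaluation before the reader is closed
    } finally reader.close()
  }
}
```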

--mds-plot [dims] : Multidimensional scaling analysis. Requires --cluster.
--within [f] : Specify initial cluster assignments. (http://pngu.mgh.harvard.edu/%7Epurcell/plink/perm.shtml#cluster)
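As an illustration of what --mds-plot computes from the pairwise IBS distances, here is a classical (Torgerson) MDS sketch. It double-centres the squared distance matrix, then extracts the leading eigenvectors by power iteration with deflation, scaling each by the square root of its eigenvalue.

Using classical MDS is an assumption about the intended method. Power iteration is only a dependency-free stand-in for a proper symmetric eigensolver (for example Breeze's `eigSym`), and it can misbehave when eigenvalues of equal magnitude compete. `ClassicalMds` is a hypothetical name.

```scala
object ClassicalMds {
  // Multiplies a square matrix by a vector.
  private def mult(m: Array[Array[Double]], v: Array[Double]): Array[Double] =
    m.map(row => row.indices.map(j => row(j) * v(j)).sum)

  // Returns an n x dims coordinate matrix for an n x n distance matrix.
  def embed(dist: Array[Array[Double]], dims: Int, iterations: Int = 500): Array[Array[Double]] = {
    val n = dist.length
    val sq = dist.map(_.map(d => d * d))
    val rowMean = sq.map(_.sum / n)
    val grandMean = rowMean.sum / n
    // B = -1/2 * J D^2 J, with J the centring matrix.
    val b = Array.tabulate(n, n) { (i, j) =>
      -0.5 * (sq(i)(j) - rowMean(i) - rowMean(j) + grandMean)
    }
    val coords = Array.ofDim[Double](n, dims)
    for (k <- 0 until dims) {
      var v = Array.tabulate(n)(i => 1.0 + i) // non-constant start vector
      var degenerate = false
      var it = 0
      while (it < iterations && !degenerate) {
        val w = mult(b, v)
        val norm = math.sqrt(w.map(x => x * x).sum)
        if (norm < 1e-12) degenerate = true else v = w.map(_ / norm)
        it += 1
      }
      // Rayleigh quotient of the (unit) vector gives the eigenvalue estimate.
      val lambda = if (degenerate) 0.0 else v.zip(mult(b, v)).map { case (x, y) => x * y }.sum
      val scale = math.sqrt(math.max(lambda, 0.0))
      for (i <- 0 until n) coords(i)(k) = v(i) * scale
      // Deflate so the next pass finds the next eigenvector.
      for (i <- 0 until n; j <- 0 until n) b(i)(j) -= lambda * v(i) * v(j)
    }
    coords
  }
}
```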

--neighbour [n1] [n2] (alias: --neighbor) : Report IBS distances from each sample to their n1th- to n2th-nearest neighbors, associated Z-scores, and the identities of those neighbors. Useful for outlier detection.
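A sketch of the --neighbour report, working on IBS similarities. The nearest neighbour is the most similar sample, which is equivalent to the smallest 1 - IBS distance.

The Z-score definition is an assumption: each sample's similarity to its nth-nearest neighbour is standardised against the mean and sample standard deviation of the nth-nearest similarities of all samples. A sample whose nearest neighbours are unusually dissimilar then gets a strongly negative Z-score. `Neighbours` is a hypothetical name.

```scala
object Neighbours {
  final case class NeighbourReport(sample: Int, rank: Int, neighbour: Int,
                                   similarity: Double, z: Double)

  // Reports neighbour ranks n1..n2 for every sample.
  def report(sim: Array[Array[Double]], n1: Int, n2: Int): Seq[NeighbourReport] = {
    val n = sim.length
    require(1 <= n1 && n1 <= n2 && n2 < n, "need 1 <= n1 <= n2 < number of samples")
    // ranked(i) = the other samples sorted by decreasing similarity to i
    val ranked = (0 until n).map(i => (0 until n).filter(_ != i).sortBy(j => -sim(i)(j)))
    (n1 to n2).flatMap { r =>
      val values = (0 until n).map(i => sim(i)(ranked(i)(r - 1)))
      val mean = values.sum / n
      val sd = math.sqrt(values.map(x => (x - mean) * (x - mean)).sum / (n - 1))
      (0 until n).map { i =>
        val z = if (sd == 0) 0.0 else (values(i) - mean) / sd
        NeighbourReport(i, r, ranked(i)(r - 1), values(i), z)
      }
    }
  }
}
```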

--matrix : Output IBS (similarity) matrix.
--distance-matrix : Output 1-IBS (distance) matrix.
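The two outputs differ only by the transform 1 - IBS. In this sketch the space delimiter and one row per sample are assumptions, not a statement of PLINK's exact file layout. `MatrixOutput` is a hypothetical name.

```scala
object MatrixOutput {
  // --matrix writes IBS similarities; --distance-matrix writes 1 - IBS.
  def toDistance(sim: Array[Array[Double]]): Array[Array[Double]] =
    sim.map(_.map(s => 1.0 - s))

  // One space-delimited row per sample (delimiter assumed).
  def render(m: Array[Array[Double]]): String =
    m.map(_.mkString(" ")).mkString("\n")
}
```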

--help : Display list of options.

davidonlaptop commented 9 years ago

--genome -full (I tried it, but it doesn't work)

Did you try --genome-full, without the space?

ghost commented 9 years ago

@davidonlaptop yes, I tried both with and without the space.