PAiewsakun / GRAViTy


GRAViTy 1.1.0

INSTALLATION

Execute the command "sudo pip install ." in the GRAViTy directory that contains the "setup.py" file. All dependencies should be installed for you. Note that this is an ALPHA version of the program: it is still under development and likely contains many bugs, hence the following disclaimer.
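One quick way to confirm the installation is to check that both pipeline commands are on your PATH. Below is a minimal bash sketch. The function name check_gravity_install is hypothetical and not part of GRAViTy. The sketch also assumes that installation exposes GRAViTy_Pipeline_I and GRAViTy_Pipeline_II as commands, which is what the usage sections of this README imply.

```shell
# Hypothetical helper (not part of GRAViTy): report whether both GRAViTy
# pipeline commands can be found. Returns 0 if both are found, 1 otherwise.
check_gravity_install() {
  local prog missing=0
  for prog in GRAViTy_Pipeline_I GRAViTy_Pipeline_II; do
    # "command -v" is the POSIX way to test whether a command is resolvable
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "Not on PATH: $prog" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running "check_gravity_install && echo OK" after "sudo pip install ." should print OK. Any command that cannot be found is listed on stderr.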

DISCLAIMER

The material embodied in this software is provided to you "as-is", “with all faults”, and without warranty of any kind, express, implied or otherwise, including without limitation, any warranty of fitness for a particular purpose, warranty of non-infringement, or warranties of any kind concerning the safety, suitability, lack of viruses, inaccuracies, or other harmful components of this software. There are inherent dangers in the use of any software, and you are solely responsible for determining whether this software is compatible with your equipment and other software installed on your equipment. You are also solely responsible for the protection of your equipment and backup of your data, and the developers/providers will not be liable for any damages you may suffer in connection with using, modifying, or distributing this software. Without limiting the foregoing, the developers/providers make no warranty that:

Software and its documentation made available here:

The developers/providers assume no responsibility for errors or omissions in the software or documentation available from here.

In no event shall the developers/providers be liable to you or anyone else for any direct, special, incidental, indirect, or consequential damages of any kind, or any damages whatsoever, including without limitation, loss of data, loss of profit, loss of use, savings or revenue, or the claims of third parties, whether or not the developers/providers have been advised of the possibility of such damages and loss, however caused, and on any theory of liability, arising out of or in connection with the possession, use, or performance of this software.

The use of this software is done at your own discretion and risk and with agreement that you will be solely responsible for any damage to your computer system or loss of data that results from such activities. No advice or information, whether oral or written, obtained by you from the developers/providers shall create any warranty for the software.

Running GRAViTy

Two main programs are implemented in GRAViTy: GRAViTy_Pipeline_I and GRAViTy_Pipeline_II. In short, GRAViTy_Pipeline_I is used to construct reference PPHMM and GOM databases, and GRAViTy_Pipeline_II is used to identify and classify your viruses.

GRAViTy_Pipeline_I

Usage

GRAViTy_Pipeline_I \
    --GenomeDescTableFile "/PATH/TO/virus_description_table" \
    --ShelveDir "/PATH/TO/OUTPUT_DIR" \
    --Database "DATABASE" \
    --Database_Header "DATABASE_HEADER" \
    --TaxoGrouping_Header "TaxoGrouping_Header" \
    --GenomeSeqFile "/PATH/TO/SEQ" \
    --N_Bootstrap "INT"

Option descriptions

--GenomeDescTableFile = Path to your virus description table. It should be a tab-delimited text file (.txt) with a header row. We recommend using the ICTV Virus Metadata Resource (VMR) file as a template; an Excel version of the VMR can be downloaded from https://talk.ictvonline.org/taxonomy/vmr/. The file should contain at least the following columns: "Baltimore Group", "Order", "Family", "Subfamily", "Genus", "Virus name (s)", "Virus GENBANK accession", "Virus sequence complete", and "Genetic code table".
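Before running the pipeline, it may help to confirm that the table's header row contains every required column. Below is a minimal bash sketch. It assumes the header row is the first line of the file and that column names must match exactly as spelled above. The function name check_desc_table is hypothetical and not part of GRAViTy.

```shell
# Hypothetical helper (not part of GRAViTy): check that a tab-delimited
# description table has every column listed as required in this README.
# Returns 0 if all are present; prints each missing column to stderr.
check_desc_table() {
  local table="$1" missing=0 col headers
  local required=(
    "Baltimore Group" "Order" "Family" "Subfamily" "Genus"
    "Virus name (s)" "Virus GENBANK accession"
    "Virus sequence complete" "Genetic code table"
  )
  # Take the header line, drop a Windows carriage return (common in
  # Excel exports), and put one column name per line
  headers=$(head -n 1 "$table" | tr -d '\r' | tr '\t' '\n')
  for col in "${required[@]}"; do
    # -F: fixed string, -x: whole-line match, -q: quiet
    if ! grep -Fxq -- "$col" <<< "$headers"; then
      echo "Missing column: $col" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, "check_desc_table VMR_Test_Ref.txt" returns 0 when all required columns are present. Column order does not matter to this check.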

--ShelveDir = Path to a directory that stores all GRAViTy outputs. This is where the PPHMM and GOM databases are stored, together with other outputs.

--Database = GRAViTy will analyse only the entries labelled DATABASE in the database column of the virus description table. The database column is specified with the “--Database_Header” option. If 'none', all entries are analysed. [default: none]

--Database_Header = The header of the database column. This cannot be 'none' if --Database is specified. [default: none]
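To preview which entries a given --Database / --Database_Header pair would select, you can list the matching rows yourself. Below is a hedged bash sketch. It assumes GRAViTy keeps rows whose database-column value exactly equals DATABASE, which may not match GRAViTy's own filtering. The name list_database_entries is hypothetical.

```shell
# Hypothetical helper (not part of GRAViTy): print the "Virus GENBANK
# accession" of every row whose <header> column exactly equals <value>.
# Assumes exact string matching, which may differ from GRAViTy's filter.
list_database_entries() {
  local table="$1" header="$2" value="$3"
  awk -F'\t' -v hdr="$header" -v val="$value" '
    { sub(/\r$/, "") }                     # tolerate CRLF line endings
    NR == 1 {                              # locate columns by header name
      for (i = 1; i <= NF; i++) {
        if ($i == hdr) c = i
        if ($i == "Virus GENBANK accession") a = i
      }
      if (!c || !a) { print "Column not found" > "/dev/stderr"; exit 1 }
      next
    }
    $c == val { print $a }                 # keep only the selected rows
  ' "$table"
}
```

For instance, list_database_entries VMR_Test_Ref.txt "Baltimore Group" "VI" lists the accessions that --Database "VI" --Database_Header "Baltimore Group" is expected to select.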

--TaxoGrouping_Header = The header of the taxonomic grouping column. Since GRAViTy mainly focuses on family-level taxonomic assignment, the default value is “Family”.

--TaxoGroupingFile = The user might want to assign different viruses at different taxonomic levels: for example, family assignments for some viruses but subfamily or genus assignments for others. To accommodate this, the user can either add a taxonomic grouping column to the virus description table and specify it with the --TaxoGrouping_Header option (see --TaxoGrouping_Header), or provide a file (with no header) containing a single column of taxonomic groupings for all viruses, in the same order as the viruses appear in the description table. This option specifies the path to that file. If it is used, it overrides --TaxoGrouping_Header. [default: none]
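One way to produce such a file is to derive it from existing columns of the description table. The bash sketch below writes one grouping per data row, in table order, with no header. Its selection rule is purely an illustrative assumption, not something GRAViTy prescribes: it uses a preferred column (e.g. "Subfamily") where that column is filled in, and falls back to another column (e.g. "Family") otherwise. The name make_taxo_grouping_file is hypothetical.

```shell
# Hypothetical helper (not part of GRAViTy): print a headerless, one-column
# taxonomic grouping list, one line per data row, in table order.
# Illustrative rule (an assumption): use the <preferred> column when it is
# non-empty, otherwise the <fallback> column.
make_taxo_grouping_file() {
  local table="$1" preferred="$2" fallback="$3"
  awk -F'\t' -v p="$preferred" -v f="$fallback" '
    { sub(/\r$/, "") }                     # tolerate CRLF line endings
    NR == 1 {                              # header row: find both columns
      for (i = 1; i <= NF; i++) {
        if ($i == p) pc = i
        if ($i == f) fc = i
      }
      if (!pc || !fc) { print "Column not found" > "/dev/stderr"; exit 1 }
      next                                 # the output file has no header
    }
    { print ($pc != "" ? $pc : $fc) }
  ' "$table"
}
```

Usage: make_taxo_grouping_file VMR.txt "Subfamily" "Family" > taxo_grouping.txt, then pass --TaxoGroupingFile "taxo_grouping.txt".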

--GenomeSeqFile = Path to the genome sequence file in GenBank format (*.gb). If the file doesn't exist, GRAViTy will download it for you from NCBI using the accession numbers in the “Virus GENBANK accession” column of the description table.

--N_Bootstrap = "INT" is the number of bootstrap resamplings. [default: 10]

For more options, use GRAViTy_Pipeline_I --help.

Output descriptions

Outputs are organised into three directories.

Example

GRAViTy_Pipeline_I \
    --GenomeDescTableFile "/Test/Data/Ref/VMR_Test_Ref.txt" \
    --ShelveDir "/Test/Analysis/Ref/VI" \
    --Database "VI" \
    --Database_Header "Baltimore Group" \
    --TaxoGrouping_Header "Taxonomic grouping" \
    --N_Bootstrap 10 \
    --GenomeSeqFile "/Test/Data/Ref/GenomeSeqs.VI.gb"

This command analyses reference viruses whose descriptions are in "/Test/Data/Ref/VMR_Test_Ref.txt". GRAViTy will only analyse viruses labelled “VI” in the “Baltimore Group” column of the virus description table. The taxonomic groupings are provided in the “Taxonomic grouping” column. The associated GenBank file is stored at "/Test/Data/Ref/GenomeSeqs.VI.gb"; if it is not present on the computer, GRAViTy downloads it automatically. Bootstrapping analysis is performed with N = 10. The results will be stored at "/Test/Analysis/Ref/VI".
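GRAViTy_Pipeline_II can use several reference databases at once; its example uses one for Baltimore group VI and one for VII. One way to prepare these is to run GRAViTy_Pipeline_I once per group. Below is a hedged bash sketch of that loop, using only the options shown above. build_reference_dbs is a hypothetical wrapper. The Test/... directory layout just mirrors this README's examples, using relative paths as in the Pipeline II example.

```shell
# Hypothetical wrapper (not part of GRAViTy): run GRAViTy_Pipeline_I once
# per Baltimore group, one shelve directory per group.
# Usage: build_reference_dbs DESC_TABLE GROUP [GROUP ...]
build_reference_dbs() {
  local desc_table="$1"; shift
  local group
  for group in "$@"; do
    # Paths below follow the README example layout (an assumption)
    GRAViTy_Pipeline_I \
      --GenomeDescTableFile "$desc_table" \
      --ShelveDir "Test/Analysis/Ref/$group" \
      --Database "$group" \
      --Database_Header "Baltimore Group" \
      --TaxoGrouping_Header "Taxonomic grouping" \
      --N_Bootstrap 10 \
      --GenomeSeqFile "Test/Data/Ref/GenomeSeqs.$group.gb" || return 1
  done
}
```

For example, build_reference_dbs "Test/Data/Ref/VMR_Test_Ref.txt" VI VII would produce the two reference shelve directories used in the GRAViTy_Pipeline_II example. The loop stops at the first failing run.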

GRAViTy_Pipeline_II

Usage

GRAViTy_Pipeline_II \
    --GenomeDescTableFile_UcfVirus "/PATH/TO/virus_description_table" \
    --ShelveDir_UcfVirus "/PATH/TO/OUTPUT_DIR" \
    --ShelveDirs_RefVirus "/PATH/TO/REF_DIR_I, /PATH/TO/REF_DIR_II, …" \
    --GenomeSeqFile_UcfVirus "/PATH/TO/SEQ" \
    --UseUcfVirusPPHMMs "BOOLEAN" \
    --GenomeSeqFiles_RefVirus "/PATH/TO/REF_SEQ_I, /PATH/TO/REF_SEQ_II, …" \
    --N_Bootstrap "INT"

Option descriptions

--GenomeDescTableFile_UcfVirus = Path to the description table of your viruses. It should be a tab-delimited text file (.txt) with a header row. The file should contain at least the following columns: "Baltimore Group", "Order", "Family", "Subfamily", "Genus", "Virus name (s)", "Virus GENBANK accession", "Virus sequence complete", and "Genetic code table".

--ShelveDir_UcfVirus = Path to a directory that stores all GRAViTy outputs.

--ShelveDirs_RefVirus = Path(s) to the shelve director(y/ies) of the reference viruses, i.e. the --ShelveDir output(s) of GRAViTy_Pipeline_I. Separate multiple paths with commas.

--GenomeSeqFile_UcfVirus = Path to the genome sequences of your viruses in the GenBank format (*.gb). Their sequence identifiers should match those in the “Virus GENBANK accession” column in the description table.
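A mismatch between the table and the GenBank file is easy to introduce. The bash sketch below lists accessions from the description table that have no matching record in the .gb file. It rests on several assumptions, none of which this README confirms. First, it takes the record identifier to be the first token on each GenBank "ACCESSION" line. Second, it assumes GRAViTy matches on that identifier. Third, it assumes segmented entries use the VMR style "RNA1: X; RNA2: Y" (split on ";" and drop the "label:" prefix). The name find_missing_accessions is hypothetical.

```shell
# Hypothetical helper (not part of GRAViTy): print accessions listed in the
# "Virus GENBANK accession" column that have no ACCESSION line in the .gb file.
find_missing_accessions() {
  local table="$1" gb="$2" present
  # First token after "ACCESSION" in each GenBank record, de-duplicated
  present=$(awk '$1 == "ACCESSION" { print $2 }' "$gb" | sort -u)
  awk -F'\t' '
    { sub(/\r$/, "") }                     # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "Virus GENBANK accession") a = i
      if (!a) { print "Column not found" > "/dev/stderr"; exit 1 }
      next
    }
    {
      n = split($a, parts, /;/)            # segmented viruses: "RNA1: X; RNA2: Y"
      for (j = 1; j <= n; j++) {
        acc = parts[j]
        sub(/^.*:/, "", acc)               # drop a segment label such as "RNA1:"
        gsub(/[ \t]/, "", acc)             # drop surrounding whitespace
        if (acc != "") print acc
      }
    }
  ' "$table" | sort -u | comm -23 - <(printf '%s\n' "$present")
}
```

Empty output means every accession in the table was found among the file's ACCESSION lines.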

--UseUcfVirusPPHMMs = If True, annotate both the reference and the unclassified viruses using the PPHMM database derived from the unclassified viruses. [default: True]

--GenomeSeqFiles_RefVirus = Path(s) to the GenBank genome sequence file(s) of the reference viruses, separated by commas. This cannot be 'None' if --UseUcfVirusPPHMMs is True.

--N_Bootstrap = "INT" is the number of bootstrap resamplings. [default: 10]

For more options, use GRAViTy_Pipeline_II --help.

Output descriptions

Outputs are organised into three directories.

Example

GRAViTy_Pipeline_II \
    --GenomeDescTableFile_UcfVirus "Test/Data/Ucf/VMR_test_Ucf.txt" \
    --ShelveDir_UcfVirus "Test/Analysis/Ucf/Test_ucf_UseUcfPPHMMs" \
    --ShelveDirs_RefVirus "Test/Analysis/Ref/VI, Test/Analysis/Ref/VII" \
    --GenomeSeqFile_UcfVirus "Test/Data/Ucf/GenomeSeqs.test_Ucf.gb" \
    --GenomeSeqFiles_RefVirus "Test/Data/Ref/GenomeSeqs.VI.gb, Test/Data/Ref/GenomeSeqs.VII.gb" \
    --UseUcfVirusPPHMMs True \
    --N_Bootstrap 10

This command analyses your viruses, whose descriptions are in "Test/Data/Ucf/VMR_test_Ucf.txt", and stores the results at "Test/Analysis/Ucf/Test_ucf_UseUcfPPHMMs". GRAViTy will find the genomes of your viruses at "Test/Data/Ucf/GenomeSeqs.test_Ucf.gb". Two reference GRAViTy databases are used, one at "Test/Analysis/Ref/VI" and the other at "Test/Analysis/Ref/VII". Since UseUcfVirusPPHMMs is True, GRAViTy will update the annotations (i.e. the PPHMM and GOM signatures) of both the reference viruses and your viruses using the PPHMM database derived from your viruses. The genomes of the reference viruses are at "Test/Data/Ref/GenomeSeqs.VI.gb" and "Test/Data/Ref/GenomeSeqs.VII.gb". Bootstrapping analysis is performed with N = 10.
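When several reference databases are used, the --ShelveDirs_RefVirus and --GenomeSeqFiles_RefVirus lists must be kept in step by hand. The bash sketch below builds both lists from one set of group names, so they always appear in the same order (VI, then VII, as in the example). It is an assumption that GRAViTy needs matching order, and the README does not state this. run_pipeline_ii is a hypothetical wrapper; all other paths and options are copied from the example above.

```shell
# Hypothetical wrapper (not part of GRAViTy): rerun the README's Pipeline II
# example with reference lists derived from group names, e.g. VI VII.
run_pipeline_ii() {
  local shelve_dirs="" seq_files="" sep="" g
  for g in "$@"; do
    shelve_dirs+="${sep}Test/Analysis/Ref/$g"
    seq_files+="${sep}Test/Data/Ref/GenomeSeqs.$g.gb"
    sep=", "                               # same ", " separator as the example
  done
  GRAViTy_Pipeline_II \
    --GenomeDescTableFile_UcfVirus "Test/Data/Ucf/VMR_test_Ucf.txt" \
    --ShelveDir_UcfVirus "Test/Analysis/Ucf/Test_ucf_UseUcfPPHMMs" \
    --ShelveDirs_RefVirus "$shelve_dirs" \
    --GenomeSeqFile_UcfVirus "Test/Data/Ucf/GenomeSeqs.test_Ucf.gb" \
    --GenomeSeqFiles_RefVirus "$seq_files" \
    --UseUcfVirusPPHMMs True \
    --N_Bootstrap 10
}
```

Calling run_pipeline_ii VI VII reproduces the example command above. Adding a group name extends both lists consistently.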