Plant small RNA target prediction tool
TargetFinder computationally predicts small RNA binding sites on target transcripts in a sequence database. It aligns the input small RNA sequence against all transcripts, then scores each candidate site with a position-weighted scoring matrix.
-s
-d
-q
-c
-t
-p
-r Search the reverse strand for targets. Use this option if the database is genomic DNA.
-h Shows the help menu.
targetfinder.pl writes all output to the terminal (STDOUT). To save the output, redirect it to a file with '>'.
example:
./targetfinder.pl -s UGCCAAAGGAGAUUUGCCCUG -d arab_cdna -q miR399a > miR399a_predicted_targets.txt
Each predicted target site is printed out separately. The output consists of two parts. The first is a description line and the second is a base-pairing diagram of the target and small RNA (query) sequence. The description line contains the query name (query=name), the description line from the target sequence database (target=target description), and the target prediction score (score=prediction score).
example:
query=miR399a, target=AT2G33770.1 | Symbol: None | ubiquitin-conjugating enzyme family protein, low similarity to u, score=1.5
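Because the target description can itself contain commas, a description line is easiest to split from both ends. The following is a minimal Python sketch, assuming the `query=..., target=..., score=...` layout shown above; the name `parse_description` is hypothetical and not part of TargetFinder.

```python
import re

# The query group is lazy and the target group is greedy, so commas inside
# the target description stay part of the target field.
DESCRIPTION_RE = re.compile(
    r"^query=(?P<query>.*?), target=(?P<target>.*), score=(?P<score>[0-9.]+)$"
)

def parse_description(line):
    """Split one classic-format description line into its three fields."""
    match = DESCRIPTION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a description line: {line!r}")
    return {
        "query": match.group("query"),
        "target": match.group("target"),
        "score": float(match.group("score")),
    }
```

Anchoring the score at the end of the line is a design choice: it avoids mis-splitting on the comma in "protein, low similarity".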
The base-pairing diagram has the target site sequence on top in 5'-3' orientation and the query sequence on the bottom in 3'-5' orientation. Between the target site sequence and the query sequence are base-pair symbols. A ":" (colon) represents an ordinary Watson-Crick base pair, a "." (period) represents a G:U base pair, and a " " (space) represents a mismatch.
example:
target 5' UAGGGCAAAUCUUCUUUGGCA 3'
.:::::::::::.::::::::
query 3' GUCCCGUUUAGAGGAAACCGU 5'
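The symbol row can be rebuilt from the two sequences, which is a handy check when reading a diagram. This is a minimal Python sketch, not TargetFinder's own code: it assumes gap-free RNA strings of equal length, and the name `base_pair_symbols` is hypothetical.

```python
# Pairs are written as (target base, query base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def base_pair_symbols(target_5to3, query_3to5):
    """Return the symbol row for two equal-length, gap-free RNA strings."""
    if len(target_5to3) != len(query_3to5):
        raise ValueError("sequences must be the same length")
    symbols = []
    for pair in zip(target_5to3.upper(), query_3to5.upper()):
        if pair in WATSON_CRICK:
            symbols.append(":")   # ordinary Watson-Crick pair
        elif pair in WOBBLE:
            symbols.append(".")   # G:U pair
        else:
            symbols.append(" ")   # mismatch
    return "".join(symbols)

small_rna = "UGCCAAAGGAGAUUUGCCCUG"   # miR399a as passed to -s (5'-3')
query_3to5 = small_rna[::-1]          # reversed so it reads 3'-5' under the target
print(base_pair_symbols("UAGGGCAAAUCUUCUUUGGCA", query_3to5))
# prints .:::::::::::.::::::::
```

Note that the bottom row of the diagram is simply the `-s` sequence written backwards, not its complement.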
If a small RNA is predicted to target a sequence more than once, each target site is reported as a separate entry. Below is an example of output for miR399a and its target AT2G33770. miR399a has five target sites in the 5'UTR of AT2G33770.
query=miR399a, target=AT2G33770.1 | Symbol: None | ubiquitin-conjugating enzyme family protein, low similarity to u, score=1.5
target 5' UAGGGCAAAUCUUCUUUGGCA 3'
.:::::::::::.::::::::
query 3' GUCCCGUUUAGAGGAAACCGU 5'
query=miR399a, target=AT2G33770.1 | Symbol: None | ubiquitin-conjugating enzyme family protein, low similarity to u, score=1.5
target 5' UAGGGCAUAUCUCCUUUGGCA 3'
.:::::: :::::::::::::
query 3' GUCCCGUUUAGAGGAAACCGU 5'
query=miR399a, target=AT2G33770.1 | Symbol: None | ubiquitin-conjugating enzyme family protein, low similarity to u, score=1.5
target 5' UAGAGCAAAUCUCCUUUGGCA 3'
.:: :::::::::::::::::
query 3' GUCCCGUUUAGAGGAAACCGU 5'
query=miR399a, target=AT2G33770.1 | Symbol: None | ubiquitin-conjugating enzyme family protein, low similarity to u, score=1.5
target 5' UUGGGCAAAUCUCCUUUGGCA 3'
. :::::::::::::::::::
query 3' GUCCCGUUUAGAGGAAACCGU 5'
query=miR399a, target=AT2G33770.1 | Symbol: None | ubiquitin-conjugating enzyme family protein, low similarity to u, score=2.5
target 5' UCGAGCAAAUCUCCUUUGGCA 3'
. : :::::::::::::::::
query 3' GUCCCGUUUAGAGGAAACCGU 5'
In addition to the output described above ('classic' output), three new output format options were added to TargetFinder.
Generic Feature Format (GFF3):
./targetfinder.pl -s UGCCAAAGGAGAUUUGCCCUG -d arab_cdna -q miR399a -p gff > miR399a_predicted_targets.gff3
AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN targetfinder rna_target 607 627 1.5 + . smallRNA=miR399a;target_seq=UAGGGCAAAUCUUCUUUGGCA;base_pairs=.:::::::::::.::::::::;miR_seq=GUCCCGUUUAGAGGAAACCGU
AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN targetfinder rna_target 740 760 1.5 + . smallRNA=miR399a;target_seq=UAGGGCAUAUCUCCUUUGGCA;base_pairs=.:::::: :::::::::::::;miR_seq=GUCCCGUUUAGAGGAAACCGU
AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN targetfinder rna_target 829 849 1.5 + . smallRNA=miR399a;target_seq=UUGGGCAAAUCUCCUUUGGCA;base_pairs=. :::::::::::::::::::;miR_seq=GUCCCGUUUAGAGGAAACCGU
AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN targetfinder rna_target 943 963 1.5 + . smallRNA=miR399a;target_seq=UAGAGCAAAUCUCCUUUGGCA;base_pairs=.:: :::::::::::::::::;miR_seq=GUCCCGUUUAGAGGAAACCGU
AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN targetfinder rna_target 886 906 2.5 + . smallRNA=miR399a;target_seq=UCGAGCAAAUCUCCUUUGGCA;base_pairs=. : :::::::::::::::::;miR_seq=GUCCCGUUUAGAGGAAACCGU
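A GFF3 record like those above can be read with a few lines of Python. This sketch assumes the nine standard GFF3 columns are tab-separated in the real output (the tabs appear as spaces in the listing above); the function name `parse_gff3_line` is hypothetical.

```python
def parse_gff3_line(line):
    """Parse one tab-separated GFF3 record into a dict."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 columns, found {len(columns)}")
    seqid, source, feature_type, start, end, score, strand, phase, attributes = columns
    # Attributes are key=value pairs separated by ';'. The base_pairs value can
    # contain spaces (mismatches), so split only on ';' and on the first '='.
    attrs = dict(item.split("=", 1) for item in attributes.split(";") if item)
    return {
        "seqid": seqid,
        "source": source,
        "type": feature_type,
        "start": int(start),
        "end": int(end),
        "score": float(score),
        "strand": strand,
        "phase": phase,
        "attributes": attrs,
    }
```

Only the trailing newline is stripped, so a mismatch at the very end of `base_pairs` is not lost.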
Tab-delimited Format:
miR399a AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN 607 627 + 1.5 UAGGGCAAAUCUUCUUUGGCA .:::::::::::.:::::::: GUCCCGUUUAGAGGAAACCGU
miR399a AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN 740 760 + 1.5 UAGGGCAUAUCUCCUUUGGCA .:::::: ::::::::::::: GUCCCGUUUAGAGGAAACCGU
miR399a AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN 829 849 + 1.5 UUGGGCAAAUCUCCUUUGGCA . ::::::::::::::::::: GUCCCGUUUAGAGGAAACCGU
miR399a AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN 943 963 + 1.5 UAGAGCAAAUCUCCUUUGGCA .:: ::::::::::::::::: GUCCCGUUUAGAGGAAACCGU
miR399a AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN 886 906 + 2.5 UCGAGCAAAUCUCCUUUGGCA . : ::::::::::::::::: GUCCCGUUUAGAGGAAACCGU
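The tab-delimited format is convenient for filtering hits by score. The sketch below assumes nine tab-separated columns in the order seen in the rows above; the column names in `COLUMNS` and the function `read_hits` are labels chosen for illustration, since the output itself has no header row.

```python
import csv

# Hypothetical column names, inferred from the example rows.
COLUMNS = ["query", "target", "start", "end", "strand",
           "score", "target_seq", "base_pairs", "query_seq"]

def read_hits(handle, max_score=None):
    """Read tab-delimited hits, optionally keeping only scores <= max_score."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    hits = []
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        hit["start"] = int(hit["start"])
        hit["end"] = int(hit["end"])
        hit["score"] = float(hit["score"])
        if max_score is None or hit["score"] <= max_score:
            hits.append(hit)
    return hits
```

For example, `read_hits(open("hits.txt"), max_score=2.0)` would keep the four sites scored 1.5 and drop the one scored 2.5 (the file name is a placeholder).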
JavaScript Object Notation Format (JSON):
{
"miR399a": {
"hits" : [
{
"Target accession": "AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN",
"Score": "1.5",
"Coordinates": "607-627",
"Strand": "+",
"Target sequence": "UAGGGCAAAUCUUCUUUGGCA",
"Base pairing": ".:::::::::::.::::::::",
"amiRNA sequence": "GUCCCGUUUAGAGGAAACCGU"
},
{
"Target accession": "AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN",
"Score": "1.5",
"Coordinates": "740-760",
"Strand": "+",
"Target sequence": "UAGGGCAUAUCUCCUUUGGCA",
"Base pairing": ".:::::: :::::::::::::",
"amiRNA sequence": "GUCCCGUUUAGAGGAAACCGU"
},
{
"Target accession": "AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN",
"Score": "1.5",
"Coordinates": "829-849",
"Strand": "+",
"Target sequence": "UUGGGCAAAUCUCCUUUGGCA",
"Base pairing": ". :::::::::::::::::::",
"amiRNA sequence": "GUCCCGUUUAGAGGAAACCGU"
},
{
"Target accession": "AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN",
"Score": "1.5",
"Coordinates": "943-963",
"Strand": "+",
"Target sequence": "UAGAGCAAAUCUCCUUUGGCA",
"Base pairing": ".:: :::::::::::::::::",
"amiRNA sequence": "GUCCCGUUUAGAGGAAACCGU"
},
{
"Target accession": "AT2G33770.1 | Symbols: UBC24, ATUBC24, PHO2 | phosphate 2 | chr2:14277558-14283040 REVERSE LEN",
"Score": "2.5",
"Coordinates": "886-906",
"Strand": "+",
"Target sequence": "UCGAGCAAAUCUCCUUUGGCA",
"Base pairing": ". : :::::::::::::::::",
"amiRNA sequence": "GUCCCGUUUAGAGGAAACCGU"
}
]
}
}
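The JSON output nests a list of hits under each query name, with scores and coordinates stored as strings. A minimal Python sketch for flattening it, assuming exactly the key names shown above (`load_hits` is a hypothetical helper name):

```python
import json

def load_hits(json_text):
    """Flatten TargetFinder JSON output into a list of hit dicts."""
    data = json.loads(json_text)
    rows = []
    for query_name, result in data.items():
        for hit in result.get("hits", []):
            # "Coordinates" is a "start-end" string; "Score" is a numeric string.
            start, end = (int(part) for part in hit["Coordinates"].split("-"))
            rows.append({
                "query": query_name,
                "target": hit["Target accession"],
                "score": float(hit["Score"]),
                "start": start,
                "end": end,
                "strand": hit["Strand"],
                "base_pairs": hit["Base pairing"],
            })
    return rows
```

Sorting the result with `sorted(rows, key=lambda r: r["score"])` lists the best-scoring (lowest) sites first.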
targetfinder.pl searches for potential miRNA target sites in a FASTA-formatted sequence database using three main steps.
Smith-Waterman (SW) alignments are used to identify the best complementary regions between the small RNA query sequence and every sequence in the FASTA-formatted sequence database. This script runs ssearch35_t with the following settings:
-n Forces the small RNA query sequence to be treated as nucleotide sequence.
-H Suppresses the normal histogram output.
-Q Runs Smith-Waterman search in "quiet" mode.
-f Gap opening penalty (set to -16).
-g Gap extension penalty (set to -10).
-r Match reward/mismatch penalty (set to +15/-10).
-w Alignment output line length (set to 100).
-W Additional sequence context in the output (set to 25).
-E The E-value cutoff (set to 100000).
-i Limits SW alignments to reverse complement matches only.
-U Changes scoring matrix to allow for G:A, T:C, or U:C matches.
ktup Word size for seed matches used to build alignments (set to 1).
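For reference, the settings above can be collected into a single argument list. This Python sketch only assembles the list and does not run anything. The flags and values are those listed above, but the positional order (query file, database, then ktup) is an assumption based on the usual FASTA-package convention and should be checked against the script; `ssearch_arguments` and its parameters are hypothetical names.

```python
def ssearch_arguments(query_fasta, database_fasta, program="ssearch35_t"):
    """Build an argument list from the documented ssearch settings."""
    return [
        program,
        "-n",              # treat the query as nucleotide sequence
        "-H",              # suppress histogram output
        "-Q",              # quiet mode
        "-f", "-16",       # gap opening penalty
        "-g", "-10",       # gap extension penalty
        "-r", "+15/-10",   # match reward / mismatch penalty
        "-w", "100",       # alignment output line length
        "-W", "25",        # additional sequence context
        "-E", "100000",    # E-value cutoff
        "-i",              # reverse complement matches only
        "-U",              # RNA-style scoring matrix
        query_fasta,       # assumed position: query file
        database_fasta,    # assumed position: sequence database
        "1",               # ktup (word size)
    ]
```

Keeping the arguments as a list, rather than one shell string, avoids quoting problems with values such as `+15/-10`.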
SW output is read directly into this script. Each alignment is converted to an RNA duplex by complementing the small RNA query sequence. Each RNA duplex is scored using the following scoring metric and rule set:
Predicted targets are printed if their score is equal to or lower than the specified cutoff score.
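The rule list itself is not reproduced in this section. As a rough sketch only, the Python below assumes the penalty scheme commonly described for TargetFinder: +1 for a mismatch, +0.5 for a G:U pair, and penalties doubled at positions 2-13 counted from the 5' end of the small RNA. It scores a gap-free base-pairing row and ignores gaps, bulges, and any duplex rejection rules. The names `score_diagram` and `passes_cutoff` are hypothetical.

```python
def score_diagram(base_pairs):
    """Score a gap-free base-pairing row written in target 5'-3' order."""
    score = 0.0
    # The query runs 3'-5' under the target, so its 5' end is the last column.
    for position, symbol in enumerate(reversed(base_pairs), start=1):
        if symbol == ":":
            continue                      # Watson-Crick pair: no penalty
        penalty = 0.5 if symbol == "." else 1.0   # G:U pair, else mismatch
        if 2 <= position <= 13:
            penalty *= 2                  # assumed doubled-penalty region
        score += penalty
    return score

def passes_cutoff(base_pairs, cutoff):
    """A site is reported when its score is equal to or lower than the cutoff."""
    return score_diagram(base_pairs) <= cutoff
```

Under these assumptions the sketch reproduces the scores in the miR399a examples: `.:::::::::::.::::::::` scores 1.5 and `. : :::::::::::::::::` scores 2.5.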
Note: the -i option limits SW to reverse complement matches only, but you can use the -r option with targetfinder.pl to search both strands of a sequence database. This should be done if the database is a genome sequence so that target sites on both strands can be found.
Executes parallel TargetFinder jobs using Perl interpreter threads.
-f
-d
-o
-c
-t
-r Search the reverse strand for targets. Use this option if the database is genomic DNA.
-h Shows the help menu.