VDBWRAIR / vartable


Unhandled Compound and Simple location operators #1

Open averagehat opened 5 years ago

averagehat commented 5 years ago

testdata/adeno.gb has the order location operator, which we don't handle explicitly if it's different from join.

There are more operators, and they may be covered in the Biopython, scikit-bio, and NCBI documentation. There may be parsing rules for GenBank files. See https://github.com/biocore/scikit-bio/blob/master/skbio/io/format/_sequence_feature_vocabulary.py#L202 for 'join', 'complement', 'order'.

averagehat commented 5 years ago

There's a specification here:

http://www.insdc.org/files/feature_table.html

The location operator is a prefix that specifies what must be done to the 
indicated sequence to find or construct the location corresponding to the 
feature. A list of operators is given below with their definitions and most 
common format. 

complement(location) 
Find the complement of the presented sequence in the span specified by
"location" (i.e., read the complement of the presented strand in its 5'-to-3'
direction)

join(location,location, ... location) 
The indicated elements should be joined (placed end-to-end) to form one 
contiguous sequence 

order(location,location, ... location) 
The elements can be found in the specified order (5' to 3' direction), but
nothing is implied about the reasonableness about joining them

Note: location operator "complement" can be used in combination with either
"join" or "order" within the same location; combinations of "join" and "order"
within the same location (nested operators) are illegal.
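
As a rough illustration of the rules quoted above, here is a minimal parser sketch using only the Python standard library. The names `parse_location` and `split_top_level` are hypothetical and are not part of vartable, Biopython, or scikit-bio. It assumes plain `a..b` spans or single positions only, and does not handle `<`, `>`, `^`, or remote accession references.

```python
import re

# Matches "123" or "123..456" (plain positions only; an assumption of this sketch).
SPAN = re.compile(r"^(\d+)(?:\.\.(\d+))?$")
OPERATORS = ("complement", "join", "order")


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return [p.strip() for p in parts]


def parse_location(text, enclosing=None):
    """Parse a location string into a nested tuple.

    Returns ("span", start, end) for a plain location, or
    (operator, [children]) for complement/join/order.
    `enclosing` is the join/order operator we are already inside, if any.
    """
    text = text.strip()
    for op in OPERATORS:
        if text.startswith(op + "(") and text.endswith(")"):
            is_grouping = op in ("join", "order")
            # The spec excerpt forbids mixing join and order in one location.
            if is_grouping and enclosing is not None and enclosing != op:
                raise ValueError(f"'{op}' nested inside '{enclosing}' is illegal")
            inner = text[len(op) + 1:-1]
            next_enclosing = op if is_grouping else enclosing
            children = [parse_location(p, next_enclosing)
                        for p in split_top_level(inner)]
            return (op, children)
    match = SPAN.match(text)
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return ("span", start, end)
```

Keeping the operator name in the parsed result means `order` stays distinguishable from `join` downstream, instead of being silently treated as a join.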

We'll have to check whether we handle the 5' to 3' direction correctly. Also, note these two examples:

complement(join(2691..4571,4918..5163))
                          Joins regions 2691 to 4571 and 4918 to 5163, then 
                          complements the joined segments (the feature is on the 
                          strand complementary to the presented strand) 

join(complement(4918..5163),complement(2691..4571))
                          Complements regions 4918 to 5163 and 2691 to 4571, then 
                          joins the complemented segments (the feature is on the 
                          strand complementary to the presented strand)
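
The two forms above should yield the same sequence, because reverse-complementing a concatenation reverses the order of its parts. Below is a small sanity check of that identity, assuming 1-based inclusive coordinates as in GenBank. The toy sequence, the coordinates, and the helper names `revcomp` and `span` are made up for illustration and are not taken from testdata/adeno.gb or from vartable.

```python
# Translation table for the four unambiguous DNA bases (IUPAC codes omitted).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Complement each base, then reverse, so the result reads 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def span(seq, start, end):
    """Extract a 1-based, inclusive span, matching GenBank coordinates."""
    return seq[start - 1:end]


toy = "ATGCCGTAAGCTTAGGCTA"  # hypothetical sequence, for illustration only

# complement(join(3..6,10..14)): join first, then reverse-complement the whole.
form_a = revcomp(span(toy, 3, 6) + span(toy, 10, 14))

# join(complement(10..14),complement(3..6)): reverse-complement each part,
# then join them in the opposite order.
form_b = revcomp(span(toy, 10, 14)) + revcomp(span(toy, 3, 6))

assert form_a == form_b
```

Note the part order in the second form: complementing each part but keeping the original order (3..6 before 10..14) is a different location and generally gives a different sequence, which is the kind of direction bug worth testing for.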