Filter sequences - Githubissues

ozgunbabur commented 2 years ago

Write a method that will take a ranked sequence and a filter, and return the filtered ranked sequence.

Inputs: Ranked sequence, filter
Output: Filtered ranked sequence (a sublist of the original sequence)

The filter parameter will be a dictionary of dictionaries. The keys of the outer dictionary are positions on the sequence. For each position that is subject to filtering, an inner dictionary tells which amino acids (aa) we want there (True) or don't want there (False). Here is an example.

{-2: {"D": True}, 1: {"J": False, "S": False}}

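This filter tells us that we need a D at the -2 location, and there shouldn't be a J or an S at the +1 location. In the example that follows, positions are counted from the middle of each sequence, so in a five-letter sequence -2 is the first letter and +1 is the fourth.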

If we apply the filter to the ranked sequence below

DHFJD DHFGT FGHTU KJFHY DHFHH DJFSP

we should get this ranked sequence as a result:

DHFGT DHFHH
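
As a minimal sketch of the behaviour described above, the dictionary filter could be applied as follows. The function name `apply_filter` is hypothetical. The sketch assumes that position 0 is the middle residue of each odd-length sequence (which is what the example implies), and that several True entries at one position mean "any of these".

```python
def apply_filter(sequences, position_filter):
    """Keep the sequences that satisfy every rule, preserving rank order.

    Assumption: position 0 is the middle residue of each sequence, so
    position p maps to index len(sequence) // 2 + p.
    """
    kept = []
    for sequence in sequences:
        center = len(sequence) // 2
        ok = True
        for position, rules in position_filter.items():
            residue = sequence[center + position]
            # Split the inner dictionary into wanted and unwanted amino acids.
            required = {aa for aa, wanted in rules.items() if wanted}
            forbidden = {aa for aa, wanted in rules.items() if not wanted}
            if (required and residue not in required) or residue in forbidden:
                ok = False
                break
        if ok:
            kept.append(sequence)
    return kept


ranked = ["DHFJD", "DHFGT", "FGHTU", "KJFHY", "DHFHH", "DJFSP"]
position_filter = {-2: {"D": True}, 1: {"J": False, "S": False}}
print(apply_filter(ranked, position_filter))  # ['DHFGT', 'DHFHH']
```

Because the input list is walked in order and only appended to, the result stays a ranked sublist of the original.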

Note: If you think there is a better data structure for the filter object, please don't hesitate to suggest it.

AdamFinkleUMB commented 2 years ago

A nested tuple might be a faster filter structure to access.

(
    (-2, "D", True),
    (1, "J", False, "S", False),
)

We could iterate thus:

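def filter_sequences(requirements, sequences):
    """Return the sequences that meet every requirement, in their original order."""
    filtered_sequences = []
    for sequence in sequences:
        # Assumption: position 0 is the middle residue, as in the example above.
        center = len(sequence) // 2
        passes_filter = True
        for row in requirements:
            residue = sequence[center + row[0]]
            # row[i] is an amino acid; row[i + 1] says whether it must be present.
            for i in range(1, len(row) - 1, 2):
                if (residue == row[i]) != row[i + 1]:
                    passes_filter = False
                    break
            if not passes_filter:
                break
        if passes_filter:
            filtered_sequences.append(sequence)
    return filtered_sequences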

ozgunbabur commented 2 years ago

If you want to use a nested list, then you can use a simpler format in each sublist.

[location, positive_or_negative, aa1, aa2, ...]

So that example would become

[
    [-2, True, "D"],
    [1, False, "J", "S"]
]
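
A rough sketch of how the `[location, positive_or_negative, aa1, aa2, ...]` rows could be consumed. The function name `filter_with_rows` is hypothetical, and the sketch assumes position 0 is the middle residue of each sequence, as the earlier example implies.

```python
def filter_with_rows(sequences, rows):
    """Keep sequences that satisfy every row, preserving rank order.

    Each row is [location, positive, aa1, aa2, ...]. If positive is True,
    the residue at that location must be one of the listed amino acids;
    if False, it must be none of them.
    """
    kept = []
    for sequence in sequences:
        center = len(sequence) // 2  # assumed: position 0 is the middle residue
        if all(
            (sequence[center + location] in aas) == positive
            for location, positive, *aas in rows
        ):
            kept.append(sequence)
    return kept


rows = [
    [-2, True, "D"],
    [1, False, "J", "S"],
]
ranked = ["DHFJD", "DHFGT", "FGHTU", "KJFHY", "DHFHH", "DJFSP"]
print(filter_with_rows(ranked, rows))  # ['DHFGT', 'DHFHH']
```

With the flag stored once per row, each check reduces to a single membership test compared against that flag.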

AdamFinkleUMB commented 2 years ago

Implemented a test for this code.

PathwayAndDataAnalysis / Finkle-PHYS-479

Filter sequences #9