Write a method that calculates enrichments and deficiencies over the given ranked sequences

ozgunbabur commented 2 years ago

Please complete issues #7 and #8 before working on this one.

The method will take the ranked sequences, calculate all the enrichments and deficiencies, and return the results in a special format.

We will use the following coordinate system on a sequence: the center amino acid (aa) is at position 0. Coordinates increase to the right and decrease to the left. For instance, with a window of 3, the positions from left to right are -3, -2, -1, 0, 1, 2, 3.
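The coordinate system can be sketched in a few lines of Python. This is only an illustration: the function names `position_coordinates` and `index_to_coordinate` are hypothetical, and it assumes every sequence has length 2 * window + 1.

```python
from typing import List


def position_coordinates(window: int) -> List[int]:
    """Return the coordinates from left to right for a given window.

    Assumption: a sequence has `window` residues on each side of the center.
    """
    return list(range(-window, window + 1))


def index_to_coordinate(index: int, window: int) -> int:
    """Convert a 0-based string index into a center-relative coordinate."""
    return index - window


def coordinate_to_index(coordinate: int, window: int) -> int:
    """Convert a center-relative coordinate back into a 0-based string index."""
    return coordinate + window
```

With a window of 3, string index 0 maps to coordinate -3 and string index 3 (the center) maps to coordinate 0.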

Here are the inputs and outputs.

Input: ranked sequences

Output: a dictionary of dictionaries that gives the enrichment and deficiency p-values for each position and each aa

The method first needs to call the method that calculates the counts of each amino acid at each position (see issue #7). Then, for each position and each aa, it should call the p-value calculation method that returns the enrichment and deficiency p-values for one position and one aa (see issue #8). Finally, the results should be arranged as a dictionary of dictionaries.

An example output is below.

{-2: {"A": [0.012, 0.87], "F": [0.54, 0.23]}, -1: {"A": [0.23, 0.07], "P": [0.91, 0.003], "S": [0.6, 0.12]}, ...}

Each inner list holds the enrichment p-value followed by the deficiency p-value. Here, for example, the enrichment p-value of P at position -1 is 0.91 and its deficiency p-value is 0.003.

AdamFinkleUMB commented 2 years ago

Using an array indexed by the UTF-8 code of each letter is easier and faster.

ozgunbabur commented 2 years ago

It may be harder to debug the code that way. But it is your choice. If you implement it as a number array, please also implement the necessary converters between number array and string representations, and test them to make sure they are working properly.
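The converters could look something like the sketch below. It assumes amino acids are written as uppercase ASCII letters A to Z, so each letter's code minus the code of "A" gives an array index from 0 to 25. All function names here are hypothetical.

```python
from typing import List

_BASE = ord("A")  # Code of the first letter; "A" maps to index 0.
ALPHABET_SIZE = 26


def aa_to_index(aa: str) -> int:
    """Map a one-letter amino acid code to an array index (A -> 0, Z -> 25)."""
    if len(aa) != 1 or not ("A" <= aa <= "Z"):
        raise ValueError(f"Expected one uppercase letter, got {aa!r}")
    return ord(aa) - _BASE


def index_to_aa(index: int) -> str:
    """Map an array index back to its one-letter amino acid code."""
    if not 0 <= index < ALPHABET_SIZE:
        raise ValueError(f"Index out of range: {index}")
    return chr(index + _BASE)


def sequence_to_indices(sequence: str) -> List[int]:
    """Convert a sequence string into a list of array indices."""
    return [aa_to_index(aa) for aa in sequence]


def indices_to_sequence(indices: List[int]) -> str:
    """Convert a list of array indices back into a sequence string."""
    return "".join(index_to_aa(i) for i in indices)
```

A round-trip check such as `indices_to_sequence(sequence_to_indices(s)) == s` on a handful of sequences is a simple way to test that the two directions agree.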

AdamFinkleUMB commented 2 years ago

See the issue on one amino acid and one position.

PathwayAndDataAnalysis / Finkle-PHYS-479

Write a method that calculates enrichments and deficiencies over the given ranked sequences #6