PoonLab / OpenRDP

An open-source re-implementation of the RDP4 recombination detection program
GNU General Public License v3.0

User request: constrain analysis to one parent sequence #94

Open ArtPoon opened 3 weeks ago

ArtPoon commented 3 weeks ago

Rather than screening the entire input alignment (default behaviour), or a second set of sequences (-r option), for potential parent sequences, this new option would let the user designate one sequence as a parent, with any other sequence in the input alignment as the second parent. This would reduce the number of candidate parent pairs from quadratic to linear in the number of sequences.

ArtPoon commented 3 weeks ago

@WilliamZekaiWang - I think we just have to change how combinations is being generated, based on an additional argument to the TripletGenerator class constructor: https://github.com/PoonLab/OpenRDP/blob/0a899b05e16a29a8d79cf836ae1f4cd42c3374f5/openrdp/common.py#L209-L217
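A minimal sketch of the idea, not the actual OpenRDP code: the function name `triplets` and its `fixed` parameter are hypothetical, and sequences are represented by their integer indices in the alignment. When `fixed` is given, only triplets that contain that sequence are generated.

```python
from itertools import combinations


def triplets(n_seqs, fixed=None):
    """Yield sorted index triplets for an alignment of n_seqs sequences.

    If `fixed` (hypothetical argument) is given, only triplets that
    contain that sequence index are yielded.
    """
    if fixed is None:
        # Default behaviour: every unordered triplet of sequences.
        yield from combinations(range(n_seqs), 3)
        return
    if not 0 <= fixed < n_seqs:
        raise ValueError(f"fixed index {fixed} is outside 0..{n_seqs - 1}")
    # Pair the fixed sequence with every unordered pair of the others.
    others = [i for i in range(n_seqs) if i != fixed]
    for a, b in combinations(others, 2):
        yield tuple(sorted((fixed, a, b)))


if __name__ == "__main__":
    print(len(list(triplets(5))))           # all triplets
    print(len(list(triplets(5, fixed=0))))  # triplets containing sequence 0
```

For n sequences this yields C(n-1, 2) triplets instead of C(n, 3).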

WilliamZekaiWang commented 1 week ago

I think I've got something? I added a new argparse argument, -l, that lets users name a sequence in the reference set that will always be included:

>openrdp test.fasta -r ref.fasta -l Sequence_7

Method     Start  End  Recombinant  Parent1     Parent2     Pvalue
------------------------------------------------------------------------
Geneconv   3      7    Sequence_2   Sequence_4  -           3.92E-03
Geneconv   12     52   Sequence_3   Sequence_4  -           4.86E-01
Bootscan   100    120  Sequence_5   Sequence_6  Sequence_7  5.74E-06
Bootscan   100    120  Sequence_5   Sequence_7  Sequence_8  2.15E-06
Bootscan   100    120  Sequence_7   Sequence_5  Sequence_6  2.15E-06
Bootscan   100    120  Sequence_8   Sequence_4  Sequence_7  4.14E-29
3Seq       7      52   Sequence_4   Sequence_1  Sequence_2  4.12E-02

3Seq does not use our TripletGenerator, so I'm not sure how to incorporate these changes into it.
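For reference, a rough sketch of how the -l option described above could be declared with argparse. Only the `-r` and `-l` flags and the positional input file come from the command shown above; the `dest` names, help strings and `build_parser` function are assumptions for illustration, not the actual OpenRDP parser.

```python
import argparse


def build_parser():
    """Build a parser mirroring `openrdp test.fasta -r ref.fasta -l Sequence_7`."""
    parser = argparse.ArgumentParser(prog="openrdp")
    parser.add_argument("infile", help="input alignment (FASTA)")
    # dest names below are hypothetical.
    parser.add_argument("-r", dest="ref", default=None,
                        help="second set of sequences to screen as parents")
    parser.add_argument("-l", dest="fixed_parent", default=None,
                        help="name of a sequence to include in every triplet")
    return parser


args = build_parser().parse_args(
    ["test.fasta", "-r", "ref.fasta", "-l", "Sequence_7"]
)
print(args.infile, args.ref, args.fixed_parent)
```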

GopiGugan commented 1 week ago

Let's see if there is an option in 3Seq to specify a sequence - https://mol.ax/content/media/2018/02/3seq_manual.20180209.pdf

ArtPoon commented 1 week ago

It might be worth isolating the core 3Seq algorithm and then hooking that C code into Python so that we can feed inputs directly.

ArtPoon commented 22 hours ago

3Seq includes the following command line options:

      ./3seq.macOS -triplet  <seq_file>  <P_name>    <Q_name>      <C_name>    [options]

      ./3seq.macOS -single   <seq_file>                 [options]
      ./3seq.macOS -single   <parent_file> <child_file> [options]

      ./3seq.macOS -full     <seq_file>                 [-ptable ptable_file]  [options]
      ./3seq.macOS -full     <parent_file> <child_file> [-ptable ptable_file]  [options]

We could pass triplets from TripletGenerator individually in -triplet mode, but this might be inefficient, since it would mean one 3Seq invocation per triplet.
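To make the cost concrete, here is a hedged sketch that only builds the argument lists for -triplet mode and does not run 3Seq. The function `triplet_commands` is hypothetical. It assumes the fixed sequence always goes in the P slot, and that every ordered (Q, C) pair of the remaining sequences has to be tried because either one could be the child.

```python
from itertools import permutations


def triplet_commands(binary, seq_file, names, fixed):
    """Yield one argv list per 3Seq -triplet call with `fixed` as parent P.

    Follows the usage line
        <binary> -triplet <seq_file> <P_name> <Q_name> <C_name>
    and adds no further options.
    """
    if fixed not in names:
        raise ValueError(f"{fixed!r} is not in the list of sequence names")
    others = [name for name in names if name != fixed]
    # Ordered pairs: Q is the second parent, C is the candidate child.
    for q_name, c_name in permutations(others, 2):
        yield [binary, "-triplet", seq_file, fixed, q_name, c_name]


if __name__ == "__main__":
    names = [f"Sequence_{i}" for i in range(1, 9)]
    cmds = list(triplet_commands("./3seq.macOS", "test.fasta", names, "Sequence_7"))
    print(len(cmds))  # number of separate 3Seq processes that would be launched
    print(" ".join(cmds[0]))
```

Each list could be handed to `subprocess.run`, but the count grows as (n-1)(n-2) for n sequences, and every call pays process start-up and file parsing again.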