Open ArtPoon opened 3 weeks ago
@WilliamZekaiWang - I think we just have to change how combinations
are being generated, based on an additional argument to the TripletGenerator
class constructor:
https://github.com/PoonLab/OpenRDP/blob/0a899b05e16a29a8d79cf836ae1f4cd42c3374f5/openrdp/common.py#L209-L217
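To make the idea concrete, here is a minimal sketch of what such a change could look like. It is not the actual OpenRDP TripletGenerator code: the function name `triplets` and the `always_include` parameter are hypothetical, and it works on sequence names rather than on an alignment.

```python
from itertools import combinations


def triplets(names, always_include=None):
    """Yield 3-tuples of sequence names.

    Hypothetical stand-in for TripletGenerator: with no fixed sequence,
    every combination of three names is produced; with `always_include`
    set, only triplets containing that sequence are produced.
    """
    if always_include is None:
        yield from combinations(names, 3)
        return
    if always_include not in names:
        raise ValueError(f"{always_include!r} is not in the alignment")
    # Pair up the remaining sequences and append the fixed one.
    others = [n for n in names if n != always_include]
    for a, b in combinations(others, 2):
        yield (a, b, always_include)


names = [f"Sequence_{i}" for i in range(1, 9)]
all_trips = list(triplets(names))
fixed_trips = list(triplets(names, always_include="Sequence_7"))
print(len(all_trips), "triplets without a fixed sequence")
print(len(fixed_trips), "triplets that all contain Sequence_7")
```

The only behavioural difference is which triplets get emitted, so any method that consumes the generator would not need to change.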
I think I've got something. I added a new argparse argument, -l,
that lets users specify the name of a sequence in the reference file (-r) to always be included:
>openrdp test.fasta -r ref.fasta -l Sequence_7
Method    Start  End  Recombinant  Parent1     Parent2     Pvalue
------------------------------------------------------------------------
Geneconv  3      7    Sequence_2   Sequence_4  -           3.92E-03
Geneconv  12     52   Sequence_3   Sequence_4  -           4.86E-01
Bootscan  100    120  Sequence_5   Sequence_6  Sequence_7  5.74E-06
Bootscan  100    120  Sequence_5   Sequence_7  Sequence_8  2.15E-06
Bootscan  100    120  Sequence_7   Sequence_5  Sequence_6  2.15E-06
Bootscan  100    120  Sequence_8   Sequence_4  Sequence_7  4.14E-29
3Seq      7      52   Sequence_4   Sequence_1  Sequence_2  4.12E-02
3Seq does not use our TripletGenerator, so I'm not sure how to incorporate these changes into it.
Let's see if there is an option in 3Seq to specify a sequence - https://mol.ax/content/media/2018/02/3seq_manual.20180209.pdf
It might be worth isolating the core 3Seq algorithm and then hooking that C code into Python so that we can feed inputs directly.
3Seq includes the following command line options:
./3seq.macOS -triplet <seq_file> <P_name> <Q_name> <C_name> [options]
./3seq.macOS -single <seq_file> [options]
./3seq.macOS -single <parent_file> <child_file> [options]
./3seq.macOS -full <seq_file> [-ptable ptable_file] [options]
./3seq.macOS -full <parent_file> <child_file> [-ptable ptable_file] [options]
We could pass triplets from TripletGenerator individually in -triplet mode, but this might be inefficient.
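As a rough illustration of the -triplet route, the sketch below only builds the command lines and does not run 3Seq. It assumes the argument order listed above (`<seq_file> <P_name> <Q_name> <C_name>`), assumes that each triplet would be tried with each of its three members as the child, and leaves out `[options]`. The helper name `triplet_commands` is hypothetical.

```python
from itertools import combinations


def triplet_commands(seq_file, triplets, binary="./3seq.macOS"):
    """Build one 3Seq -triplet command per (P, Q, C) assignment."""
    commands = []
    for a, b, c in triplets:
        # Assumption: try each member of the triplet as the child (C).
        for p, q, child in ((a, b, c), (a, c, b), (b, c, a)):
            commands.append([binary, "-triplet", seq_file, p, q, child])
    return commands


names = [f"Sequence_{i}" for i in range(1, 9)]
others = [n for n in names if n != "Sequence_7"]
# Triplets that always contain Sequence_7, as the -l option would produce.
fixed_triplets = [(a, b, "Sequence_7") for a, b in combinations(others, 2)]

commands = triplet_commands("test.fasta", fixed_triplets)
print(len(commands), "separate 3Seq invocations")
print(" ".join(commands[0]))
```

Each list could be handed to `subprocess.run`, which would start one 3Seq process per command. Even for this eight-sequence example that is dozens of launches, which is the inefficiency in question.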
Rather than screening the entire input alignment (default behaviour), or a second set of sequences (-r option), for potential parent sequences, this new option would enable the user to designate a specific sequence as one of the parents, with any of the other sequences in the input alignment as the second parent. This would reduce the number of candidate parent pairs from quadratic to linear in the number of sequences.
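A quick count shows the scaling, under the assumption that "complexity" here means the number of candidate parent pairs to screen. With m candidate parents there are m(m-1)/2 unordered pairs; once one parent is fixed, only m-1 pairs remain. The function name `parent_pairs` is made up for this example.

```python
from math import comb


def parent_pairs(m, fixed_parent=False):
    """Number of candidate parent pairs among m candidate parents."""
    # Fixed parent: it is paired with each of the other m - 1 sequences.
    # Otherwise: every unordered pair of the m sequences is a candidate.
    return m - 1 if fixed_parent else comb(m, 2)


for m in (10, 100, 1000):
    print(m, parent_pairs(m), parent_pairs(m, fixed_parent=True))
```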