Closed by ArtPoon 5 years ago
I can't analyze the superinfection data set (111848.fasta) with RAPR because RAPR won't process more than 800 sequences at once.
" Sorry! We could not process your request. The number of sequences in you input (=1033) exceeded the allowed maximum number of sequences (=800). "
Do you know of any alternative programs I could use for recombination detection?
Try running it with 800 sequences and see how the output looks.
I finally got RAPR to work after running into problems. I used a subset of 600 randomly sampled sequences from patient 111848.
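For anyone repeating the subsampling step, here is a minimal sketch of drawing a random subset from a FASTA file using only the Python standard library. The function names (`read_fasta`, `subsample`, `write_fasta`) and the fixed seed are assumptions for illustration, not the script actually used for the RAPR run.

```python
import random


def read_fasta(lines):
    """Parse an iterable of FASTA text lines into (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def subsample(records, k, seed=None):
    """Draw k records without replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(records, k)


def write_fasta(records):
    """Format (header, sequence) tuples back into FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

With the input file opened as `handle`, something like `write_fasta(subsample(read_fasta(handle), 600, seed=1))` would give a 600-sequence subset that stays under the RAPR limit. Recording the seed makes it possible to rerun the analysis on the same subset.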
For context, I specified two consensus sequences, one generated from each of the two populations. I separated the populations by manually selecting their sequences in the alignment, and verified the split with a test tree.
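The thread does not say how the consensus sequences were generated. As one possible approach, here is a sketch of a simple majority-rule consensus over an aligned set of sequences; the function name `consensus` and the tie-breaking behaviour are assumptions.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Majority-rule consensus of equal-length, aligned sequences."""
    if not aligned_seqs:
        raise ValueError("no sequences given")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned_seqs):
        counts = Counter(base.upper() for base in column)
        # Ties go to whichever character appears first in the column,
        # because most_common keeps first-encountered order for equal counts.
        out.append(counts.most_common(1)[0][0])
    return "".join(out)
```

Gap characters are counted like any other character here, so a column that is mostly gaps yields a gap in the consensus. Running this once per manually selected population would give the two reference sequences.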
Over 50% of the sequences were recombinant hits (307/600) according to RAPR. That seems too high. This is the link to the result on LANL.
And this shows the two lineages in the tree:
What are your thoughts? I downloaded and installed RDP4 while I was struggling to get RAPR working. Let me know if you'd like me to try that instead.
Just a side note: on closer reading of the paper, 111848 was found to have 7 transmitted/founder (T/F) viruses. Not sure if this might be affecting the result.
Out of curiosity, I installed and ran RDP4 on the full alignment file (1031 sequences + 2 consensus) and got this as a result:
However, I'm unsure whether I formatted the data in the analysis correctly.
RDP4 analysis was used as a general guideline to find sequences worth investigating. I still relied on manual screening of the phylogenetic tree containing all sequences. I removed all tips falling along the longest branch that separated the two distinct populations and recorded them in a filtering document.
Large data sets like 111848 can be split into two or more subsets that each correspond to a different transmitted founder variant.
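As a rough illustration of such a split, the sketch below assigns each aligned sequence to the founder consensus it differs from least. This is an assumption about how the split could be automated, not the manual tree-based screening described above, and the names `hamming` and `split_by_founder` are hypothetical. It does not detect recombinants, which would still need to be screened out separately.

```python
def hamming(a, b):
    """Count mismatches between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x != "-" and y != "-"
    )


def split_by_founder(records, founders):
    """Group (name, seq) records under the label of the nearest founder.

    founders maps a label to its aligned consensus sequence. On a tie,
    the founder listed first in the dict wins.
    """
    groups = {label: [] for label in founders}
    for name, seq in records:
        nearest = min(founders, key=lambda label: hamming(seq, founders[label]))
        groups[nearest].append((name, seq))
    return groups
```

Each resulting group can then be written to its own file and analyzed as a separate subset.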