cfe-lab / proviral


Reversed sequences #4

Closed: donkirkby closed this issue 3 years ago

donkirkby commented 3 years ago

Reported by @dmacmillan:

We discussed how, when we analyze a sequence and do not find both primers, we will reverse complement the sequence and try to find the primers again. If we still do not find the primers after reverse complementing, we will record the original sequence and the failure as is. If, however, we do not find both primers initially but do find them after reverse complementing, we want to record the reverse-complemented sequence in all downstream files, such as the outcome summary and the *primer_analysis files. We also want to include a column such as "is_revcomp" to indicate that we had to reverse complement the sequence to find the primers, and that the recorded sequence is the reverse complement of what MiCall originally output.
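A minimal sketch of the fallback described above. It assumes a hypothetical exact-match rule (forward primer found as-is, reverse primer found as its reverse complement further downstream); the pipeline's real match rules are not shown in this issue, and `orient_sequence` and `has_both_primers` are illustrative names, not functions from the repo.

```python
# Complement table for DNA bases, preserving case and ambiguous N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_both_primers(seq: str, fwd_primer: str, rev_primer: str) -> bool:
    # Hypothetical rule: exact substring matches, with the forward primer
    # upstream of the reverse primer's reverse complement.
    fwd_pos = seq.find(fwd_primer)
    rev_pos = seq.find(reverse_complement(rev_primer))
    return fwd_pos != -1 and rev_pos != -1 and fwd_pos < rev_pos


def orient_sequence(seq: str, fwd_primer: str, rev_primer: str):
    """Return (sequence_to_record, is_revcomp, primers_found)."""
    if has_both_primers(seq, fwd_primer, rev_primer):
        return seq, False, True
    rc = reverse_complement(seq)
    if has_both_primers(rc, fwd_primer, rev_primer):
        # Found only after reversing: record the reverse complement and flag it.
        return rc, True, True
    # Neither orientation worked: keep the original sequence and the failure.
    return seq, False, False
```

For example, with primers `"ATGCA"` and `"GGTAC"`, the input `"GGTACAAAATGCAT"` has no valid match as given, so `orient_sequence` returns its reverse complement `"ATGCATTTTGTACC"` with `is_revcomp` set to `True`. That flag is what would feed the proposed "is_revcomp" column.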

Outstanding items:

donkirkby commented 3 years ago

As part of cfe-lab/MiCall#518, we stopped reporting contigs with the -reversed suffix. Instead, we take any contigs whose best BLAST hit is in the reverse direction, and we reverse complement them before reporting them.
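A rough sketch of the MiCall#518 behaviour as described here, not MiCall's actual code. It assumes hits are parsed from BLAST tabular output, where a minus-strand alignment reports a subject start greater than the subject end; `BlastHit` and `orient_contig` are hypothetical names.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class BlastHit:
    # Hypothetical record holding only the tabular BLAST fields used here.
    score: float  # bitscore
    sstart: int   # subject (reference) start
    send: int     # subject (reference) end

    @property
    def is_reversed(self) -> bool:
        # A minus-strand alignment reports sstart > send.
        return self.sstart > self.send


def orient_contig(seq: str, hits: list) -> str:
    """Reverse complement the contig if its best hit is on the minus strand."""
    if not hits:
        return seq
    best = max(hits, key=lambda hit: hit.score)
    return reverse_complement(seq) if best.is_reversed else seq
```

This only looks at the single best hit, which is why it cannot catch contigs where the primers are reversed but most of the sequence matches in the forward direction.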

It isn't very relevant to this issue, however, because we're supposed to look for the primers in the reverse direction whenever we can't find them in the forward direction. It just makes it harder to find a test example.

donkirkby commented 3 years ago

The reason that handling the -reversed suffix doesn't solve all of the reversal problems for this pipeline is that some samples could have inversions. That's when the two ends are reversed while a large portion in the middle stays in the forward direction. BLAST would match the forward direction, but the primers would be reversed.

donkirkby commented 3 years ago

Current plan after discussions with ZB and NK: look for primers only in contigs and conseqs that BLAST to HIV. If I don't find primers in the forward direction, I will look in the reverse complement. The rules for a valid match are unchanged beyond that.
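A sketch of how this plan might look as one loop over contigs and conseqs, under stated assumptions. It assumes each input row is a dict with hypothetical `name`, `blast_ref`, and `sequence` fields, treats a reference name starting with `"HIV"` as "BLASTs to HIV", and uses a simplified exact-match primer rule in place of the pipeline's unchanged validity rules.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_found(seq: str, fwd_primer: str, rev_primer: str) -> bool:
    # Hypothetical stand-in for the real match rules: exact matches,
    # forward primer upstream of the reverse primer's reverse complement.
    fwd_pos = seq.find(fwd_primer)
    rev_pos = seq.find(reverse_complement(rev_primer))
    return fwd_pos != -1 and rev_pos != -1 and fwd_pos < rev_pos


def analyze_contigs(contigs, fwd_primer: str, rev_primer: str):
    """Yield one output row per contig or conseq that BLASTed to HIV."""
    for contig in contigs:
        # Only look for primers in sequences whose BLAST hit is HIV.
        if not contig["blast_ref"].startswith("HIV"):
            continue
        seq = contig["sequence"]
        is_revcomp = False
        found = primers_found(seq, fwd_primer, rev_primer)
        if not found:
            # Fall back to the reverse complement only if the forward search fails.
            rc = reverse_complement(seq)
            if primers_found(rc, fwd_primer, rev_primer):
                seq, is_revcomp, found = rc, True, True
        yield {"name": contig["name"], "sequence": seq,
               "primers_found": found, "is_revcomp": is_revcomp}
```

The yielded dicts have fixed keys, so they could be written straight to an outcome file with `csv.DictWriter`, giving the "is_revcomp" column requested in the original report.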