VDBWRAIR / ngs_mapper

Genome Mapping Pipeline
GNU General Public License v2.0

add PCR error filter #251

Open mmelendrez opened 8 years ago

mmelendrez commented 8 years ago

We should look at developing a tool or option in ngs_mapper (unless one is already in development) that automates finding PCR bias in reads. Currently the options are (i) hard-cropping the reads, which could shorten them to the point of possibly not assembling or not assembling correctly, or (ii) visually picking the errors out and correcting them during manual curation. To do (ii) you need to be able to view the BAMs, which, especially in variant-level analysis, can get enormous and take forever to load onsite with a good connection, let alone over VPN > cluster! My options for viewing them from here are (i) attempting to use VPN > PuTTY + Xming > cluster > load IGV > load BAMs. If I do this it will take hours at best, IF VPN doesn't kick me out. (ii) Use Geneious. I attempted this: it takes hours to load a BAM, let alone scan through one for the site with potential PCR issues.

One method of identifying areas where PCR primer issues occur is to look for mutations within 20 bp of the end of a read. Typically, when it's PCR primer bias, the mutations are found at the same spot within 20 bp of the end and they are 'stacked', meaning there are a bunch of reads that end at the same coordinate, all with the mutation.
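A rough sketch of the 'stacked near the read end' check, to make the idea concrete. This is not existing ngs_mapper code: the `Read` record, the `end_stacking` function and its output keys are all hypothetical, and the reads are assumed to have already been pulled out of the BAM into (start, end, mismatch positions) form by whatever parser is used.

```python
from collections import Counter
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not an ngs_mapper or BAM library type)."""
    start: int             # leftmost reference coordinate covered (0-based, inclusive)
    end: int               # rightmost reference coordinate covered (0-based, inclusive)
    mismatches: frozenset  # reference coordinates where the read differs from the reference


def end_stacking(reads, pos, window=20):
    """Summarise where a mismatch at `pos` sits relative to the ends of the reads carrying it.

    `window` is the 20 bp distance from the issue text; a mismatch counts as
    near an end when it is fewer than `window` bases from that end.
    Returns None when no read carries a mismatch at `pos`.
    """
    carriers = [r for r in reads if pos in r.mismatches]
    if not carriers:
        return None

    near_end = 0
    end_coords = Counter()  # which read-end coordinate the near-end carriers share
    for r in carriers:
        dist_start = pos - r.start
        dist_end = r.end - pos
        if min(dist_start, dist_end) < window:
            near_end += 1
            end_coords[r.start if dist_start <= dist_end else r.end] += 1

    if end_coords:
        top_coord, top_count = end_coords.most_common(1)[0]
    else:
        top_coord, top_count = None, 0

    return {
        "carriers": len(carriers),
        "near_end_fraction": near_end / len(carriers),
        # fraction of carriers whose nearby end falls on the single most common coordinate
        "stacked_fraction": top_count / len(carriers),
        "stacked_coordinate": top_coord,
    }
```

A position where both `near_end_fraction` and `stacked_fraction` are high would be a candidate PCR primer artifact to flag for curation. The cutoffs for 'high' would still need to be decided.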

Another way of 'seeing' this is that there will typically (not always) be a strand bias between + and - reads (forward/reverse). So if there is PCR primer bias, the mutation might be found in 5000 + reads but perhaps only 200 - reads.
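The strand check could be sketched along these lines. The function names and the 0.2 cutoff are made up for illustration and are not ngs_mapper options. The inputs are simply the number of forward and reverse reads that carry the mutation.

```python
def strand_balance(forward, reverse):
    """Fraction of mutation-carrying reads that sit on the less common strand.

    0.5 means a perfect forward/reverse split and 0.0 means every carrier is
    on one strand. Returns None when no read carries the mutation.
    """
    total = forward + reverse
    if total == 0:
        return None
    return min(forward, reverse) / total


def is_strand_biased(forward, reverse, min_minor_fraction=0.2):
    """Flag a mutation whose minor strand holds too few of the carriers.

    `min_minor_fraction` is an assumed, untuned cutoff.
    """
    balance = strand_balance(forward, reverse)
    return balance is not None and balance < min_minor_fraction
```

With the numbers above, `strand_balance(5000, 200)` is 200/5200, a little under 4%, so `is_strand_biased(5000, 200)` would flag it. Since the bias is typical but not guaranteed, this would work better as one signal alongside the read-end check than as a filter on its own.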

'Real' mutations should ideally be found more or less equally on forward and reverse reads, and inside the reads as opposed to at the ends.

Thoughts? Ina needs to be brought in for comments on this as well.

mmelendrez commented 8 years ago

considerations per Michael and Mel convo: