Arcadia-Science / prehgt

A pipeline for lightweight screening of eukaryotic genomes and transcriptomes for recent HGT
MIT License

over time, develop filtering heuristics to systematically remove false positives that repeatedly crop up #38

Open taylorreiter opened 1 year ago

taylorreiter commented 1 year ago

In the pub, we say, "Over time, we hope to curate a list of genes that the preHGT pipeline frequently detects as false positives and to develop a strategy to filter them out."

Originally I had thought of filtering these genes out by annotation name. @jonathaneisen suggested that we could instead create a BLAST database of the recurring false positives and filter candidates out by sequence similarity. I think this is a much better approach than going by name, and I wanted to record it here and continue brainstorming about potential strategies.
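Here is a minimal sketch of the similarity-based filter. It assumes the recurring false positives have been collected into a protein FASTA file and searched with BLAST+ outside of Python, for example with `makeblastdb -in false_positives.faa -dbtype prot -out fp_db` followed by `blastp -query candidates.faa -db fp_db -outfmt 6 -evalue 1e-10 -out hits.tsv`. The file names, the helper functions (`parse_outfmt6`, `false_positive_ids`, `filter_candidates`), and the identity, alignment-length, and e-value cutoffs are hypothetical placeholders and are not part of preHGT. The code reads the tabular hits and drops any candidate with at least one hit to the false-positive database that clears all three cutoffs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


@dataclass
class Hit:
    query: str      # candidate HGT gene ID (qseqid)
    subject: str    # matching entry in the false-positive database (sseqid)
    pident: float   # percent identity of the alignment
    length: int     # alignment length
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> list[Hit]:
    """Parse BLAST+ -outfmt 6 lines, skipping blank lines and '#' comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < len(OUTFMT6_FIELDS):
            raise ValueError(f"expected {len(OUTFMT6_FIELDS)} columns, got {len(cols)}: {line!r}")
        hits.append(Hit(
            query=cols[0],
            subject=cols[1],
            pident=float(cols[2]),
            length=int(cols[3]),
            evalue=float(cols[10]),
        ))
    return hits


def false_positive_ids(hits: Iterable[Hit], min_pident: float = 50.0,
                       min_length: int = 50, max_evalue: float = 1e-10) -> set[str]:
    """Return query IDs with any hit that passes all (hypothetical) cutoffs."""
    return {
        h.query for h in hits
        if h.pident >= min_pident and h.length >= min_length and h.evalue <= max_evalue
    }


def filter_candidates(candidate_ids: Iterable[str], hits: Iterable[Hit],
                      **cutoffs) -> list[str]:
    """Keep candidates (in input order) that do not resemble a known false positive."""
    flagged = false_positive_ids(hits, **cutoffs)
    return [c for c in candidate_ids if c not in flagged]


if __name__ == "__main__":
    blast_lines = [
        "geneA\tfp_ref1\t92.5\t310\t20\t1\t1\t310\t5\t314\t1e-150\t540",
        "geneB\tfp_ref2\t35.0\t80\t50\t3\t10\t89\t1\t80\t0.5\t30.1",
    ]
    hits = parse_outfmt6(blast_lines)
    print(filter_candidates(["geneA", "geneB", "geneC"], hits))  # ['geneB', 'geneC']
```

In this example geneA is removed because it aligns closely to a known false positive, geneB survives because its only hit is weak, and geneC has no hit at all. Unlike a name-based filter, this check does not depend on how a gene happens to be annotated. The cutoffs would still need tuning against the curated list. A query-coverage criterion could also be added by requesting the `qcovs` field in a custom `-outfmt "6 std qcovs"` string.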