over time, develop filtering heuristics to systematically remove false positives that repeatedly crop up

In the pub, we say, "Over time, we hope to curate a list of genes that the preHGT pipeline frequently detects as false positives and to develop a strategy to filter them out."

Originally I had thought of filtering by annotation name. @jonathaneisen suggested that we could instead create a BLAST database of known false positives and filter by sequence similarity. I think this is a much better approach than going by name. I wanted to record it here and to continue brainstorming about potential strategies.
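To make the sequence-similarity idea concrete, here is a minimal sketch of what the filter could look like. It is not something the pipeline currently does. It assumes the candidate genes have already been searched against a BLAST database of curated false positives and that the hits were written in the default tabular format (`-outfmt 6`). The function names, the example IDs, and the identity and e-value thresholds are all hypothetical placeholders that we would need to tune.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def flag_false_positives(blast_tsv, min_pident=90.0, max_evalue=1e-10):
    """Return query IDs with at least one hit against the false-positive database.

    blast_tsv is any iterable of tab-separated lines (an open file works).
    The thresholds are illustrative assumptions, not validated values.
    """
    flagged = set()
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        similar_enough = float(row["pident"]) >= min_pident
        significant = float(row["evalue"]) <= max_evalue
        if similar_enough and significant:
            flagged.add(row["qseqid"])
    return flagged


def filter_candidates(candidate_ids, flagged):
    """Keep only the candidates that were not flagged, preserving their order."""
    return [cid for cid in candidate_ids if cid not in flagged]


# Toy hits table with made-up IDs: geneA closely matches a known false
# positive, geneB has only a weak hit, and geneC has no hit at all.
toy_hits = io.StringIO(
    "geneA\tfp1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-80\t350\n"
    "geneB\tfp2\t45.0\t80\t44\t2\t5\t84\t10\t89\t0.5\t30\n"
)
flagged = flag_false_positives(toy_hits)
kept = filter_candidates(["geneA", "geneB", "geneC"], flagged)
print(kept)
```

One design question this leaves open is whether a hit should also have to cover most of the candidate sequence, since a short, highly similar domain could otherwise flag a real HGT candidate.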

Arcadia-Science / prehgt #38