Arcadia-Science / 2023-amblyomma-americanum-txome-assembly

MIT License

Try orthofuser transcriptome deduplication approach #5

Closed taylorreiter closed 11 months ago

taylorreiter commented 1 year ago

I'm sorry this PR is horribly long -- it's one unit of work, in that it's all the steps to combine and deduplicate the transcriptomes, but it's a lot of steps. It's based on the orthofuser approach from the Oyster River Protocol for de novo transcriptomics.

The tl;dr is that it runs orthofinder to group contigs from the assembled transcriptomes into orthogroups, and then runs transrate to get contig scores. It then selects the contig with the highest score from each orthogroup to retain in the final transcriptome. There are also a bunch of diamond steps thrown in -- the people who used the ORP noticed that real contigs were dropping out, so after deduplicating it goes back to the original transcriptomes and annotates them with diamond. Any annotation that shows up in the original transcriptomes but not in the deduplicated one is "rescued."
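
To make the selection and rescue logic concrete, here's a minimal Python sketch. It is not the code in this PR: the function names, the in-memory dicts, and the contig and protein IDs are all hypothetical, and it assumes the orthofinder, transrate, and diamond outputs have already been parsed. It also assumes that a missing score counts as 0.0 and that a rescued annotation brings back only its single best-scoring contig, neither of which is stated above.

```python
from collections import defaultdict


def select_best_contigs(orthogroups, scores):
    """Keep the highest-scoring contig from each orthogroup."""
    best = {}
    for group_id, contigs in orthogroups.items():
        # Assumption: a contig with no score is treated as scoring 0.0.
        best[group_id] = max(contigs, key=lambda c: scores.get(c, 0.0))
    return best


def rescue_contigs(kept, annotations, scores):
    """Pick one contig for each annotation that the kept set is missing."""
    kept_annotations = {annotations[c] for c in kept if c in annotations}
    # Group the dropped contigs by annotations not yet represented.
    candidates = defaultdict(list)
    for contig, hit in annotations.items():
        if contig not in kept and hit not in kept_annotations:
            candidates[hit].append(contig)
    # Assumption: one contig per new annotation, again the best-scoring one.
    return {
        hit: max(contigs, key=lambda c: scores.get(c, 0.0))
        for hit, contigs in candidates.items()
    }


# Hypothetical toy inputs (all IDs and scores are made up for illustration).
orthogroups = {"OG1": ["trinity_c1", "spades_c7"], "OG2": ["trinity_c2"]}
scores = {
    "trinity_c1": 0.41,
    "spades_c7": 0.63,
    "trinity_c2": 0.22,
    "spades_c9": 0.30,
    "trinity_c5": 0.10,
}
# Contig -> best diamond hit, from annotating the original transcriptomes.
annotations = {
    "trinity_c1": "P1",
    "spades_c7": "P1",
    "trinity_c2": "P2",
    "spades_c9": "P3",
    "trinity_c5": "P3",
}

best = select_best_contigs(orthogroups, scores)
kept = set(best.values())
rescued = rescue_contigs(kept, annotations, scores)
final_contigs = kept | set(rescued.values())
```

In this toy example, `spades_c9` and `trinity_c5` are in no orthogroup, so deduplication alone would lose annotation `P3`; the rescue step adds back `spades_c9`, the better-scoring of the two.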

The PR adds everything in blue.

[image: dag]