Closed by rchikhi 3 years ago
IMO we should also post virus macro contigs from the 1k subset, at a minimum the union of VirSorter2+ and diamond rdrp1+. This set of contigs included 126,436(!) unique palmprints, most of which are embedded in substantially longer contigs. This was lucky but predictable after the fact: most pp's occur in the big viromes, so a randomly chosen 1k subset captures them.
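For what it's worth, the "union of VirSorter2+ and diamond rdrp1+" selection could be sketched roughly like this. This is only a minimal illustration, not the pipeline actually used: it assumes each tool's hits are available as a text list with the contig ID in the first whitespace-separated column, and that FASTA headers start with that same ID. All function names and the toy data are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect contig IDs: first whitespace-separated field of each non-comment line."""
    ids = set()
    for line in lines:
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            ids.add(fields[0])
    return ids


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def select_contigs(fasta_lines: Iterable[str], keep: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep only records whose ID (first word of the header) is in `keep`."""
    for header, seq in parse_fasta(fasta_lines):
        fields = header.split()
        if fields and fields[0] in keep:
            yield header, seq


# Toy inputs (made up for illustration, not real hits or contigs).
virsorter_hits = ["contig_1", "contig_2"]
rdrp_hits = ["contig_2\tsome_rdrp\t95.0", "contig_3\tsome_rdrp\t88.1"]
fasta = [">contig_1 len=4", "ACGT", ">contig_4", "GGGG", ">contig_3", "TT", "AA"]

# Union: a contig is kept if either tool flagged it.
keep_ids = read_ids(virsorter_hits) | read_ids(rdrp_hits)
selected = list(select_contigs(fasta, keep_ids))
```

With real data, the line lists would simply be open file handles; the set union is what makes a contig count once even when both tools flag it.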
macro contigs should already be posted as part of the assembly upload
As I read the documentation ("Assembly-level assembly", as opposed to what exactly, non-assembly-level non-assembly?), these are the CoV+ targets. If that is true, then I think the documentation needs to be updated to explain the 1k subset and which SRAs are in that subset.
:+1: this is my evening
Done!
it's staged in
s3://serratus-rayan/lovelywater/man/micro.assemblies.fa
I basically just copied
s3://serratus-rayan/pro-assembly/all.contigs.fasta
format is:
noting also that all motifator hits are here in case we want to migrate them:
s3://serratus-rayan/pro-assembly/all.contigs.LHF.fasta