ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

migrate RdRP data to lovelywater #251

Closed: rchikhi closed this 3 years ago

rchikhi commented 3 years ago

It's staged in s3://serratus-rayan/lovelywater/man/micro.assemblies.fa; I basically just copied s3://serratus-rayan/pro-assembly/all.contigs.fasta. The format is:

>SRRxxxx NODE_x_length_y_cov_z
[sequence]
...

Also noting that all motifator hits are here, in case we want to migrate them: s3://serratus-rayan/pro-assembly/all.contigs.LHF.fasta
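For anyone consuming these files, here is a minimal sketch of a reader for the header format above (`>SRRxxxx NODE_x_length_y_cov_z`). It assumes the file has already been copied locally (e.g. with `aws s3 cp`), that the run accession is the first whitespace-separated token, and that the contig name follows the SPAdes-style `NODE_<id>_length_<len>_cov_<cov>` pattern with a possibly fractional coverage. The names `parse_contigs`, `Contig` and `HEADER_RE` are hypothetical, not part of the serratus codebase.

```python
import io
import re
from typing import Iterator, List, NamedTuple, TextIO

# Hypothetical pattern for headers like ">SRR0000001 NODE_12_length_4567_cov_8.9".
# Assumes: accession first, then a SPAdes-style NODE_<id>_length_<len>_cov_<cov> name.
HEADER_RE = re.compile(
    r"^>(?P<run>\S+)\s+NODE_(?P<node>\d+)_length_(?P<length>\d+)_cov_(?P<cov>[0-9.]+)"
)


class Contig(NamedTuple):
    run: str      # SRA run accession
    node: int     # assembler node number
    length: int   # length as reported in the header (not re-checked against seq)
    cov: float    # k-mer coverage as reported in the header
    seq: str      # sequence, with multi-line FASTA joined into one string


def _make(header: str, chunks: List[str]) -> Contig:
    """Build a Contig from a header line and its sequence lines."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"unexpected header: {header!r}")
    return Contig(m["run"], int(m["node"]), int(m["length"]), float(m["cov"]), "".join(chunks))


def parse_contigs(handle: TextIO) -> Iterator[Contig]:
    """Yield one Contig per FASTA record, in file order."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield _make(header, chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line.strip())
    if header is not None:
        yield _make(header, chunks)


if __name__ == "__main__":
    # Usage with an in-memory example; SRR0000001 is a placeholder accession.
    demo = io.StringIO(">SRR0000001 NODE_1_length_8_cov_2.5\nACGT\nACGT\n")
    for contig in parse_contigs(demo):
        print(contig.run, contig.node, contig.length, contig.cov, len(contig.seq))
```

Since the motifator-hit file is described as living alongside the full contig set, the same reader should apply to it as well, assuming it keeps the same headers.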

rcedgar commented 3 years ago

IMO we should also post virus macro contigs from the 1k subset, at a minimum the union of VirSorter2+ and diamond rdrp1+ contigs. This set of contigs included 126,436(!) unique palmprints, most of which are embedded in substantially longer contigs. This was lucky, but predictable after the fact: most palmprints occur in the big viromes, so a randomly chosen 1k subset captures them.
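The "at a minimum union" selection can be sketched as a set union over contig IDs: a contig is kept if either classifier flagged it. This assumes each classifier's positive calls are available as a list of contig IDs matching the FASTA headers; the function name `select_macro_contigs` and its parameters are hypothetical, not an existing serratus helper.

```python
from typing import Dict, Iterable, Set


def select_macro_contigs(
    contigs: Dict[str, str],
    virsorter2_pos: Iterable[str],
    rdrp1_pos: Iterable[str],
) -> Dict[str, str]:
    """Keep contigs flagged by VirSorter2 OR by the diamond rdrp1 search (hypothetical inputs)."""
    # Union of the two positive sets: membership in either one is enough.
    keep: Set[str] = set(virsorter2_pos) | set(rdrp1_pos)
    # Preserve input order; flagged IDs that are absent from `contigs` are ignored.
    return {cid: seq for cid, seq in contigs.items() if cid in keep}
```

A union (rather than an intersection) favours recall: a contig missed by one method is still posted if the other catches it.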

ababaian commented 3 years ago

Macro contigs should already be posted as part of the assembly upload.

rcedgar commented 3 years ago

As I read the documentation ("Assembly-level assembly", as opposed to what exactly, non-assembly-level non-assembly?), it says these are the CoV+ targets. If that is true, I think the documentation needs to be updated to explain the 1k subset and which SRA runs are in it.

ababaian commented 3 years ago

:+1: this is my evening

ababaian commented 3 years ago

Done!