ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Tantalus - R data exploration #131

Closed ababaian closed 4 years ago

ababaian commented 4 years ago

We are now transitioning from pure data generation to data analysis. To squeeze as much good information as possible from the output of Serratus, we will develop an R package to import the data into defined, well-behaved S4 objects. From there we can define a common set of 'tools' for analyzing and breaking down this data.

At a high level, we want to cross-reference Serratus data as much as possible with other sources of metadata, most notably by using efetch to query the various NCBI databases for taxonomy, GenBank metadata, SRA metadata, etc.
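As a rough illustration of the efetch cross-referencing idea, here is a minimal Python sketch that only builds NCBI E-utilities efetch request URLs (it does not send them). The helper name `build_efetch_url` and the example identifiers (taxonomy IDs 9606 and 10090, nucleotide accession NC_045512.2) are illustrative assumptions, not part of Serratus or Tantalus; the R package itself may well use a different client.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, retmode="xml", rettype=None):
    """Build an efetch URL for one or more identifiers (hypothetical helper).

    db      -- NCBI database name, e.g. "taxonomy", "nuccore", "sra"
    ids     -- iterable of identifiers (UIDs or accessions)
    retmode -- response format, e.g. "xml" or "text"
    rettype -- optional record type, e.g. "gb" for GenBank flat files
    """
    ids = [str(i) for i in ids]
    if not ids:
        raise ValueError("at least one identifier is required")
    params = {"db": db, "id": ",".join(ids), "retmode": retmode}
    if rettype is not None:
        params["rettype"] = rettype
    # Keep commas unescaped so the ID list stays readable in the URL.
    return f"{EUTILS_EFETCH}?{urlencode(params, safe=',')}"


# Example identifiers chosen for illustration only.
taxonomy_url = build_efetch_url("taxonomy", [9606, 10090])
genbank_url = build_efetch_url(
    "nuccore", ["NC_045512.2"], retmode="text", rettype="gb"
)
print(taxonomy_url)
print(genbank_url)
```

Batching several identifiers into one request, as done here with the comma-joined `id` parameter, keeps the number of calls low, which matters given NCBI's request-rate limits.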

We are at an early stage, so the first step will be to 'explore' the data for a day or two in a more open-ended way and become familiar with the summary files. Then we can discuss how to rationally break down and group the information in meaningful ways.

Summary files from the current version of Serratus are currently being generated from ~90K vertebrate accessions and can be accessed here: s3://serratus-public/out/200525_vert/summary/
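For anyone scripting access to that location, a small Python sketch of splitting the S3 URI into bucket and key prefix, then forming a ListObjectsV2 request URL, might look like the following. The helper name `split_s3_uri` is hypothetical, and whether the bucket permits anonymous listing over HTTPS is an assumption that has not been verified here.

```python
from urllib.parse import quote, urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the authority part; the key prefix is the path
    # without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://serratus-public/out/200525_vert/summary/")

# Virtual-hosted-style ListObjectsV2 request URL.
# ASSUMPTION: anonymous listing is allowed on this bucket.
list_url = f"https://{bucket}.s3.amazonaws.com/?list-type=2&prefix={quote(prefix)}"
print(bucket, prefix)
print(list_url)
```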

ababaian commented 4 years ago

Tantalus is now well underway