ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Search the Zoonotic Reservoir for SARS-CoV-2 #55

Closed: ababaian closed this issue 4 years ago

ababaian commented 4 years ago

Batch 1: Searching the Zoonotic Reservoir for SARS-CoV-2

See this preprint.

Steps

% open-ended and may require additional time.

I think we can get this batch done in a few days after the AWS credits come in.

Deadline

The first set of credits is expected around May 1st. The batch should be ready to go on that date.

KristinaGagalova commented 4 years ago

Starting from Table S1 in Liu et al., 2020, I've downloaded the list of species (excluding Homo sapiens) that are possibly susceptible to SARS-CoV-2.

Species list: ListSpecies.ACE2receptor2.txt
Tax IDs of species: tax_report2.txt

I used the following script with the esearch/efetch utilities (NCBI Entrez Direct):

#!/bin/bash
# bash is required: the species list below is stored in an array

set -u
TAXLIST=("Pan troglodytes" "Pan paniscus" "Gorilla gorilla gorilla" "Nomascus leucogenys" "Pongo abelii" "Macaca mulatta" "Macaca fascicularis" "Macaca nemestrina" "Cercocebus atys" "Mandrillus leucophaeus" "Papio anubis" "Theropithecus gelada" "Chlorocebus sabaeus" "Rhinopithecus roxellana" "Piliocolobus tephrosceles" "Callithrix jacchus" "Sapajus apella" "Cebus capucinus imitator" "Aotus nancymaae" "Saimiri boliviensis boliviensis" "Propithecus coquereli" "Oryctolagus cuniculus" "Ochotona princeps" "Mesocricetus auratus" "Cricetulus griseus" "Peromyscus leucopus" "Peromyscus maniculatus bairdii" "Jaculus jaculus" "Ictidomys tridecemlineatus" "Sus scrofa" "Globicephala melas" "Lagenorhynchus obliquidens" "Orcinus orca" "Tursiops truncatus" "Delphinapterus leucas" "Monodon monoceros" "Neophocaena asiaeorientalis asiaeorientalis" "Lipotes vexillifer" "Physeter catodon" "Balaenoptera acutorostrata scammoni" "Bos taurus" "Bos indicus" "Bos indicus x Bos taurus" "Bison bison bison" "Odocoileus virginianus texanus" "Bos mutus" "Bubalus bubalis" "Ovis aries" "Capra hircus" "Rousettus aegyptiacus" "Vombatus ursinus" "Phascolarctos cinereus" "Trichechus manatus latirostris" "Equus caballus" "Equus przewalskii" "Equus asinus" "Ceratotherium simum simum" "Canis lupus familiaris" "Canis lupus dingo" "Vulpes vulpes" "Ailuropoda melanoleuca" "Ursus maritimus" "Ursus arctos horribilis" "Zalophus californianus" "Eumetopias jubatus" "Callorhinus ursinus" "Odobenus rosmarus divergens" "Phoca vitulina" "Neomonachus schauinslandi" "Mustela putorius furo" "Mustela erminea" "Enhydra lutris kenyoni" "Felis catus" "Lynx canadensis" "Acinonyx jubatus" "Puma concolor" "Panthera pardus" "Panthera tigris altaica" "Manis javanica")
#TAXLIST=("Gorilla gorilla gorilla" "Nomascus leucogenys")
#TAXLIST=$1

for TAX in "${TAXLIST[@]}" ; do
    # Replace spaces with underscores to get a safe file name
    nam=$(echo "$TAX" | sed 's/ /_/g')
    echo "getting RunInfo for: $TAX"
    # Fetch the full SRA RunInfo table for this organism
    esearch -db sra -query "${TAX}[orgn]" |
        efetch -format runinfo > "${nam}.esearch.RunInfo.all.txt"
    echo "processing ${nam}"
    # Keep only the rows that carry the TRANSCRIPTOMIC label
    grep "TRANSCRIPTOMIC" "${nam}.esearch.RunInfo.all.txt" > "${nam}.esearch.RunInfo.out"
done

I have downloaded the SRA RunInfo table for all the species, keeping only the rows labelled "TRANSCRIPTOMIC".
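The grep in the script keeps any row that contains the string TRANSCRIPTOMIC, whichever field it appears in. A stricter variant could compare only the LibrarySource column. The following is a minimal sketch, not the method used for this batch: the function name `filter_by_library_source` is hypothetical, and it assumes the RunInfo table is comma-separated, starts with a header row that names a `LibrarySource` column, and has no quoted fields containing commas.

```shell
# Sketch (assumption: comma-separated RunInfo with a header row naming
# a "LibrarySource" column, and no quoted fields with embedded commas).
# filter_by_library_source is a hypothetical helper, not part of the issue.
filter_by_library_source() {
    local infile=$1
    local wanted=${2:-TRANSCRIPTOMIC}
    awk -F',' -v wanted="$wanted" '
        NR == 1 {
            # Find the position of the LibrarySource column in the header
            for (i = 1; i <= NF; i++) if ($i == "LibrarySource") col = i
            next
        }
        # Print data rows whose LibrarySource equals the wanted value;
        # repeated header rows and blank lines never match and are dropped
        col && $col == wanted
    ' "$infile"
}
```

It would be called as `filter_by_library_source Bos_taurus.esearch.RunInfo.all.txt > Bos_taurus.esearch.RunInfo.out`. Like the grep, it writes no header row, so line counts stay comparable.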

Data, everything downloaded from SRA by species: All.Species.esearch.RunInfo.allSRA.tar.gz

Data, downloaded from SRA by species and filtered by the "TRANSCRIPTOMIC" label: All.Species.esearch.RunInfo.tar.gz

Number of SRA run entries (SRR/ERR) per species in the downloaded data after the TRANSCRIPTOMIC filter (total: 73,813)

16752 Bos_taurus.esearch.RunInfo.out
10877 Sus_scrofa.esearch.RunInfo.out
10481 Macaca_mulatta.esearch.RunInfo.out
7447 Ovis_aries.esearch.RunInfo.out
6828 Macaca_fascicularis.esearch.RunInfo.out
3352 Pan_troglodytes.esearch.RunInfo.out
3119 Canis_lupus_familiaris.esearch.RunInfo.out
2659 Bubalus_bubalis.esearch.RunInfo.out
2409 Equus_caballus.esearch.RunInfo.out
1358 Capra_hircus.esearch.RunInfo.out
1157 Papio_anubis.esearch.RunInfo.out
1097 Oryctolagus_cuniculus.esearch.RunInfo.out
1049 Cricetulus_griseus.esearch.RunInfo.out
944 Callithrix_jacchus.esearch.RunInfo.out
804 Chlorocebus_sabaeus.esearch.RunInfo.out
518 Mustela_putorius_furo.esearch.RunInfo.out
402 Tursiops_truncatus.esearch.RunInfo.out
324 Felis_catus.esearch.RunInfo.out
227 Pan_paniscus.esearch.RunInfo.out
222 Mesocricetus_auratus.esearch.RunInfo.out
181 Phascolarctos_cinereus.esearch.RunInfo.out
181 Pongo_abelii.esearch.RunInfo.out
175 Peromyscus_leucopus.esearch.RunInfo.out
158 Rousettus_aegyptiacus.esearch.RunInfo.out
142 Ictidomys_tridecemlineatus.esearch.RunInfo.out
139 Bos_indicus_x_Bos_taurus.esearch.RunInfo.out
130 Macaca_nemestrina.esearch.RunInfo.out
116 Piliocolobus_tephrosceles.esearch.RunInfo.out
84 Equus_asinus.esearch.RunInfo.out
70 Vulpes_vulpes.esearch.RunInfo.out
67 Cercocebus_atys.esearch.RunInfo.out
62 Ailuropoda_melanoleuca.esearch.RunInfo.out
51 Delphinapterus_leucas.esearch.RunInfo.out
33 Manis_javanica.esearch.RunInfo.out
25 Rhinopithecus_roxellana.esearch.RunInfo.out
20 Ursus_maritimus.esearch.RunInfo.out
16 Saimiri_boliviensis_boliviensis.esearch.RunInfo.out
15 Eumetopias_jubatus.esearch.RunInfo.out
14 Aotus_nancymaae.esearch.RunInfo.out
12 Physeter_catodon.esearch.RunInfo.out
12 Vombatus_ursinus.esearch.RunInfo.out
11 Acinonyx_jubatus.esearch.RunInfo.out
11 Trichechus_manatus_latirostris.esearch.RunInfo.out
11 Zalophus_californianus.esearch.RunInfo.out
9 Gorilla_gorilla_gorilla.esearch.RunInfo.out
6 Propithecus_coquereli.esearch.RunInfo.out
4 Enhydra_lutris_kenyoni.esearch.RunInfo.out
4 Monodon_monoceros.esearch.RunInfo.out
4 Ochotona_princeps.esearch.RunInfo.out
4 Panthera_tigris_altaica.esearch.RunInfo.out
3 Canis_lupus_dingo.esearch.RunInfo.out
2 Globicephala_melas.esearch.RunInfo.out
2 Phoca_vitulina.esearch.RunInfo.out
2 Puma_concolor.esearch.RunInfo.out
2 Sapajus_apella.esearch.RunInfo.out
2 Theropithecus_gelada.esearch.RunInfo.out
1 Bos_mutus.esearch.RunInfo.out
1 Cebus_capucinus_imitator.esearch.RunInfo.out
1 Lynx_canadensis.esearch.RunInfo.out
1 Mandrillus_leucophaeus.esearch.RunInfo.out
1 Neomonachus_schauinslandi.esearch.RunInfo.out
1 Orcinus_orca.esearch.RunInfo.out
1 Peromyscus_maniculatus_bairdii.esearch.RunInfo.out
0 Balaenoptera_acutorostrata_scammoni.esearch.RunInfo.out
0 Bison_bison_bison.esearch.RunInfo.out
0 Bos_indicus.esearch.RunInfo.out
0 Callorhinus_ursinus.esearch.RunInfo.out
0 Ceratotherium_simum_simum.esearch.RunInfo.out
0 Equus_przewalskii.esearch.RunInfo.out
0 Jaculus_jaculus.esearch.RunInfo.out
0 Lagenorhynchus_obliquidens.esearch.RunInfo.out
0 Lipotes_vexillifer.esearch.RunInfo.out
0 Mustela_erminea.esearch.RunInfo.out
0 Neophocaena_asiaeorientalis_asiaeorientalis.esearch.RunInfo.out
0 Nomascus_leucogenys.esearch.RunInfo.out
0 Odobenus_rosmarus_divergens.esearch.RunInfo.out
0 Odocoileus_virginianus_texanus.esearch.RunInfo.out
0 Panthera_pardus.esearch.RunInfo.out
0 Ursus_arctos_horribilis.esearch.RunInfo.out
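
The listing above has the shape of per-file line counts sorted from largest to smallest. How it was actually produced is not stated in this thread, so the following is only a sketch of one way to reproduce such a listing and its total. The function names `count_runinfo_entries` and `total_runinfo_entries` are hypothetical, and the sketch assumes the `*.esearch.RunInfo.out` files sit in one directory with one run per line.

```shell
# Sketch (assumption: one run per line in each *.esearch.RunInfo.out file).
# Both function names are hypothetical helpers, not part of the issue.
count_runinfo_entries() {
    local dir=${1:-.}
    for f in "$dir"/*.esearch.RunInfo.out; do
        # Skip the unexpanded pattern when no file matches
        [ -e "$f" ] || continue
        # "wc -l < file" prints only the count, without the file name
        printf '%s %s\n' "$(wc -l < "$f" | tr -d ' ')" "$(basename "$f")"
    done | sort -k1,1nr -k2,2   # largest count first, ties ordered by name
}

total_runinfo_entries() {
    # Sum the first column of the per-species listing
    count_runinfo_entries "${1:-.}" | awk '{ s += $1 } END { print s + 0 }'
}
```

Running `count_runinfo_entries .` in the directory holding the filtered files prints one `count filename` line per species, and `total_runinfo_entries .` prints the sum of those counts.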

ababaian commented 4 years ago

The experiment is now complete. Data is currently at s3://serratus-public/out/200505_zoonotic/ (see the notebook entries from that date for details of the run).