AnantharamanLab / vRhyme

Binning Virus Genomes from Metagenomes
GNU General Public License v3.0

Training DATA AVAILABILITY #17

Open Nianzhen-GU opened 1 year ago

Nianzhen-GU commented 1 year ago

NCBI databases (RefSeq (35) and Genbank (36), release July 2019) were queried for ‘prokaryotic virus’ and genomes >10 kb in length were retained. In addition, the IMG/VR database (release July 2018) (37) was downloaded, and sequences were limited to a minimum length of 10 kb. For the IMG/VR dataset, VIBRANT (38) (v1.2.1, -virome) and CheckV (39) (v0.6.0) were used to obtain circular and/or complete sequences. The resulting NCBI and IMG/VR datasets were dereplicated by 95% identity using the method described here (--derep_only --derep_id 0.95 --frac 0.70 --method longest) and combined, resulting in a total of 11,881 putatively complete genomes.
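For anyone trying to rebuild the dataset in the meantime, the 10 kb length filter in the passage above could be approximated with a minimal sketch like the one below. This is only an illustration of that single step, not the authors' actual code. The function names `parse_fasta` and `filter_by_length` are hypothetical, and the inclusive 10,000-base threshold is an assumption (the passage says ">10 kb" for NCBI and "a minimum length of 10 kb" for IMG/VR). The VIBRANT, CheckV, and dereplication steps are not covered.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_len: int = 10_000
) -> List[Tuple[str, str]]:
    """Keep records with at least min_len bases (assumed inclusive cutoff)."""
    return [(h, s) for h, s in records if len(s) >= min_len]
```

With a real file, this could be called as `filter_by_length(parse_fasta(open("genomes.fasta")))`, where `genomes.fasta` is a placeholder path.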

Is there any chance you could provide this processed dataset? I would appreciate it very much. Thank you!