chrisquince / DESMAN

De novo Extraction of Strains from MetAgeNomes

Questions regarding complex synthetic mock data. #31

Open kmin940 opened 5 years ago

kmin940 commented 5 years ago

Hi Chris, I want to replicate the complex strain mock on my own computer. I also want to run a version of the complex strain mock that differs only in the organisms used, that is, with NCBI genomes other than those in https://complexstrainsim.s3.climb.ac.uk/Strains.tar.gz. I have two questions.

  1. For the Complex Strain Mock, I downloaded the NCBI strains with wget https://complexstrainsim.s3.climb.ac.uk/Strains.tar.gz. In Strains/Strain_35814/GCF_000598125.1 there are 8 files: 1) genomic.cogs, 2) genomic.faa, 3) genomic.fas, 4) genomic.fna, 5) genomic.gbff.gz, 6) genomic.gff, 7) genomic.out, 8) genomic_fas_map.uc.

     From the NCBI FTP site, ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/598/125/GCF_000598125.1_gbhBH006, I downloaded the genomic.* files and ran prodigal and rpsblast to generate the genomic.faa, genomic.fas and genomic.out files. However, I don't know how to generate the _fas_map.uc files. How do I obtain them?

     I also cannot find where the following come from: SCGs, Select_SCGs, Select_ster7, Cluster7_core_tau.csv, Cluster7_core_tau_map.csv, Cluster7_core_tau_mapU.csv, CountCogs.pl, Hap.txt, IdentH.txt, IdentHG.csv, SCGs.fa, SCGs.gfa, SCGs.tree, strain_map.csv, strain_map_scg.csv and temp.fa. How do I get these files and folders? The files selected in the attached screenshots are the ones I am unable to create.

     Is there also a way to get the whole Strain_35814 folder at once, without downloading each accession (GCF_000317335.1, GCF_000341465.1 and so on) individually?

     Since I want to run DESMAN on different synthetic data with different species, I would be grateful for the full code used to process the data and produce the genomic_fas_map.uc files. Could you share the code for generating these files?
  2. How much memory does it take to run complex strain mock with MEGAHIT? I'm receiving some memory errors.
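
To make question 1 concrete, here is a minimal Python sketch of how the missing files could be listed for each accession folder. It is only an illustration: the helper names `missing_suffixes` and `report` are hypothetical, and it assumes that each file name ends in one of the eight suffixes listed above and that accession folders start with "GCF_".

```python
from pathlib import Path

# Suffixes of the eight files listed in question 1 (taken from the post).
EXPECTED_SUFFIXES = [
    "genomic.cogs",
    "genomic.faa",
    "genomic.fas",
    "genomic.fna",
    "genomic.gbff.gz",
    "genomic.gff",
    "genomic.out",
    "genomic_fas_map.uc",
]


def missing_suffixes(accession_dir):
    """Return the expected suffixes that no file in accession_dir ends with."""
    names = [p.name for p in Path(accession_dir).iterdir() if p.is_file()]
    return [s for s in EXPECTED_SUFFIXES if not any(n.endswith(s) for n in names)]


def report(strain_dir):
    """Map each accession folder inside strain_dir to its missing suffixes.

    Assumption: accession folders are the sub-directories whose names start
    with "GCF_"; other folders (for example SCGs) are skipped.
    """
    return {
        d.name: missing_suffixes(d)
        for d in sorted(Path(strain_dir).iterdir())
        if d.is_dir() and d.name.startswith("GCF_")
    }
```

Calling `report("Strains/Strain_35814")` on a locally rebuilt folder would then show, per accession, which of the eight files still need to be generated.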

I would be grateful for answers to these questions. Thank you very much!