alekseyzimin / masurca

GNU General Public License v3.0

estimating JF_SIZE #34

Open devonorourke opened 6 years ago

devonorourke commented 6 years ago

Hi, the readme.md file lists two possible ways to estimate an appropriate value for the JF_SIZE parameter. The first one on that page states:

#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 200000000

However, a little later in the document, where an example is provided, the suggested way to derive that value is slightly different:

JF_SIZE=2000000000
jellyfish hash size, set this to about 10x the genome size.

I have two questions related to this value:

  1. I have only two kinds of read types to use in my genome assembly: paired-end Illumina data and long-read Nanopore data. If I was to estimate by coverage, should I estimate that according to all data, or just short (or long?) read data?
  2. I'm working with a mammalian genome that is about 2 Gb. Taking into consideration that this is a moderately large genome, is there a minimum amount of memory requirement that should be allocated when the JF_SIZE parameter gets beyond a certain value?

Thanks very much!

alekseyzimin commented 6 years ago

The JF_SIZE parameter controls the memory usage for the error correction step. Usually 100*genome_size is large enough, but at that setting the assembly may run out of RAM. You can try setting it to 10*genome_size and, if the assembly fails, increase it.
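For concreteness, the two rules of thumb can be compared with a quick sketch. The 2 Gb genome size comes from the question above; the 60x coverage is an assumed example value, not something stated in the thread:

```python
# Back-of-the-envelope JF_SIZE estimates.
# Assumptions (hypothetical): 2 Gb mammalian genome, 60x short-read coverage.
genome_size = 2_000_000_000
coverage = 60

# README rule of thumb: estimated_genome_size * estimated_coverage
jf_conservative = genome_size * coverage

# Maintainer's suggested starting point: 10 * genome size
jf_small = 10 * genome_size

print(f"JF_SIZE (genome_size * coverage): {jf_conservative:,}")  # 120,000,000,000
print(f"JF_SIZE (10 * genome_size):       {jf_small:,}")         # 20,000,000,000
```

The two formulas differ by roughly the coverage factor, which is why the maintainer suggests starting from the smaller 10*genome_size value and only increasing it if error correction fails.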