EvolBioInf / fur

Find Unique genomic Regions

When there are a lot of contigs, makeFurDb is very slow #14

Open wangzhichao1990 opened 11 months ago

wangzhichao1990 commented 11 months ago

Hi,

When there are a lot of contigs, makeFurDb is very slow. The following figure shows summary statistics for the neighbor genomes. [image] Is there a way to speed it up? I am using the latest Docker version. Thanks.

haubold commented 11 months ago

To a first approximation, each neighbor sequence is turned into a suffix array. Since computing a suffix array comes with a performance overhead, analyzing very many sequences in the neighborhood will slow down makeFurDb. One way to speed things up is to concatenate sequences into fewer, longer chunks. In the limit of concatenating all neighbors into one sequence, memory consumption is maximal and might exceed the available RAM. Hope this helps.
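
For illustration, here is a minimal Python sketch of the concatenation idea, not part of fur itself. It assumes the neighbor contigs are in FASTA format. The function names, the `chunk_N` headers, and the `max_chunk_len` parameter are all made up for this example. Sequences are joined directly with no separator, which is an assumption; the thread does not say whether one is needed. The length cap is there to keep any single chunk from growing large enough to exhaust RAM.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def concatenate_contigs(records: Iterable[Record], max_chunk_len: int) -> List[Record]:
    """Greedily join consecutive contigs into chunks of at most max_chunk_len.

    A contig that is itself longer than max_chunk_len is kept whole
    as a chunk of its own; contigs are never split.
    """
    chunks: List[Record] = []
    current: List[str] = []
    current_len = 0
    for _, seq in records:
        # Start a new chunk if adding this contig would exceed the cap.
        if current and current_len + len(seq) > max_chunk_len:
            chunks.append((f"chunk_{len(chunks) + 1}", "".join(current)))
            current, current_len = [], 0
        current.append(seq)
        current_len += len(seq)
    if current:
        chunks.append((f"chunk_{len(chunks) + 1}", "".join(current)))
    return chunks


def format_fasta(records: Iterable[Record], width: int = 80) -> str:
    """Render records as FASTA text with sequence lines of at most `width`."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Tiny in-memory demo; real input would be read from a neighbor file.
    demo = [">c1", "ACGT", ">c2", "GGCC", ">c3", "TTAA"]
    merged = concatenate_contigs(read_fasta(demo), max_chunk_len=8)
    print(format_fasta(merged), end="")
```

A larger `max_chunk_len` gives fewer sequences and therefore fewer suffix arrays, at the cost of more memory per sequence, so the cap would have to be tuned to the available RAM.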