Using jellyfish count --threads on a distributed network

gmarcais / Jellyfish

A fast multi-threaded k-mer counter

Using jellyfish count --threads on a distributed network #48

Closed (SteveGoldstein closed this issue 9 years ago)

SteveGoldstein commented 9 years ago

I am running jellyfish on an HTCondor high-throughput distributed computing pool.

For different sequencing projects, I will want to run it with different values for the --threads parameter. For example, for some projects I might split the work into a small number of multithreaded jobs, but for a project with many small FastQ inputs I might get higher throughput with --threads 1, because the pool has many more single-thread nodes available.

How should I build jellyfish to support this?

Do I need to build it on a node with multiple cores? With the maximum number of cores that I will use? Or can I build it on a node with a single core?
What flags do I use for configure and make? Do I need to use make -j N, where N is the maximum number of cores that I will use?
Is there a way to check whether a given binary supports multithreading?

Thanks,
Steve Goldstein, University of Wisconsin-Madison

gmarcais commented 9 years ago

Hi Steve,

I think there is some confusion between the compilation and the running of jellyfish. Let me answer your questions:

Regardless of which machine you build Jellyfish on, it will support multithreaded operation.
The -j flag only affects make itself (how many compile jobs it runs in parallel). The same executable is produced whatever -j value you use.
Every Jellyfish program supports multithreading.
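To illustrate the points above, a one-time build from a Jellyfish release tarball can be sketched as a small shell function, assuming the usual autotools steps (./configure, make, make install). The function name build_jellyfish and its arguments are hypothetical. The third argument is handed to make -j and only changes how long the build takes, not the binary it installs.

```shell
# Build and install Jellyfish from an unpacked release tarball (sketch).
# The installed binary is the same whatever value is given for -j;
# -j only sets how many compile jobs make runs at once.
build_jellyfish() {
  local src=$1          # unpacked source directory (hypothetical path)
  local prefix=$2       # install prefix, e.g. a directory visible to all jobs
  local jobs=${3:-1}    # parallel make jobs: affects build time only
  (
    cd "$src" &&
      ./configure --prefix="$prefix" &&
      make -j "$jobs" &&
      make install
  )
}
```

For example, build_jellyfish ./jellyfish-src "$HOME/opt/jellyfish" 4 would compile with 4 parallel jobs on a 4-core node; passing 1 on a single-core node would install the same multithreaded program, just more slowly.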

The number of threads used by Jellyfish is set at run time with the -t switch (--threads in its long form). For example, jellyfish count -t 1 ... uses 1 thread, and jellyfish count -t 16 ... uses 16.

It is up to the script you submit to Condor to determine the number of threads to use and pass the proper switch to jellyfish.
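Here is a minimal sketch of such a wrapper, assuming bash on the execute node. The function names jf_threads and run_count are hypothetical, and the count settings -m 21 -s 100M -C and the output name counts.jf are placeholders to adapt. The thread count comes from the wrapper's first argument. If that is missing, it falls back to OMP_NUM_THREADS (recent HTCondor versions usually set this to the job's request_cpus, but check your pool's configuration), and finally to 1.

```shell
# Pick the thread count for jellyfish count inside a Condor job (sketch).
# Order: explicit first argument, then OMP_NUM_THREADS, then a safe default of 1.
jf_threads() {
  local t=${1:-${OMP_NUM_THREADS:-1}}
  # Accept only positive integers without leading zeros.
  case $t in
    '' | *[!0-9]* | [!1-9]*)
      echo "invalid thread count: '$t'" >&2
      return 1
      ;;
  esac
  echo "$t"
}

# Run k-mer counting with that thread count; remaining arguments are input files.
# -m 21 -s 100M -C and counts.jf are placeholder choices, not recommendations.
run_count() {
  local threads
  threads=$(jf_threads "$1") || return 1
  shift
  jellyfish count -m 21 -s 100M -C -t "$threads" -o counts.jf "$@"
}
```

In the submit file, keep request_cpus equal to the number passed as the first argument. For example, use request_cpus = 4 for a job whose wrapper calls run_count 4 sample.fastq (a hypothetical input), so Condor reserves as many cores as jellyfish will use.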

Hope this clarifies things.

SteveGoldstein commented 9 years ago

I now know how to determine whether a process is running multiple threads (by looking in /proc/PID/task), and I confirmed that jellyfish count --threads works with the binary I'm running.
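The /proc check can be wrapped in a small function. This is a Linux-only sketch, and count_threads is a hypothetical name. Each thread of a process has one entry under /proc/PID/task, so a value above 1 means the process is multithreaded.

```shell
# Count the threads of a running process by listing /proc/PID/task (Linux only).
count_threads() {
  local pid=$1
  if [ ! -d "/proc/$pid/task" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  # One directory entry per thread.
  ls -1 "/proc/$pid/task" | wc -l
}
```

For example, start jellyfish count -t 8 ... in the background with & and then run count_threads $! while it is counting.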

Previously, I had looked at the output of ps and top and didn't see evidence of multiple threads, hence my question.

Thanks for clearing this up.

gmarcais commented 9 years ago

You can ask ps or top to display information for each thread instead of per-process information; see their man pages. Also, in top, a process showing more than 100% CPU is using more than one core.
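With procps ps on Linux, the per-thread view is the -L option, and the NLWP column gives the thread count. In top, pressing H (or starting it as top -H -p PID) switches to the same per-thread display. The following sketch uses hypothetical function names:

```shell
# One line per thread (LWP) of the given PID, with its CPU usage.
show_threads() {
  ps -L -o pid,lwp,pcpu,comm -p "$1"
}

# Thread count (NLWP) of the given PID, with ps's column padding removed.
ps_thread_count() {
  ps -o nlwp= -p "$1" | tr -d ' '
}
```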

SteveGoldstein commented 9 years ago

Hi,

Running jellyfish dump with -t N or --threads N returns an error:

    dump: unrecognized option '--threads'
    Use --usage or --help for some help

I get a similar error for jellyfish merge.

Steve

On 11/09/2015 07:25 PM, gmarcais wrote:

> Every Jellyfish program support multithreading.
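As the errors above show, dump and merge in this build do not accept --threads. One way to check a subcommand before putting --threads in a job script is to look for the option in that subcommand's --help output. In the minimal sketch below, supports_threads and the JELLYFISH override variable are hypothetical, and the check relies only on the help text.

```shell
# Succeed if a jellyfish subcommand lists a --threads option in its --help output.
# JELLYFISH (hypothetical) selects a specific binary; defaults to "jellyfish" on PATH.
supports_threads() {
  local jf=${JELLYFISH:-jellyfish}
  "$jf" "$1" --help 2>&1 | grep -q -e '--threads'
}
```

For example, supports_threads dump || echo 'dump: run without -t' would print that message for a build whose dump rejects --threads.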