genepi / imputationserver-docker

Docker Image for Michigan Imputation Server

CLI argument to limit map/reduce tasks per job? #4

Open oskarvid opened 5 years ago

oskarvid commented 5 years ago

I'm trying to limit the resource usage per job so that I can optimize the total run time when running two or more jobs. How do I do that? I start the imputation server like so:

docker run -t -p 8080:80 -e DOCKER_CORES="16" -v $(pwd):/data/ --name imputeserver-16cores genepi/imputationserver

And I start each job like so:

docker exec -t -i imputeserver-16cores cloudgene run imputationserver --files /data/input.vcf.gz --refpanel apps@hapmap2 --conf /etc/hadoop/conf

I figured I could copy the files in /etc/hadoop/conf to a local directory, edit them there, and change --conf to point to the copy, e.g. /data/conf. So far, though, changing the values in mapred-site.xml doesn't affect the number of map/reduce tasks that get created. Should this work?
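
For anyone reproducing this, the approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the local directory ./conf, the value 4, and the choice of mapred.reduce.tasks (a classic Hadoop MapReduce job property) are examples, and nothing in this thread confirms that the imputation pipeline honors the property. The docker commands are left as comments so the config-editing part can be tried on its own.

```shell
# Sketch: stage an edited Hadoop config in ./conf, which the container sees
# as /data/conf through the existing -v $(pwd):/data/ mount.
CONF_DIR="${CONF_DIR:-./conf}"      # hypothetical local directory
REDUCE_TASKS="${REDUCE_TASKS:-4}"   # illustrative value, not a recommendation
SITE="$CONF_DIR/mapred-site.xml"

mkdir -p "$CONF_DIR"

# With the container running, copy its stock config first so that unrelated
# settings are preserved:
#   docker cp imputeserver-16cores:/etc/hadoop/conf/. "$CONF_DIR"

# Fall back to an empty config if nothing was copied (keeps the sketch runnable).
if [ ! -f "$SITE" ]; then
  printf '<?xml version="1.0"?>\n<configuration>\n</configuration>\n' > "$SITE"
fi

# Insert the property just before the closing </configuration> tag.
# Assumption: the property is not already defined in the file; if it is,
# edit the existing <value> instead of adding a duplicate.
awk -v n="$REDUCE_TASKS" '
  /<\/configuration>/ {
    print "  <property>"
    print "    <name>mapred.reduce.tasks</name>"
    print "    <value>" n "</value>"
    print "  </property>"
  }
  { print }
' "$SITE" > "$SITE.tmp"
mv "$SITE.tmp" "$SITE"

# Then point the job at the staged copy instead of /etc/hadoop/conf:
#   docker exec -t -i imputeserver-16cores cloudgene run imputationserver \
#     --files /data/input.vcf.gz --refpanel apps@hapmap2 --conf /data/conf
```

Copying the whole directory rather than only mapred-site.xml keeps the other config files next to the edited one, which is what --conf appears to expect given that it is passed a directory.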

I also see in the terminal output that "/data/apps/imputationserver/job.config" is unavailable. Is that perhaps where I can define the number of map/reduce tasks per job? I've looked around but haven't found any documentation for it, so I don't know what it does.