Closed · Nitin123-4 closed this issue 3 months ago
Increasing the mismatches from the default (0) to -N 1
is probably what is slowing things down markedly. Lowering this and/or increasing --parallel
are your best options.
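As a sketch, the adjusted invocation could look like the following. This reuses the variables ($RESOURCES2, $Read1, $Read2, $SAMPLEID) from the reporter's own command later in this thread; the choice of --parallel 6 is illustrative only and should be matched to the available RAM.

```shell
# Sketch only: the reporter's command with "-N 1" removed (reverting to the
# default of 0 seed mismatches) and --parallel raised. Each additional
# --parallel instance multiplies CPU and memory use, so size it to your node.
bismark --bowtie2 --parallel 6 $RESOURCES2/HG38/ -1 $Read1 -2 $Read2 \
        --output_dir $PWD --temp_dir $PWD/$SAMPLEID"_TEMP" --prefix $SAMPLEID
```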
Thanks for the quick reply.
WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --parallel specified will effectively lead to a linear increase in compute and memory requirements, so --parallel 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB of RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned.
With --parallel 4 it is taking ~40GB of RAM. If we increase it to --parallel 8 or 10 it will take a lot more memory.
I think this is difficult, as it needs more RAM as well.
Also, is it recommended to use the default of 0 mismatches?
This is from the Bowtie 2 manual:
-N <int>
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
To be honest, I don't think I have ever changed this to 1. A read only really needs one multi-seed alignment somewhere to serve as an anchor, which in my experience is pretty much always the case.
Okay. Thanks for your response.
Could you please advise on the memory issue: with --parallel 4 it is taking ~40GB of RAM, and if we increase it to --parallel 8 or 10 it will take a lot more memory.
Is there any solution for this?
I am afraid there isn't really a solution for this; if you ask for more parallelisation it will also use more memory... You could potentially try to give each Bowtie2 instance some additional cores (e.g. -p 4), but this will only get you so far, to be honest. It should not use much more memory though. This could result in the following command line:
bismark -p 4 --parallel 4 $RESOURCES2/HG38/ -1 $Read1 -2 $Read2 --output_dir $PWD --temp_dir $PWD/$SAMPLEID"_TEMP" --prefix $SAMPLEID
Okay, so it means it will use 16 CPUs in total and ~40GB RAM?
If you open a second terminal and run top, you should be able to monitor usage statistics.
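For a non-interactive check, the resident memory of the relevant processes can also be summed with ps and awk (a sketch; the process names bismark and bowtie2 are assumptions, and on a node where neither is running this simply prints 0.0 GB):

```shell
# Sum the resident set size (RSS, reported by ps in kB) of all bismark and
# bowtie2 processes and print the total in GB. If no matching processes are
# running, awk's sum stays zero and "0.0 GB" is printed.
ps -C bismark,bowtie2 -o rss= 2>/dev/null \
  | awk '{ sum += $1 } END { printf "%.1f GB\n", sum / 1024 / 1024 }'
```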
@FelixKrueger Is there a way to put all the temp files into memory? So far I can only get the initial C-to-T and G-to-A converted files to be written to /dev/shm/. Our clusters have 2TB of NVRAM which is not being used, according to top. The temp .bam files are still being written to a filesystem that is really, really slow to read from and write to compared to the shared memory.
I am afraid there isn't any functionality to 'dump' temporary BAM records into memory.
@FelixKrueger Thank you for the quick response. The cluster I am using has 64 CPUs across two sockets. Is using -p >10 going to help in any way?
I personally don't tend to use -p at all, but if you've got some spare cores, why not add it? You could always run a quick test with different values, e.g. -p 2, -p 3, -p 4, to see if it really makes a noticeable difference (maybe add time before the command, and -u 1000000 to only use a smallish subset of reads).
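Such a quick benchmark might be scripted as follows (a sketch only: the variables come from the command in this thread, the bench_p* directory names are placeholders, and -u limits Bismark to the first N reads):

```shell
# Sketch: time a small test alignment (-u 1000000 = first 1 million reads)
# for a few Bowtie2 thread counts to see whether -p helps on this cluster.
# $RESOURCES2, $Read1, $Read2 are the reporter's variables; bench_p${threads}
# is a hypothetical throwaway output/temp directory per run.
for threads in 2 3 4; do
    mkdir -p bench_p${threads}
    time bismark -p ${threads} --parallel 4 -u 1000000 $RESOURCES2/HG38/ \
         -1 $Read1 -2 $Read2 \
         --output_dir bench_p${threads} --temp_dir bench_p${threads}
done
```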
Hi Team,
I am running Bismark with the command below, and it is really slow.
bismark --bowtie2 -N 1 --parallel 4 $RESOURCES2/HG38/ -1 $Read1 -2 $Read2 --output_dir $PWD --temp_dir $PWD/$SAMPLEID"_TEMP" --prefix $SAMPLEID
I have 218,289,382 total reads, i.e. 32.96 Gb of data. I did pre-processing using Trimmomatic, and the filtered reads are used here. It took ~22 h to complete. Can you please help with this?
Bismark Version: v0.24.0