Closed nyi1991 closed 5 years ago
Hi NY,
I'm, afraid you would need to give me some more details to work with. Are you running this on a local machine, or on a cluster? What are the hardware specifications and operating system? If the program did not fail until this point, is there a chance that the process got terminated by the cluster, e.g. due to memory restrictions? Ideally, could you attach, or send to me via email the entire log file of the run, or what you see on screen?
Thanks, Felix
Hi Felix,
Sorry for the lack of details. I'm running on a cluster with CentOS installed. RAM: 256GB, CPU: 16. I am launching with the following job parameters #SBATCH --nodes=1
My bismark command is
bismark -u 1000 --bowtie2 --path_to_bowtie /sw/bowtie2/bowtie2-2.3.4.1-linux-x86_64 --genome_folder home/ny/genome/Homo_sapiens.GRCh38.dna_sm.primary_assembly test_data.fastq -o bismark_simple_test
Thanks!
Thanks for the details. From the log I can't see anything that looks odd from a Bismark perspective, so I my guess is that you are running out of memory, and the entire process is getting killed by the cluster. Maybe the primary assembly uses more memory than just the chromosome based assembly?
can you try to increase the memory to say #SBATCH --mem-per-cpu=25000
and then later on check how much memory was actually used?
I tried increasing the memory to #SBATCH --mem-per-cpu=25000
, using another reference (UCSC/hg19/Sequence/WholeGenomeFasta/), and used the test_data.fastq but the job still terminated at >>> Writing bisulfite mapping results to test_data_bismark_bt2.bam <<<
I tried again with the --parallel 12 option
bismark -u 1000 --parallel 12 --bowtie2 --genome_folder /genomes/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta /scratch/ny/Bismark/test_data.fastq
and I got farther along in the process but still couldn't get the job to finish:
Reading in the sequence file test_data.fastq.temp.12
I don't know if it helps, but I installed bismark via conda and the version is v0.20.0.
This is really quite odd... I am still not sure why this doesn't proceed, but I am fairly convinced it has nothing to do with Bismark itself, but with your cluster. By the way, I would try to get it to run on its own first, before embarking on a --parallel 12
mission. Assuming the process would take 25GB of RAM, then --parallel 12
would need 300GB of RAM, which you don't physically have...
Maybe can you try to get it to in a normal shell first, or out of curiosity increase the memory to something like 50GB, maybe your cluster counts the requirements in a different way?
It's been over a month since the last correspondence, closing this issue.
Hi Felix,
I have the same problem in writing bisulfite mapping results to .bam. It seems everything is OK, however, the bismark_bt2_pe.bam file and the bismark_bt2_PE_report.txt file are both empty.
My command is bismark ~/RefSeq/GRCh38 --path_to_bowtie $BT2_HOME --bowtie2 -1 ~/rawdata/ERR2676761_1.fastq.gz -2 ~/rawdata/ERR2676765_2.fastq.gz -o ~/result
I'm running this on a local machine, and the bismark, bowtie2, and samtools software are downloaded from anaconda. I am not sure what information I need to provide further, please let me know.
Entire log file is attached below test.out.txt
Thanks for your attention!
Regards, Yue
Hi Yue,
Until the end of your log file everything seems to work fine as expected. The only thing I can imagine that is happening is that the process gets killed by the OS, so that it doesn't even get the time to produce any further output...
How much RAM do you have available on your system? Could you open a second shell and run the top
command to see how much memory is actually used?
Thanks for for your prompt reply. I run the top command and find that I have 230G (free 104G + buff/cache 131G) available on the OS, and this bismark session used about 26G. I have no idea of how much memory normally this command use, it seems to be right?
The human genome with all the extra-chromosomal sequences you used is indeed quite big, but 26G should be well within what you have available....
Two things that could be worth checking:
To be honest I can't see any indication that it would put this in the Bismark realm (maybe apart from that you don't get a Bismark output of course...)
Hi Felix,
I will try to solve it as you said. This is my first time using bowtie2, so maybe some issues need to be resolved beforehand.
Thank you for your patience!
Best, Yue
I hope you can get it to work. I am currently downloading the FastQ files you mentioned over here to see if it is the files' fault, but the downloads from the ENA seem to be dog slow right...
Hi Felix,
I really appreciate your help.
I have updated Bowtie2 to v2.3.5 and put the latest version in my PATH
. Then I re-run the indexing. The new index looks exactly the same as previous one (ending in .bt2l). The same error has appeared again, and last line of log is
>>> Writing bisulfite mapping results to ***_bismark_bt2.bam <<<
I also tried downloading single-end FastQ file(DRR147452.sra) and running these commands, but Bismark stopped immediately at this line, too. So the FastQ files may not be the cause.
The human genome I used was GGRCh38.dna_sm.toplevel.fa.gz
, which downloaded from Ensembl. Is it possible to cause the problem?
Best, Yue
Hi Yue,
I have just tried to align the two files ERR2676765_1.fastq.gz
and ERR2676765_2.fastq.gz
against the GRCh38 version, and it starts and continues fine. Have you ever checked if the program dies, or is does it just run and run, and is just very slow?
I noticed from your post that you seem to have used the toplevel file, which was also soft-masked (_sm
). Maybe this indeed causes a problem? We only use the chromosomal sequences over here, and this worked just fine.
By a quick Google search I just found this statement:
"For primary vs. toplevel, very few aligners can properly handle additional haplotypes. If you happen to be using BWA, then the toplevel assembly would benefit you. For STAR/hisat2/bowtie2/BBmap/etc. the haplotypes will just cause you problems due to increasing multimapper rates incorrectly. Note that none of these actually use soft-masking."
So maybe getting the _dna
sequences, an d try that again?
Hi Felix,
Thanks for your suggestions!
I solved this problem. Firstly, I added an option -sam
to bismark
command then it works fine unexpectedly and generates the SAM file. Then I run a test to see if the samtools
cause the error. It turned out that the samtools (v 1.9) fails to run due to a mismatch in library versions (error while loading libcrypto.so.1.0.0). I fixed it according to these solutions: https://github.com/bioconda/bioconda-recipes/issues/12100
Besides, I also found that once the problem has been resolved, the index from _dna
sequences and _dna_sm.toplevel
sequences both work successfully.
This issue indeed not related to Bismark, but I hope this could help anyone who will have the same question in the future.
Thanks again, you've been very kind!
Best regards, Yue
Thanks very much for the update, this might indeed be useful for others!
Hello,
I've been trying to launch a test bismark run on 10000 reads with the following code:
bismark -u 10000 --bowtie2 --genome_folder /Homo_sapiens.GRCh38.dna_sm.primary_assembly/ -1 bsseq_S1_R1_001_val_1.fq.gz -2 bsseq_S1_R2_001_val_2.fq.gz
My job keeps failing but there is no error reported in the log. The last line is always:
Can someone help me to troubleshoot? Thanks!