FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0

Cheers Felix! I think the problem was that we had two genomes in our genome folder within our server. #352

Status: Closed (closed by xiaoxiaoh16 4 years ago)

xiaoxiaoh16 commented 4 years ago

Cheers Felix! I think the problem was that we had two genomes in the genome folder on our server. Bismark has now completed genome preparation and bisulfite conversion. However, I now have a new problem while Bismark is indexing the CT converted genome; I wonder if you can help:

Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer
Please be aware that this process can - depending on genome size - take several hours!

Preparing indexing of CT converted genome in /data/Jason/FASTQ_and_processed_data/genomes/Bisulfite_Genome/CT_conversion/
Parent process: Starting to index C->T converted genome with the following command:

bowtie2-build -f genome_mfa.CT_conversion.fa BS_CT

Settings:
Output files: "BS_CT.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome_mfa.CT_conversion.fa
Building a SMALL index
Reading reference sizes

Preparing indexing of GA converted genome in /data/Jason/FASTQ_and_processed_data/genomes/Bisulfite_Genome/GA_conversion/
Child process: Starting to index G->A converted genome with the following command:

bowtie2-build -f genome_mfa.GA_conversion.fa BS_GA

Settings:
Output files: "BS_GA.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome_mfa.GA_conversion.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:46
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time reading reference sizes: 00:00:46
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:29
bmax according to bmaxDivN setting: 724327615
Using parameters --bmax 543245712 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 543245712 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
Time to join reference sequences: 00:00:33
bmax according to bmaxDivN setting: 724327615
Using parameters --bmax 543245712 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 543245712 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:01:49
Allocating rank array
Ranking v-sort output
V-Sorting samples time: 00:01:53
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:23
Invoking Larsson-Sadakane on ranks
Ranking v-sort output time: 00:00:22
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:34
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples (Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 2.89731e+09 (target: 543245711)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Invoking Larsson-Sadakane on ranks time: 00:00:33
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples (Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 2.89731e+09 (target: 543245711)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 2897310462 for bucket 1 (Using difference cover)
Sorting block of length 2897310462 for bucket 1 (Using difference cover)
Parent process: Failed to build index

Originally posted by @jasonsaunderswilliams in https://github.com/FelixKrueger/Bismark/issues/169#issuecomment-493070327

FelixKrueger commented 4 years ago

Hmm, not really sure what is happening in your case. What was the command, and which versions of Bismark and Bowtie2 were you using? Are you pointing the process to a folder with a single multi-FastA file, or are there multiple .fa files, like one for each chromosome?
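In case it helps to answer those questions, here is a minimal sketch for collecting the details. The function names `count_fasta_files` and `report_versions` are made up for illustration, and the sketch assumes the FastA files end in `.fa` or `.fasta` and sit directly in the genome folder:

```shell
#!/usr/bin/env bash

# Count FastA files directly inside a genome folder.
# Assumption: files end in .fa or .fasta; sub-folders such as
# Bisulfite_Genome/ are deliberately not searched.
count_fasta_files() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f \( -name '*.fa' -o -name '*.fasta' \) \
        | wc -l | tr -d ' '
}

# Print the version banner of each tool, if it is on the PATH.
report_versions() {
    local tool
    for tool in bismark bowtie2; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" --version
        else
            echo "$tool: not found on PATH"
        fi
    done
}
```

Calling `count_fasta_files /path/to/genome_folder` prints 1 for a single multi-FastA file and a larger number for one file per chromosome; `report_versions` prints what each tool reports for `--version`.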

Here are a few comments from the previous thread:

Is there a chance that your computing environment prevents you from launching sub-processes at all? How much memory do you have available, and which genome would you like to index?
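A rough way to check both points on a Linux machine is sketched below. The helper names `available_mem_gib` and `can_fork_child` are hypothetical, and reading `MemAvailable` from `/proc/meminfo` is an assumption about the system (other platforms need a different source):

```shell
#!/usr/bin/env bash

# Print available memory in GiB (rounded down), read from a
# meminfo-style file. Assumption: Linux, where /proc/meminfo
# reports MemAvailable in kB.
available_mem_gib() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemAvailable:/ { print int($2 / 1048576) }' "$meminfo"
}

# Succeed if this shell can launch a background child process
# and collect its exit status.
can_fork_child() {
    ( exit 0 ) &
    wait "$!"
}
```

For example, `available_mem_gib` prints the currently available memory, and `can_fork_child && echo "sub-processes OK"` reports whether a background child could be started from this shell. Neither check proves that the indexer itself will fit in memory; they only rule out the most basic environment problems.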

In theory you should also be able to move to the CT and GA conversion sub-folders, and then run the commands in there yourself:

bowtie2-build -f genome_mfa.CT_conversion.fa BS_CT
bowtie2-build -f genome_mfa.GA_conversion.fa BS_GA

Would this work?
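If running the two builds by hand, a small wrapper like the following sketch runs them one after the other (rather than as parent and child in parallel) and reports the exit status of each. The folder and file names are taken from the log above; the function names and the `BOWTIE2_BUILD` override variable are hypothetical and only there so the builder can be swapped out:

```shell
#!/usr/bin/env bash

# Run one index build inside a conversion sub-folder and report
# whether it succeeded.
build_one() {
    local dir="$1" fasta="$2" prefix="$3"
    # Hypothetical override so a different builder can be substituted.
    local builder="${BOWTIE2_BUILD:-bowtie2-build}"
    local status=0
    ( cd "$dir" && "$builder" -f "$fasta" "$prefix" ) || status=$?
    if [ "$status" -eq 0 ]; then
        echo "$prefix: OK"
    else
        echo "$prefix: FAILED (exit $status)"
    fi
    return "$status"
}

# Build the CT and GA indexes serially; the argument is the path to
# the Bisulfite_Genome folder created by the genome preparation.
build_both() {
    local bs_dir="$1"
    local rc=0
    build_one "$bs_dir/CT_conversion" genome_mfa.CT_conversion.fa BS_CT || rc=1
    build_one "$bs_dir/GA_conversion" genome_mfa.GA_conversion.fa BS_GA || rc=1
    return "$rc"
}
```

It would be called as `build_both /path/to/genomes/Bisulfite_Genome`. Running the builds serially halves the peak memory demand compared with two simultaneous builds, and the per-build exit status shows which of the two conversions fails.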