Hi @gbnci,
I just checked: I had already downloaded the Opossum genome (build BROAD05) in 2015, and indexing it produced the following 7 files in each of the CT and GA genome conversion folders:
-rw-rw-r-- 1 fkrueger bioinf 1172288552 Jun 16 2015 BS_CT.1.bt2
-rw-rw-r-- 1 fkrueger bioinf 875415080 Jun 16 2015 BS_CT.2.bt2
-rw-rw-r-- 1 fkrueger bioinf 655316 Jun 16 2015 BS_CT.3.bt2
-rw-rw-r-- 1 fkrueger bioinf 875415075 Jun 16 2015 BS_CT.4.bt2
-rw-rw-r-- 1 fkrueger bioinf 1172288552 Jun 16 2015 BS_CT.rev.1.bt2
-rw-rw-r-- 1 fkrueger bioinf 875415080 Jun 16 2015 BS_CT.rev.2.bt2
-rw-rw-r-- 1 fkrueger bioinf 3665725775 Jun 16 2015 genome_mfa.CT_conversion.fa
So you definitely need 6 files ending in .bt2. If I had to guess, I would suspect that you
a) didn't wait long enough for the indexing to complete (judging by the genome size of 3.6 GB, I would expect it to take between 2 and 4 hours), or
b) didn't give it enough memory to work with. Parallel indexing will probably take at least 9 GB of RAM.
Is either of those possible?
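A quick way to test these two hypotheses is to check whether each conversion folder contains all six Bowtie 2 index files and whether any of them is empty (an interrupted build can leave zero-byte files behind). The sketch below assumes the Bisulfite_Genome/CT_conversion and GA_conversion layout described in this thread; the helper name check_index_folder is hypothetical and not part of Bismark, and it only checks presence and size, not file contents.

```python
from pathlib import Path

# The six files of a complete small Bowtie 2 index for one basename
# (same names as in the BS_CT listing above).
SUFFIXES = ["1.bt2", "2.bt2", "3.bt2", "4.bt2", "rev.1.bt2", "rev.2.bt2"]


def check_index_folder(folder, basename):
    """Return a list of problems (missing or zero-byte index files) for one folder.

    Hypothetical helper: an empty list means all six files exist and are
    non-empty. It does not validate what is inside the files.
    """
    problems = []
    for suffix in SUFFIXES:
        path = Path(folder) / f"{basename}.{suffix}"
        if not path.is_file():
            problems.append(f"missing: {path.name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {path.name}")
    return problems


if __name__ == "__main__":
    # Assumed to be run from the genome folder that was passed to bismark_genome_preparation.
    genome_dir = Path("Bisulfite_Genome")
    for sub, base in [("CT_conversion", "BS_CT"), ("GA_conversion", "BS_GA")]:
        issues = check_index_folder(genome_dir / sub, base)
        print(sub, "OK" if not issues else issues)
```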
I just downloaded the genome you mentioned (Monodelphis_domestica.monDom5.dna.toplevel.fa) from Ensembl (I couldn't find that exact file on the Broad website) and am indexing it as we speak. I'll post an update tomorrow if it doesn't complete successfully for any reason. So far it seems to have started fine:
Bismark Genome Preparation - Step II: Bisulfite converting reference genome
conversions performed:
chromosome C->T G->A
1 138636207 138674540
2 100127924 100132406
3 95991212 96049612
4 79638925 79580201
5 55208532 55150288
6 54020638 54008284
7 46697639 46721056
8 57496204 57378070
X 14957212 14986487
MT 3776 2201
Un 19434538 19520964
Total number of conversions performed:
C->T: 662212807
G->A: 662204109
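Step II writes one copy of the genome with every C converted to T and a second copy with every G converted to A, so the per-chromosome numbers are the counts of bases changed in each copy. A minimal illustration of that idea on an in-memory string (not Bismark's own Perl code; uppercasing soft-masked bases first is an assumption about how case is handled):

```python
def bisulfite_convert(seq):
    """Return (ct_seq, ga_seq, n_ct, n_ga) for an in-silico bisulfite conversion.

    Illustrative sketch: uppercase the input, then convert every C to T for the
    C->T genome and every G to A for the G->A genome, counting the changed bases.
    """
    seq = seq.upper()
    n_ct = seq.count("C")  # bases changed in the C->T copy
    n_ga = seq.count("G")  # bases changed in the G->A copy
    ct_seq = seq.replace("C", "T")
    ga_seq = seq.replace("G", "A")
    return ct_seq, ga_seq, n_ct, n_ga


if __name__ == "__main__":
    print(*bisulfite_convert("ACGTcgNNCG"))  # prints: ATGTTGNNTG ACATCANNCA 3 3
```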
Hi FelixKrueger, thanks for your suggestion. I think you are right. When I generated the genome, it only took a few minutes and seemed to have finished, but two of the 5 files have zero size. I only used 8 GB of RAM for the processing. I am trying again right now and will post an update here tomorrow. Thanks, Y Wang
From: FelixKrueger notifications@github.com Reply-To: FelixKrueger/Bismark reply@reply.github.com Date: Tuesday, March 20, 2018 at 5:59 PM To: FelixKrueger/Bismark Bismark@noreply.github.com Cc: "Wang, Yonghong (NIH/NCI) [E]" wangyong@mail.nih.gov, Mention mention@noreply.github.com Subject: Re: [FelixKrueger/Bismark] Fail to make genome ref for bismark alignment (#164)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub (https://github.com/FelixKrueger/Bismark/issues/164#issuecomment-374771453), or mute the thread (https://github.com/notifications/unsubscribe-auth/Aj2o-H7VabbnAMpSEQDQVoDbyFCrJC_Fks5tgXuHgaJpZM4SyhBv).
It finished in about a minute again, even though I used 32 GB of RAM, with the same output I got before. Here is the output:
Writing bisulfite genomes out into a single MFA (multi FastA) file
Bisulfite Genome Indexer version v0.19.0 (last modified 07 November 2016)
Step I - Prepare genome folders - completed
Total number of conversions performed:
C->T: 662212807
G->A: 662204109
Step II - Genome bisulfite conversions - completed
Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer
Please be aware that this process can - depending on genome size - take several hours!
Settings:
Output files: "BS_CT.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome_mfa.CT_conversion.fa
Building a SMALL index
Reading reference sizes
Settings:
Output files: "BS_GA.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome_mfa.GA_conversion.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:59
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time reading reference sizes: 00:00:59
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
I suspect the genome FASTA file I downloaded may be causing the problem. I will try downloading the file from Ensembl, as you suggested, to see whether that works. Thanks
It seems that your process still hadn't finished yesterday. Over here it took ~13GB of memory and close to 6 hours for the indexing:
System Time = 01:06:54
Wallclock Time = 05:47:21
CPU = 10:36:16
Max vmem = 12.963G
Exit Status = 0
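Dividing CPU time by wallclock time gives a rough estimate of how many cores were busy on average; a value close to 2 is consistent with the C->T and G->A indexers running in parallel. A small worked example using the numbers above (whether the scheduler's "CPU" figure already includes system time is an assumption):

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


cpu = hms_to_seconds("10:36:16")        # 38176 s
wallclock = hms_to_seconds("05:47:21")  # 20841 s
print(f"average busy cores: {cpu / wallclock:.2f}")  # prints: average busy cores: 1.83
```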
The files are identical (to the byte) to the older build from the Broad:
-rw-r--r-- 1 fkrueger bioinf 1172288552 Mar 21 01:07 BS_CT.1.bt2
-rw-r--r-- 1 fkrueger bioinf 875415080 Mar 21 01:07 BS_CT.2.bt2
-rw-r--r-- 1 fkrueger bioinf 655316 Mar 20 21:54 BS_CT.3.bt2
-rw-r--r-- 1 fkrueger bioinf 875415075 Mar 20 21:54 BS_CT.4.bt2
-rw-r--r-- 1 fkrueger bioinf 1172288552 Mar 21 03:35 BS_CT.rev.1.bt2
-rw-r--r-- 1 fkrueger bioinf 875415080 Mar 21 03:35 BS_CT.rev.2.bt2
-rw-r--r-- 1 fkrueger bioinf 3665725775 Mar 20 21:53 genome_mfa.CT_conversion.fa
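One way to confirm that two index builds are identical to the byte is Python's standard filecmp module with shallow=False, which compares file contents rather than only os.stat() signatures. A sketch, where compare_builds and both folder paths are hypothetical placeholders:

```python
import filecmp
from pathlib import Path


def compare_builds(dir_a, dir_b, pattern="*.bt2"):
    """Compare same-named files in two folders byte-for-byte.

    Returns (identical, different, missing_in_b) as lists of file names.
    """
    identical, different, missing = [], [], []
    for path_a in sorted(Path(dir_a).glob(pattern)):
        path_b = Path(dir_b) / path_a.name
        if not path_b.is_file():
            missing.append(path_a.name)
        elif filecmp.cmp(path_a, path_b, shallow=False):  # full content comparison
            identical.append(path_a.name)
        else:
            different.append(path_a.name)
    return identical, different, missing


if __name__ == "__main__":
    # Hypothetical locations of the older BROAD05 build and the new monDom5 build.
    old = "broad05/Bisulfite_Genome/CT_conversion"
    new = "monDom5/Bisulfite_Genome/CT_conversion"
    print(compare_builds(old, new))
```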
So I am hoping that this morning everything should just work for you.
Cheers, Felix
Hi FelixKrueger, thanks for your troubleshooting. I think there must be some kind of settings issue on my side that prevents the run from finishing, as it always stops after about a minute. While I continue troubleshooting here, I am wondering whether I could get the files you just generated to speed up my analysis. If that is OK, I can send you a link for uploading the files to me. Thanks again for the help. Best regards, Y Wang
From: FelixKrueger notifications@github.com Reply-To: FelixKrueger/Bismark reply@reply.github.com Date: Wednesday, March 21, 2018 at 5:54 AM To: FelixKrueger/Bismark Bismark@noreply.github.com Cc: "Wang, Yonghong (NIH/NCI) [E]" wangyong@mail.nih.gov, Mention mention@noreply.github.com Subject: Re: [FelixKrueger/Bismark] Fail to make genome ref for bismark alignment (#164)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub (https://github.com/FelixKrueger/Bismark/issues/164#issuecomment-374883662), or mute the thread (https://github.com/notifications/unsubscribe-auth/Aj2o-BI2uHfV6eF6Fde14beozUaKCYdtks5tgiNEgaJpZM4SyhBv).
This is certainly possible; here are the details for the files (active for 3 days):
Hostname ftp2.babraham.ac.uk Username ftpusr41 Password p7n8GbKA FTP URL ftp://ftpusr41:p7n8GbKA@ftp2.babraham.ac.uk
Cheers, Felix
Thanks, I will do so as soon as possible.
From: FelixKrueger notifications@github.com Reply-To: FelixKrueger/Bismark reply@reply.github.com Date: Wednesday, March 21, 2018 at 12:29 PM To: FelixKrueger/Bismark Bismark@noreply.github.com Cc: "Wang, Yonghong (NIH/NCI) [E]" wangyong@mail.nih.gov, Mention mention@noreply.github.com Subject: Re: [FelixKrueger/Bismark] Fail to make genome ref for bismark alignment (#164)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub (https://github.com/FelixKrueger/Bismark/issues/164#issuecomment-375006388), or mute the thread (https://github.com/notifications/unsubscribe-auth/Aj2o-KbvfkmV1c-JeL2EkVbwWHuTT_hwks5tgn_tgaJpZM4SyhBv).
It turned out that the culprit was an additional .fa file in the folder, which led to the apparent duplication of file names. All seems to be working now.
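Because bismark_genome_preparation picks up the FASTA files it finds in the genome folder, a leftover .fa file is easy to overlook. The sketch below lists FASTA-like files in a folder and reports sequence names that occur more than once across them. The extension list and both helper names are assumptions made for illustration, not Bismark's exact file-matching rules, and gzipped files are not handled.

```python
from collections import Counter
from pathlib import Path

# Assumed extensions; plain-text FASTA only in this sketch.
FASTA_SUFFIXES = (".fa", ".fasta")


def find_fasta_files(folder):
    """List FASTA-like files directly inside folder, sorted by path."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.name.endswith(FASTA_SUFFIXES))


def duplicate_sequence_names(fasta_files):
    """Return sequence names (first word after '>') seen more than once."""
    counts = Counter()
    for path in fasta_files:
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    counts[fields[0] if fields else ""] += 1
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    files = find_fasta_files(".")
    if len(files) > 1:
        print("More than one FASTA file found:", [p.name for p in files])
    print("Duplicated sequence names:", duplicate_sequence_names(files))
```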
I downloaded the opossum genome sequence from Broad ("Monodelphis_domestica.monDom5.dna.toplevel.fa") and ran the genome preparation command below, with the "fa" file in the current directory:
bismark_genome_preparation --bowtie2 .
In the resulting "Bisulfite_Genome" folder, two directories were created ("CT_conversion" and "GA_conversion"), but only five files were generated in each one: "BS_CT.1.bt2 BS_CT.2.bt2 BS_CT.3.bt2 BS_CT.4.bt2 genome_mfa.CT_conversion.fa" in CT_conversion and "BS_GA.1.bt2 BS_GA.2.bt2 BS_GA.3.bt2 BS_GA.4.bt2 genome_mfa.GA_conversion.fa" in GA_conversion. When running the Bismark alignment, I got the following error:
Alignments will be written out in BAM format.
Samtools found here: '/usr/local/apps/samtools/1.6/bin/samtools'
Reference genome folder provided is /scratch/wangyong/possum/ (absolute path is '/spin1/scratch/wangyong/possum/)'
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.rev.1.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.rev.2.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the G->A converted genome seems to be faulty or non-existant ('BS_GA.rev.1.bt2'). Please run bismark_genome_preparation before running Bismark
The Bowtie 2 index of the G->A converted genome seems to be faulty or non-existant ('BS_GA.rev.2.bt2'). Please run bismark_genome_preparation before running Bismark
Couldn't find a traditional small Bowtie 2 index for the genome specified (ending in .bt2). Now searching for a large index instead......
It seems I failed to create the genome index needed for the alignment. Could you please give me any suggestions? I am using Bismark 0.19.0 and Bowtie 2 2.3.4. Thanks
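The messages above reflect a two-stage lookup: first a complete small index (.bt2) for both the C->T and G->A genomes, then a large index (.bt2l). A rough sketch of that lookup order, written only to illustrate the messages; it is not Bismark's code, and detect_index_type is a hypothetical name:

```python
from pathlib import Path

# The six parts of a Bowtie 2 index for one basename.
INDEX_PARTS = ["1", "2", "3", "4", "rev.1", "rev.2"]


def detect_index_type(genome_folder):
    """Return 'small', 'large', or None for a Bismark-style genome folder.

    Requires all six parts for both BS_CT and BS_GA, first with the
    small-index extension .bt2, then with the large-index extension .bt2l.
    """
    base = Path(genome_folder) / "Bisulfite_Genome"
    layout = [("CT_conversion", "BS_CT"), ("GA_conversion", "BS_GA")]
    for kind, ext in [("small", "bt2"), ("large", "bt2l")]:
        if all((base / sub / f"{name}.{part}.{ext}").is_file()
               for sub, name in layout for part in INDEX_PARTS):
            return kind
    return None
```

With only the four forward files (.1 to .4) present in each folder, as described above, this returns None, matching the situation where neither a small nor a large index is found.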