Closed m3lorra closed 6 years ago
Hmm, from the on screen text I can't exactly see what the reason might be. Do you have enough RAM available on the machine you are using? Just using the assembled Hunan chromosomes you will need code to 10GB of memory, but I think the top-level data files contain a LOT more sequences (I think it might even need on excess of 20GB...).
Could you try to wget
only the chromosomal (and possibly non-chromosomal) sequences and try with those? Also, you could try to index with Bowtie2 instead (the default) and see if that works. Is there any reason why you want to use Bowtie 1 instead? These days it is very much recommended to use Bowtie2 because of its ability to map gapped alignments and because it scares with longer read lengths. Cheers, Felix
Hi Felix, Thanks a lot, I'll try what you suggest and I'll let you know. I'm using bowtie1 just because I'm doing RRBS analysis and every protocol that I have found use it. I'll try with bowtie2.
Thank you!
Hi, What is the impact of choosing a different reference genome? Which one should I choose? Ensembl:
The genome README file explains the different options:
<sequence type>:
* 'dna' - unmasked genomic DNA sequences.
* 'dna_rm' - masked genomic DNA. Interspersed repeats and low
complexity regions are detected with the RepeatMasker tool and masked
by replacing repeats with 'N's.
* 'dna_sm' - soft-masked genomic DNA. All repeats and low complexity regions
have been replaced with lowercased versions of their nucleic base
You should not use the hard-masked genome version (dna_rm
) since this will convert half the genome into Ns. Soft-masking should not affect the process because as far as I know all sequence is converted to upper case for the genome preparation etc. anyway.
Thanks very much!!!!! I'll try Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.
Dear Felix Krueger, I'm sorry to bother you again! After running bismark, there are some “extra chromosomes” in the report(Below). Should I remove these sequences from genome reference ? And, I don't know how to remove it either. I'm sorry. I don't want to take up too much of your time. Thanks.
FastQ format specified chr 1 (248956422 bp) chr 10 (133797422 bp) chr 11 (135086622 bp) chr 12 (133275309 bp) chr 13 (114364328 bp) chr 14 (107043718 bp) chr 15 (101991189 bp) chr 16 (90338345 bp) chr 17 (83257441 bp) chr 18 (80373285 bp) chr 19 (58617616 bp) chr 2 (242193529 bp) chr 20 (64444167 bp) chr 21 (46709983 bp) chr 22 (50818468 bp) chr 3 (198295559 bp) chr 4 (190214555 bp) chr 5 (181538259 bp) chr 6 (170805979 bp) chr 7 (159345973 bp) chr 8 (145138636 bp) chr 9 (138394717 bp) chr MT (16569 bp) chr X (156040895 bp) chr Y (57227415 bp) chr KI270728.1 (1872759 bp) chr KI270727.1 (448248 bp) chr KI270442.1 (392061 bp) chr KI270729.1 (280839 bp) chr GL000225.1 (211173 bp) chr KI270743.1 (210658 bp) chr GL000008.2 (209709 bp) chr GL000009.2 (201709 bp) chr KI270747.1 (198735 bp) chr KI270722.1 (194050 bp) chr GL000194.1 (191469 bp) chr KI270742.1 (186739 bp) chr GL000205.2 (185591 bp) chr GL000195.1 (182896 bp) chr KI270736.1 (181920 bp) chr KI270733.1 (179772 bp) chr GL000224.1 (179693 bp) chr GL000219.1 (179198 bp) chr KI270719.1 (176845 bp) chr GL000216.2 (176608 bp) chr KI270712.1 (176043 bp) chr KI270706.1 (175055 bp) chr KI270725.1 (172810 bp) chr KI270744.1 (168472 bp) chr KI270734.1 (165050 bp) chr GL000213.1 (164239 bp) chr GL000220.1 (161802 bp) chr KI270715.1 (161471 bp) chr GL000218.1 (161147 bp) chr KI270749.1 (158759 bp) chr KI270741.1 (157432 bp) chr GL000221.1 (155397 bp) chr KI270716.1 (153799 bp) chr KI270731.1 (150754 bp) chr KI270751.1 (150742 bp) chr KI270750.1 (148850 bp) chr KI270519.1 (138126 bp) chr GL000214.1 (137718 bp) chr KI270708.1 (127682 bp) chr KI270730.1 (112551 bp) chr KI270438.1 (112505 bp) chr KI270737.1 (103838 bp) chr KI270721.1 (100316 bp) chr KI270738.1 (99375 bp) chr KI270748.1 (93321 bp) chr KI270435.1 (92983 bp) chr GL000208.1 (92689 bp) chr KI270538.1 (91309 bp) chr KI270756.1 (79590 bp) chr KI270739.1 (73985 bp) chr KI270757.1 (71251 bp) chr KI270709.1 (66860 bp) chr KI270746.1 (66486 bp) chr KI270753.1 (62944 bp) chr KI270589.1 (44474 bp) chr KI270726.1 (43739 bp)
These sequences are contigs that could not be placed on individual chromosomes. I wouldn't necessarily remove them, but maybe you can simply focus on analysing the data from chromsosomes 1-22, X,Y and MT? It is good to include these sequences for the mapping process, but when we import data into Seqmonk afterwards it simply doesn't import data from unplaced contigs at all.
Thank you for your patience and help!!!!
Hi Felix, Thanks a lot, I'll try what you suggest and I'll let you know. I'm using bowtie1 just because I'm doing RRBS analysis and every protocol that I have found use it. I'll try with bowtie2.
Thank you!
Hello m3lorra,
I'm experiencing the same problem as you. Would you mind sharing how did you solve that please? Thank you for your kind reply!
Hi Felix, Thanks a lot, I'll try what you suggest and I'll let you know. I'm using bowtie1 just because I'm doing RRBS analysis and every protocol that I have found use it. I'll try with bowtie2. Thank you!
Hello m3lorra,
I'm experiencing the same problem as you. Would you mind sharing how did you solve that please? Thank you for your kind reply!
Hello Lesky47,
I am having the same problem. Did you figure out what caused it?
Regards, xiaofei
Hi, I'm using bismark_genome_preparation to in-silico bisulfite convert the reference genome.
but I'm getting this message:
Parent process: Failed to build index
Any clue of what might be wrong? Bismark v0.20.0
Thank you