ezecalvo closed this issue 2 years ago
Hello,
I believe the large memory usage may be caused by the graph index building process. The SNP or splice site (SS) database may be too large for hisat-3n. Could you decrease the number of SNPs (try common SNPs only) and try again?
Thanks, Leo
Hi,
I'm not using any SNP file, just masking annotated SNPs. Is that what you were referring to with "common"?
Thanks
Could you show me the script you used for index building? Also, could you share the database files you used for index building? Then we can try it on our side.
Thanks, Leo
Also, can I have the reference (FASTA) file? 40 GB of memory usage for chr1 is also too much for me.
Sure: https://www.dropbox.com/sh/jaerase0es7ygr8/AADJDylN-AWB5nwvg6qZSgXDa?dl=0
My code: hisat-3n-build --base-change T,C --noauto --bmax 2 --dcv 512 -p 1 --ss mm10.ss --exon mm10.exon mm10_masked.fasta output/hisat3_genome
If it helps, this is the report from a job submission using -p 20. The job gets killed when it reaches 500 GB:
Also, when I tried a smaller genome (for example, just chr1), it worked fine and used ~40 GB of memory at most.
Not sure what you mean by reference, but I just added the full FASTA file (non-masked) and a VCF with all the positions I masked. I'm not using the entire FASTA for this!
Hello,
I tried to build the graph index with the masked genome, and it also failed on my side. The masked genome makes the graph index very complicated, so HISAT2 (HISAT-3N) cannot handle it. However, there is an alternative way to supply the splice site information: the alignment option --known-splicesite-infile <path> (please check the HISAT2 manual for more information). HISAT-3N can use the splice site information during the alignment process to increase alignment accuracy.
Best, Leo
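A minimal sketch of that workaround, assuming the file names from the build command earlier in this thread. The read file `reads.fastq` and output `aligned.sam` are hypothetical placeholders, and the memory-tuning flags (--noauto, --bmax, --dcv) are left out for brevity. The index is built without --ss and --exon, and the splice sites are passed at alignment time instead. The script only prints the commands unless RUN=1 is set, so it can be checked before anything heavy runs.

```shell
#!/usr/bin/env bash
set -eu

# File names taken from the thread; READS and OUT are hypothetical placeholders.
GENOME=mm10_masked.fasta
INDEX=output/hisat3_genome
SPLICE_SITES=mm10.ss
READS=reads.fastq    # assumed single-end FASTQ input
OUT=aligned.sam

# Print a command; execute it only when RUN=1 is set in the environment.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Step 1: build the index WITHOUT --ss/--exon, so no splice-aware graph is built.
run hisat-3n-build --base-change T,C -p 1 "$GENOME" "$INDEX"

# Step 2: give the splice sites to the aligner instead of the index builder.
run hisat-3n --index "$INDEX" --base-change T,C -q -U "$READS" \
  --known-splicesite-infile "$SPLICE_SITES" -S "$OUT"
```

Run it once as-is to inspect the two commands, then again with `RUN=1` to execute them. The --base-change value should match whatever conversion your library uses.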
That works like a charm.
Thanks!
Hi, I'm trying to build a hisat3 index for a masked genome, but it takes >500 GB of memory, which is my node limit. This happens even when setting options like bmax or dcv to the minimum.
Is there a way to maybe break the genome into subsections and then merge them? Thought about doing that for each chromosome but not sure how to do that once I get the ht2 files.
Thanks!