amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

Error while creating index #100

Closed: riomario1 closed this issue 3 years ago

riomario1 commented 7 years ago

Hi,

My SNAP aligner is not working with the genome I am using. I have 3 genome files, and SNAP works with 2 of them but not the third.

SNAP version 1.0beta.18.

Run: ./snap-aligner index /home/user/FILE_GENOME index-dir -locationSize 5

Error: FASTA file contained a character that's not a valid base (or N): 'Y', full line 'CAAAACTGGCCAGAATAGACCAAAACTGCGAATTTTGACGAGTTCGGGTAACAGGACCCCGGGTTTCCCAAAACGTTCGGAGGCAAGCGGGAACCAAAATCAGTGAGTAATAGCTAACTAAGGGGCCAGAATAGGCCAAAACTGCGTATTTTGACGAGTTCCACAAGGGCTAAAACTGCGATTTTTGATGAGTTCCCGGTAAGCGGACCTCGGAGTTCTCGAAACGTTCGGATCACAACGGGACCCTAAATCAGTGACTAATAGCAGAGATAACTAGCGGGAATAGGCCAAAATTGCGAGTTTTCATGAGTTCCGTGGAATCGGTTCCCGGGGTTCCCAA..... converting to 'N'. This may happen again, but there will be no more warnings.
...
...
Indexing 1100000000 / 4833795607
Indexing 1200000000 / 4833795607
Killed

Any help?
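The 'Y' message in this log is only a warning: the FASTA contains a character other than A, C, G, T or N (here the IUPAC ambiguity code 'Y', meaning C or T), SNAP converts it to 'N', and indexing continues, as the later progress lines show. A minimal Python sketch for checking how many such characters a reference contains, assuming a plain uncompressed FASTA whose header lines start with '>'. The helper name `scan_fasta` is hypothetical, and upper-casing sequence lines before the check is an assumption that may not match exactly how SNAP treats lowercase bases.

```python
from collections import Counter
from typing import Iterable, Tuple

VALID = set("ACGTN")

def scan_fasta(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Return (total sequence characters, counts of characters outside ACGTN).

    Hypothetical helper: '>' header lines are skipped, and sequence lines are
    upper-cased before checking (an assumption, not SNAP's documented rule).
    """
    total = 0
    invalid = Counter()
    for line in lines:
        if line.startswith(">"):
            continue  # FASTA header line, not sequence
        seq = line.strip().upper()
        total += len(seq)
        invalid.update(c for c in seq if c not in VALID)
    return total, invalid
```

For example, `scan_fasta(open("FILE_GENOME"))` returns the number of sequence characters together with a `Counter` of every non-ACGTN character found, so you can see whether 'Y' is an isolated case or widespread.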

bolosky commented 7 years ago

Most likely you’re running out of memory (or memory quota). Given how many bases are in your FASTA, you’ll probably need more than 48GB.

You can try the -sm switch to the index build, which may or may not be enough to get you over the top (it writes some data generated in the middle of the index build out to disk to save memory). If it isn’t enough, you’ll need to increase your quota or run on a machine with more memory.

How much memory does your machine have?

--Bill

From: riomario1 [mailto:notifications@github.com] Sent: Wednesday, July 19, 2017 7:13 AM To: amplab/snap snap@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [amplab/snap] Error while creating index (#100)


riomario1 commented 7 years ago

@bolosky,

Does SNAP work with data in Hadoop?

My local server has 31GB of memory in total.

When I try to create the index with the -sm flag using the commands below, it creates a bunch of HalfBuiltHashTables.x files.

~/snap/snap-aligner index FILE_GENOME index-dir -locationSize 5 -sm

~/snap/snap-aligner single index-dir file.fastq -o file.sam

Error: Loading index from directory... Unable to open file 'index-dir/GenomeIndex' for read. Index load failed, aborting.

Any suggestion?
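The load error says `index-dir/GenomeIndex` does not exist, which is consistent with the `-sm` build having stopped before it finished and leaving only the intermediate `HalfBuiltHashTables.x` files behind. A minimal Python sketch of a pre-flight check you could run before `snap-aligner single`. Both function names are hypothetical, and the check only looks for the `GenomeIndex` file named in the error and the `HalfBuiltHashTables` prefix seen in this thread; a complete SNAP index may need other files as well.

```python
import os
from typing import Iterable, List, Tuple

def classify_index_files(names: Iterable[str]) -> Tuple[bool, List[str]]:
    """Return (is 'GenomeIndex' present?, sorted leftover HalfBuiltHashTables* names)."""
    names = list(names)
    leftovers = sorted(n for n in names if n.startswith("HalfBuiltHashTables"))
    return "GenomeIndex" in names, leftovers

def index_looks_complete(index_dir: str) -> Tuple[bool, List[str]]:
    """Hypothetical check on an index directory before aligning against it.

    A missing directory is reported as incomplete with no leftover files.
    """
    if not os.path.isdir(index_dir):
        return False, []
    return classify_index_files(os.listdir(index_dir))
```

If `index_looks_complete("index-dir")` returns `False` along with a list of `HalfBuiltHashTables` files, the index build most likely did not complete, and rerunning the aligner will hit the same load error.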

bolosky commented 7 years ago

There’s code in SNAP to work with files in Hadoop, although I haven’t tried it out in a long time.

However, that won’t help you. Your issue is that 31GB just isn’t enough memory to hold a SNAP index of a FASTA with 4.8 billion bases. While SNAP is fast, it uses a hash table for an index, which winds up using something like 10 bytes/base of FASTA (more or less; the exact amount depends on the content of the reference). I’d try to find a machine with 64GB (or more) to run against this reference. 48GB probably isn’t enough once you allow room for the operating system, IO buffers, working memory, etc.

--Bill
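The rule of thumb in this reply can be turned into a quick back-of-the-envelope check. A minimal Python sketch, assuming the rough 10 bytes/base figure quoted above (the real footprint depends on the reference) and the base count 4,833,795,607 taken from the `Indexing ... / 4833795607` progress lines. The function names are hypothetical, and the comparison deliberately ignores the OS, IO buffers, and working memory, so a "fits" result is still optimistic.

```python
def estimate_index_gb(num_bases: int, bytes_per_base: float = 10.0) -> float:
    """Rough SNAP index size in decimal GB (1 GB = 1e9 bytes).

    bytes_per_base=10 is the approximate figure from this thread, not a
    constant; the actual value depends on the reference's content.
    """
    return num_bases * bytes_per_base / 1e9

def fits(num_bases: int, ram_gb: float, bytes_per_base: float = 10.0) -> bool:
    """True only if the estimated index alone is smaller than ram_gb.

    No headroom is reserved for the OS, IO buffers, or working memory.
    """
    return estimate_index_gb(num_bases, bytes_per_base) < ram_gb

BASES = 4_833_795_607  # total from the "Indexing ... / 4833795607" lines

print(f"estimated index: {estimate_index_gb(BASES):.1f} GB")  # estimated index: 48.3 GB
print("fits in 31 GB:", fits(BASES, 31))  # False
print("fits in 48 GB:", fits(BASES, 48))  # False
print("fits in 64 GB:", fits(BASES, 64))  # True
```

The estimate of about 48 GB for the index alone explains why a 31GB server cannot build it, and why 48GB is borderline at best once the rest of the system's memory needs are counted.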

From: riomario1 [mailto:notifications@github.com] Sent: Wednesday, July 19, 2017 9:44 AM To: amplab/snap snap@noreply.github.com Cc: Bill Bolosky bolosky@microsoft.com; Mention mention@noreply.github.com Subject: Re: [amplab/snap] Error while creating index (#100)
