Illumina / Isaac3

Aligner for sequencing data

+ sign not found where expected #9

Open wiedenhoeft opened 7 years ago

wiedenhoeft commented 7 years ago

I'm using iSAAC-03.17.03.01. I have created simulated fastq.gz files using VarSim v0.7.8. My reference genome is hg38, and I built the index according to the instructions provided. Using the call

${isaac} --base-calls ${outdir} --base-calls-format fastq-gz --memory-limit 25 --reference-genome ${refXMLdescriptor} --reference-name ${refname} --sample-sheet none

I get the following error:

Failed to parse the options: /opt/src/iSAAC/iSAAC-03.17.03.01/src/c++/lib/io/FastqReader.cpp(242): Throw in function void isaac::io::FastqReader::findQScores() Dynamic exception type: boost::exception_detail::clone_impl<isaac::io::FastqFormatException> std::exception::what: + sign not found where expected: /data/lane1_read1.fastq.gz, offset 324

What does this error mean? I've checked the file, it looks like a perfectly normal fastq.

rpetrovski commented 7 years ago

Can you please post first 50 or so lines of your lane1_read1.fastq.gz file?

Roman.

wiedenhoeft commented 7 years ago

Gladly, thanks for the quick response!

@1-38891657--:1-38891451-:::::::::::3123365MA==:1/1
TTTTGCAATTTCTATGAAGGGTTCGATTATCCCCATTACAGGAACAGAGGCAACTATGTCTGTGGGAAGGGACTCTTTAGTTTATTGACCATTTAAATATAAGAACGCTAATATTTCACTCTGAGATAAAATAATTCACTTGTTTATAGT
+
FAFFFKKKKKKKKKKKKKKKKKKKFKKKAK7KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK7KKKKKFKFKKKKKKKK<KKKKK<KKKKKKKKAKKKKKKKKKKKAFKKFKAKK<KKKKFKKKKKKKFKKK,KKFKKKKKKFKFFK
@1-74462122-:1-74462317--:::::::::::3123363Mg==:1/1
TTAATTTTTTAAACTTCTACAAGGAAGAAGTATTAATTTACAATTGGCAAAATTAGGTAATCATTCAGAATAATCCATAAAGTGTTGATCAAATCACATTGCATATAATTTCACTAGGCCTACTGAAAGTGTATGGATAAACAGAATTTG
+
FAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKK<<KKKKKKKKKKKKKKKK7KKKKK,FKKKKAKKKKFKKKKKKFKKKKKAKKKKKKKKKFKKKKKKKKKF,AKKFKFK<KKKKKKKKKKKKK,KKF7KKK7KKKK
@1-172470587--:1-172470304-:::::::::::3123361NA==:1/1
GAAATGGGATCAGAATTCAAGGGCACTGTTTTTGGAGTTCTTAAACTCCAACTGAACTGAAACCAAGAATCATAGCCTACCAGGAATTCAGGACAATCACGCAAGTTTTTCTGTTTTTGTTTGTCTTTTTTTTTTTTTCTTGTGCAGGAA
+
FAFFFFAKFKKKKKKKKKKFKKKKKKKKFKKKAKKKKKKKKKKAKFKKKKKKKKKKKKKKFKKK7KKKKKKFKKFKKKKKKKKKKKKKKK7K,KK7KKK(KKAKKFKKKKKKKKKK<FKKKKKKKKKKKFKKKKFA,KKKAKKKKK<KKK
@1-179885532--:1-179885296-:::::::::::3123359Ng==:1/1
TAAGTTTAGAATACAATAGCTCACATTATAAAGCATAATCTTATTAGAAACAGGTAAATAAGTCAGGATCTGGGTACTAAAAACCAAGGATTATCAGTTTTAACAGCTATGATTGTTTATTATTGTGAGACTTTCCTAATATCAAAAAAA
+
FAFFFKKKKKKFKK<KKKKKKKKKKKKKKKKKKKKKKKKK7KKKKKKKAKKKKKKKKK7KKKKKKKKKKKKKKKKKKKKKKKFKKKK,FKK<KKKKKKKKK(KKKKKKKKKKKKKKKKKKKKKKKKKKKKAKKKFAKKKAKKKKKKKKFF
@1-85807039-:1-85807288--:::::::::::3123357OA==:1/1
AACAGACATGGGCAAGACTAAGAACTCTGAACCTCTAATCCTGCTTAACCTTCCTGGCTTGTGGAGGCAGCCCCTACTCAAGTGCCTGGGAACACCATTAATAGTCCAGTAGATTGGCAATTTGAAGCCTGGACTCGATTGTGGCCTACA
+
FAFFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKAKKKKKKKKKFKFKKKKFKKKK7KKKKKFFKKKKKKKKKKKKKKKKKKAKKKKKKKKKKKKK7KFKKKAKKKKKKKKKKKKKKFFFK,KKKKKKKKK
@1-156033365-:1-156033501--:::::::::::3123355MTA=:1/1
TGATTCCAGGTTTTTTTTGTTTTGTTTTGTTTTGTTTTTTTGCTAGATGTATTCAATCCCTGCCCCACTTTATCTCTCTGAACATCCCGTTTGCTTGCTCCTTCCTTCCTTGCTTGACCCCAGGAGTTTGAGACCAGTCTGGGCAACATG
+
FAFFFKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKFKKKKKKFKKK(KKKKKKKKKKKKKKKKKKAKKKKKKKKKKKKKKKKKKKKKFAK<KKKKKKKKKKKKKKKKKKKKKKKFAKKFKAKKKFKKAFKKKAKFKKK,,KKKF
@1-120491056-:1-120491136--:::::::::::3123353MTI=:1/1
TAGACCAATGAGACCCAACAGATCTCTTTTGTCATGTTACTTAAAGATACAGCAAAATCAAATCGCTGTTCTTAACCCGGGAATGTGCAGTGAAATCAGTTTATTAGGTTGCAAATTGCTTTTACTTTTTCATGAAGAGTAGAACACATA
+
FAFFFKKKKKKKKKKKKKKKKK<KKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKFKKKKKKKAFKKFKKKKKKKKKKKKKKKKKK,KKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK,KKKKKKKK,AFKKKKKKKKKKKA
@1-237948814--:1-237948616-:::::::::::3123349MTQ=:1/1
CCACTTTCTCGTCAGTTGCTCATATCAGCTGAGATGTAAATTATGGCTGAATGGTCTGCTATTTTCATGCTAACGGCGCCTCAATATTTTTCATCTGCTTTTTCTTACACATTGGCTGAGGCCTATACTACGGGTCATTCCTATTTTTAG
+
FAAFFKKKKKFKKKKKKKKKKKKKKKKKFKKKKKKKKAKKKKKKFKFKKKK77KKKKKKKKKKKKFKKFKFKKKKKKKKKKKKKFKKKKKKKKKFKKKKKKKKKKFKKKKK<KKKKKKKKKK,FKFKKKKKKAAKKFKK,KF7FKKKKFK
@1-241932501--:1-241932257-:::::::::::3123347MTY=:1/1
AGTATAATGGCACAATCTTGGCTCACTGCAAACTCTGCCTCCTGGGTTCAAGTGATTCTCCTGCCTCAGCCTCCCAAGTAGCTGAGATTACAGGCACCCGCCACGACGCCCAACTAATTTTTGTATTTTTGTAGAGACGGGGTTTCACCA

rpetrovski commented 7 years ago

Isaac does not complain about the above for me. Something must be wrong with compression or newlines. Would you be able to make the entire fastq.gz available?

If the fastq.gz is too big, I think it's ok to cut it at a few kb offset. The error clearly happens within the first record anyway.

Roman.
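
One way to look for the newline or record-structure problems suspected above is to walk the file four lines at a time. The following is a minimal Python sketch, not Isaac's own parser. The function name `first_fastq_problem` is made up for illustration, and the check assumes every record occupies exactly four lines.

```python
import gzip


def first_fastq_problem(path):
    """Return a description of the first malformed record, or None if none is found.

    Assumes a gzip-compressed FASTQ with exactly four lines per record.
    """
    with gzip.open(path, "rb") as handle:
        record = 0
        while True:
            lines = [handle.readline() for _ in range(4)]
            if not lines[0]:
                return None  # clean end of file
            record += 1
            if any(b"\r" in line for line in lines):
                return f"record {record}: carriage return (Windows-style newline)"
            if not lines[3]:
                return f"record {record}: file ends inside the record"
            header, seq, plus, qual = (line.rstrip(b"\n") for line in lines)
            if not header.startswith(b"@"):
                return f"record {record}: header does not start with '@'"
            if not plus.startswith(b"+"):
                return f"record {record}: third line does not start with '+'"
            if len(seq) != len(qual):
                return f"record {record}: sequence and quality lengths differ"
```

For example, `first_fastq_problem("/data/lane1_read1.fastq.gz")` would return `None` for a well-formed file. A corrupt gzip stream surfaces as an exception from the `gzip` module rather than as a return value. If the check finds nothing, the file content is probably fine and the decompression library used by the aligner becomes a more likely suspect.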

wiedenhoeft commented 7 years ago

This happens within the first 8 lines for me. Strangely enough, if I only keep one record (first 4 lines), I get:

Failed to parse the options: /opt/src/iSAAC/iSAAC-03.17.03.01/src/c++/lib/io/FastqReader.cpp(158): Throw in function void isaac::io::FastqReader::findHeader() Dynamic exception type: boost::exception_detail::clone_impl<isaac::io::FastqFormatException> std::exception::what: Fastq file end while reading the header line:

I've attached the first 10K lines: lane1_read1.fastq.gz

rpetrovski commented 7 years ago

Your compressed file worked fine for me as well. Are you using any sort of custom compression library? Do you use gcc to build iSAAC? Can you please post the output of: ldd isaac-align

Roman.

wiedenhoeft commented 7 years ago

I think VarSim uses Java's gzip. However, the 10K file was compressed using a standard gzip 1.4, and both give the same result. I'm using gcc. ldd yields:

linux-vdso.so.1 => (0x00007fff437ff000)
libboost_chrono.so.1.55.0 => /projects/gcc48/lib/libboost_chrono.so.1.55.0 (0x00007f662e5f0000)
libboost_atomic.so.1.55.0 => /projects/gcc48/lib/libboost_atomic.so.1.55.0 (0x00007f662e3ed000)
libz.so.1 => /projects/gcc48/lib/libz.so.1 (0x00007f662e1d5000)
librt.so.1 => /lib64/librt.so.1 (0x0000003d8b200000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003d8aa00000)
libstdc++.so.6 => /projects/gcc48/lib64/libstdc++.so.6 (0x00007f662de89000)
libm.so.6 => /lib64/libm.so.6 (0x0000003d8a200000)
libgomp.so.1 => /projects/gcc48/lib64/libgomp.so.1 (0x00007f662dc7a000)
libgcc_s.so.1 => /projects/gcc48/lib64/libgcc_s.so.1 (0x00007f662da64000)
libc.so.6 => /lib64/libc.so.6 (0x0000003d89e00000)
libboost_system.so.1.55.0 => /projects/gcc48/lib/libboost_system.so.1.55.0 (0x00007f662d860000)
/lib64/ld-linux-x86-64.so.2 (0x0000003d89a00000)

rpetrovski commented 7 years ago

Looks reasonable. I'm a bit surprised that your libz does not come from /lib64. Could you check what's causing that?

I've put a CentOS6 compiled static binary of isaac-align here: https://illumina.box.com/s/uhn6zt1v2g11jjzbyrog07y4d4a0perx.

If you can run it, that will tell us whether the problem is specific to the binaries that you build.

On a separate note, you can work around compression-related issues by unpacking your fastq and running with --base-calls-format fastq.
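
The unpacking step of that workaround can be sketched in Python as follows. This is only an illustration: the helper name `unpack_fastq` is hypothetical, and it assumes the input file name ends in `.gz` and that there is enough disk space for the uncompressed copy.

```python
import gzip
import shutil


def unpack_fastq(gz_path):
    """Decompress `x.fastq.gz` next to itself as `x.fastq` and return the new path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: " + gz_path)
    out_path = gz_path[:-3]  # strip the ".gz" suffix
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
    return out_path
```

After unpacking, the aligner would be pointed at the same directory with `--base-calls-format fastq` instead of `fastq-gz`.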

wiedenhoeft commented 7 years ago

I don't know about the libz, tbh. The machine is quite old, I don't know if anyone remembers who installed things how and where ;-)

For the box link I get "The item you are trying to access has either been deleted or is unavailable to you." (I don't have Illumina credentials, so I had to use my work account).

I just tried on uncompressed. At least it's going somewhere, I get "When no sample sheet is provided, 'default' reference must have a specification". I remember seeing this before and fixing it somehow... Unfortunately, I'm planning to run a huge simulation study, and space is limited, so I'd prefer to work on the gzip files :-(

rpetrovski commented 7 years ago

Drop --reference-name ${refname} --sample-sheet none from your command line. They are not going to have any effect and, for reasons too long to explain, they cause this error.

Did the static binary work? My best guess is that something is not right with your /projects/gcc48/lib/libz.so.1. You can fiddle with LD_LIBRARY_PATH to force /lib64/libz to be used. Or you can build a static binary on a different box that has a working libz.

To build a static binary, all you need to do is pass --static to configure and, of course, install the relevant static packages such as zlib-static.
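
To see whether an LD_LIBRARY_PATH change has the intended effect before running the aligner, the `ldd` output can be compared with and without the override. The following is a minimal Python sketch under stated assumptions: the function names are made up for illustration, and the parser only handles the `name => path (address)` layout shown in the `ldd` output earlier in this thread.

```python
import os
import subprocess


def resolved_libraries(ldd_output):
    """Map each library name in `ldd` output to the path it resolved to."""
    resolved = {}
    for entry in ldd_output.splitlines():
        name, arrow, target = entry.strip().partition(" => ")
        if not arrow:
            continue  # e.g. the dynamic loader line, which has no "=>"
        fields = target.split()
        # "name => /path (0x...)" has a path; "name => (0x...)" or "not found" does not
        resolved[name] = fields[0] if fields and fields[0].startswith("/") else None
    return resolved


def ldd_with_library_path(binary, lib_dir):
    """Run `ldd` on `binary` with `lib_dir` placed first in LD_LIBRARY_PATH."""
    env = dict(os.environ)
    previous = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir if not previous else lib_dir + os.pathsep + previous
    result = subprocess.run(
        ["ldd", binary], env=env, capture_output=True, text=True, check=True
    )
    return resolved_libraries(result.stdout)
```

For instance, `ldd_with_library_path("./isaac-align", "/lib64")["libz.so.1"]` would show which libz the loader picks with the override in place (the binary path here is an assumption). Putting /lib64 first affects every library found there, not only libz, so libstdc++ and libgomp may switch as well. A directory containing only a symlink to /lib64/libz.so.1 is a narrower alternative. If the binary was linked with an RPATH, LD_LIBRARY_PATH may not override it.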

rpetrovski commented 7 years ago

Sorry, I forgot to mention that I've updated the access to the box folder. It says you should be able to join as a collaborator when you access the folder link: https://illumina.box.com/s/5jwhdmjeei8s5bqw8lxu6bpclgy0bhag

file link: https://illumina.box.com/s/uhn6zt1v2g11jjzbyrog07y4d4a0perx

Let me know if it still does not work.

rpetrovski commented 7 years ago

Ok, just read box policies. Looks like I have to invite you personally with your email address. Maybe it will be easier if you create your own folder and let me drop the binary in it.

Roman.

wiedenhoeft commented 7 years ago

Thanks! Well, we seem to be getting somewhere: it runs, but complains that the index version is too old (version 3, built with iSAAC 1.14.02.10). Following the install instructions, I've tried to build 03.17.03.01 from scratch in order to be able to build the latest index version, but make complains: make: @iSAAC_HOME@@iSAAC_FULL_DATADIR@/makefiles/reference/SortReference.mk: No such file or directory

rpetrovski commented 7 years ago

I've put the entire iSAAC-03.17.03.01-Linux-x86_64.tar.gz package in the box: https://illumina.box.com/s/fmbhhnzyk0xudwwi01g4h5vr84e55zo6

This should be able to properly build the reference and run with your data. Let me know if it does not.

Note that iSAAC-03 requires substantially more RAM to index the reference than iSAAC-01. You will need about 150G of RAM and a couple of days for isaac-sort-reference to complete successfully with defaults.

There are ways to run it on lower-spec hardware. Let me know your hardware spec if the above is not available.

Roman.

wiedenhoeft commented 7 years ago

Thanks, I'll check it out. I have 260GB RAM and 48 cores, with several terabytes of disk space, so this should be fine. However, strangely enough, the old iSAAC-01 aligner crashes with a bad_alloc for 30x WGS... We'll see how the new one will do ;-)

rpetrovski commented 7 years ago

Please make sure you supply a decent -m argument. On 48 cores iSAAC-03 will need -m 70 or so. However, certain stages benefit from being able to use as much RAM as possible. So, on a 260G box I would just use -m 240.
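
Putting the advice from this thread together, the adjusted invocation can be sketched as an argument list in Python. This is only an illustration: the function name `isaac_align_argv` and the example paths are hypothetical, and `-m` is assumed to be the short form of `--memory-limit` from the original command.

```python
def isaac_align_argv(base_calls_dir, reference_xml, memory_gb, fmt="fastq-gz"):
    """Build an isaac-align argument list using only options mentioned in this thread.

    --reference-name and --sample-sheet are left out on purpose, since the
    thread reports that they trigger the 'default' reference error.
    """
    return [
        "isaac-align",
        "--base-calls", base_calls_dir,
        "--base-calls-format", fmt,
        "--reference-genome", reference_xml,
        "-m", str(memory_gb),
    ]
```

On the 260G machine described above, `isaac_align_argv("/data", "/ref/sorted-reference.xml", 240)` would give the list to pass to `subprocess.run`.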