Illumina / Isaac4

Isaac aligner version 4

ISaac install error #3

Closed lw3259111 closed 3 years ago

lw3259111 commented 6 years ago

When I install the latest Isaac4, I hit a problem:

[ 40%] Building C object Utilities/cmlibarchive/libarchive/CMakeFiles/cmlibarchive.dir/archive_cryptor.c.o
In file included from /home/software/Isaac42/Isaac-build/bootstrap_cmake/build/cmake-3.7.0-rc2/Utilities/cmlibarchive/libarchive/archive_cryptor.c:32:0:
/home/software/Isaac42/Isaac-build/bootstrap_cmake/build/cmake-3.7.0-rc2/Utilities/cmlibarchive/libarchive/archive_cryptor_private.h:107:17: error: field ‘ctx’ has incomplete type
EVP_CIPHER_CTX ctx;

I searched for the error on Google, and some people suggest changing the OpenSSL version. I changed it to 1.1.0g, but the software still does not install. How can I solve this problem?

lw3259111 commented 6 years ago

The old version of Isaac4 works well for the rice genome, but when I use the barley GTF file, the software cannot read the GTF length:

2018-01-31 17:33:15 [7f02b39ed7c0] estimateOptimumFragmentsPerBin availableMemory / fragmentMemoryRequirements / minOverlap: 4247396
2018-01-31 17:33:15 [7f02b39ed7c0] STAT: loadContigs 578519040vm 2406res
2018-01-31 17:33:20 [7f02b39ed7c0] Generated 8 contigs of which 0 are decoys
./work.sh: line 8: 25423 Aborted (core dumped) /home/software/Isaac4/bin/isaac-align -r /data2/database/genome/barley/barley/sorted-reference.xml -b S74 -m 200 --base-calls-format fastq

So, is there a bug with long genome files?

rpetrovski commented 6 years ago

Can you please point me to the fasta file you are using? I take it you've managed to resolve your cmake compilation problems?

Roman.

lw3259111 commented 6 years ago

@rpetrovski The species is Hordeum vulgare, but we reconstructed the genome ourselves. The Hordeum vulgare genome is larger than 4.6G. And Isaac still has the compilation problems.

rpetrovski commented 6 years ago

We have not used Isaac with genomes over 4G. You might get lucky if you change line 129 of reference/Contig.hh from typedef uint32_t Offset; to typedef uint64_t Offset;

I know it compiles and runs with the human genome, but I don't have any data to check it for your scenario. If you make your fasta available along with a few reads from your data, I will try to check and fix any major issues next week.

Roman.
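
To see why a 32-bit offset type could matter past 4 gigabases, here is a minimal Python sketch of unsigned 32-bit wraparound. It assumes, without confirmation from the Isaac source, that Offset counts positions across the whole reference. The 4.6-gigabase figure is the approximate barley genome size quoted earlier in this thread, and the helper names are hypothetical.

```python
UINT32_MAX = 2**32 - 1  # largest value a uint32_t can hold: 4,294,967,295


def fits_in_uint32(offset):
    """True if a non-negative offset can be stored in a uint32_t unchanged."""
    return 0 <= offset <= UINT32_MAX


def wrap_uint32(offset):
    """Mimic storing a non-negative integer in a uint32_t (modulo 2**32)."""
    return offset & 0xFFFFFFFF


# Approximate size only: the thread says "more than 4.6G".
genome_size = 4_600_000_000
last_position = genome_size - 1

print(fits_in_uint32(last_position))  # False
print(wrap_uint32(last_position))     # 305032703
```

Under that assumption, positions near the end of the reference would silently alias to positions near its start, which is the kind of failure a 64-bit Offset would avoid.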

lw3259111 commented 6 years ago

@rpetrovski How can I send the fasta to you?

lw3259111 commented 6 years ago

@rpetrovski We have now split the fasta by chromosome. When I run the command, this error occurs:

ERROR: ***** Internal Program Error - assertion (bin < BAM_MAX_BIN) failed in void isaac::bam::BamIndexPart::addToBinIndexChunks(isaac::bam::UnresolvedOffset, isaac::bam::UnresolvedOffset, uint32_t, uint32_t):/home/software/Isaac4/src/c++/lib/bam/BamIndexer.cpp(74): Invalid bin number in uncompressed BAM

So, how should I work with the software?

rpetrovski commented 6 years ago

Like I said, I need to be able to reproduce your failures in order to debug them. This requires at least having access to your reference.

rpetrovski commented 6 years ago

Sorry, did not see your other question. AFAIK Illumina does not have any sort of inbox for data files. Would you be able to put it somewhere on the internet? https://www.box.com could be one such place.

Roman.

lw3259111 commented 6 years ago

@rpetrovski Thanks for your suggestion. I have now found another problem, this time when creating the index for the BAM file:

ERROR: ***** Internal Program Error - assertion (bin < BAM_MAX_BIN) failed in void isaac::bam::BamIndexPart::addToBinIndexChunks(isaac::bam::UnresolvedOffset, isaac::bam::UnresolvedOffset, uint32_t, uint32_t):/home/software/Isaac4/src/c++/lib/bam/BamIndexer.cpp(74)

After aligning my reads to the genome with bwa, I used samtools v1.6 to index the sorted BAM, and a similar problem occurred. With samtools v1.3, the index was created successfully. So, is the failure due to the samtools version or to the embedded code?

rpetrovski commented 6 years ago

@lw3259111, I've put in a branch https://github.com/Illumina/Isaac4/tree/SAAC01326_branch with some changes that allow running genomes longer than 4 gigabases. The individual contigs still need to be under 512 megabases. This is a prototype. Please let me know if it works for you.

Roman.
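
The 512-megabase ceiling lines up with the binning scheme that the SAM/BAM specification defines for BAI indexes, which covers coordinates below 2^29 with bins numbered 0 through 37448. The sketch below is a Python port of the specification's reg2bin function. Whether Isaac's BamIndexer.cpp computes bins in exactly this way, and how its BAM_MAX_BIN constant relates to the highest bin here, are assumptions.

```python
def reg2bin(beg, end):
    """Bin number for a zero-based, half-open interval [beg, end).

    Python port of reg2bin from the SAM/BAM specification (BAI scheme).
    """
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kb bins, 4681 and up
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kb bins
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # 1 Mb bins
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # 8 Mb bins
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # 64 Mb bins
    return 0                                       # whole 512 Mb range


LIMIT = 2**29  # 536,870,912 bases

print(reg2bin(LIMIT - 1, LIMIT))  # 37448, the last bin in the scheme
print(reg2bin(LIMIT, LIMIT + 1))  # 37449, one past the last bin
```

A single base at position 2^29 already maps outside the defined bins, which is consistent with an "Invalid bin number" assertion on contigs longer than that.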

lw3259111 commented 6 years ago

@rpetrovski When I install the software, I hit a problem in the configure step:

/home/software/testIsaac4/Isaac4/Isaac-build/bootstrap_cmake/build/cmake-3.7.0-rc2/Utilities/cmlibarchive/libarchive/archive_cryptor_private.h:107:17: error: field ‘ctx’ has incomplete type
EVP_CIPHER_CTX ctx;
^
make[2]: *** [Utilities/cmlibarchive/libarchive/CMakeFiles/cmlibarchive.dir/archive_cryptor.c.o] Error 1
make[1]: *** [Utilities/cmlibarchive/libarchive/CMakeFiles/cmlibarchive.dir/all] Error 2
make: *** [all] Error 2
cmake: build failed: Terminating...
Failed to verify or install cmake

This is the same problem shown at the top of this issue.

rpetrovski commented 6 years ago

Looks like a known issue with cmake 3.7.0 (https://gitlab.kitware.com/cmake/cmake/issues/16459). I've updated cmake to 3.10.2 on https://github.com/Illumina/Isaac4/tree/SAAC01326_branch. Please let me know if this works for you.

lw3259111 commented 6 years ago

@rpetrovski Thanks, I can now install the newer software successfully, but when I run isaac-align, this problem occurs:

2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before bestCombinationPairInfo_.reserve 9627062272vm 1213571res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before bestRescuedPair_.reserve 9627062272vm 1213571res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before candidates_.reserve 9627062272vm 1213571res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder after candidates_.reserve 9627062272vm 1213571res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:2/4 9669804032vm 1214083res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:2/4 9673003008vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:3/4 9673003008vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:3/4 9675137024vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:4/4 9675137024vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:4/4 9676734464vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before shadowList_.reserve 9684561920vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before bestCombinationPairInfo_.reserve 9685557248vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before bestRescuedPair_.reserve 9685557248vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before candidates_.reserve 9685557248vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder after candidates_.reserve 9685557248vm 1214148res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:2/4 9728299008vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:2/4 9731497984vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:3/4 9731497984vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:3/4 9733632000vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:4/4 9733632000vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:4/4 9735233536vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before shadowList_.reserve 9743056896vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before bestCombinationPairInfo_.reserve 9744052224vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before bestRescuedPair_.reserve 9744052224vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before candidates_.reserve 9744052224vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder after candidates_.reserve 9744052224vm 1214726res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:2/4 9786793984vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:2/4 9789997056vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:3/4 9789997056vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:3/4 9792126976vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:4/4 9792126976vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:4/4 9793728512vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before shadowList_.reserve 9801551872vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before bestCombinationPairInfo_.reserve 9802551296vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before bestRescuedPair_.reserve 9802551296vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder before candidates_.reserve 9802551296vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: TemplateBuilder after candidates_.reserve 9802551296vm 1214791res
2018-02-23 10:46:36 [7fabda4237c0] STAT: Constructed match selector 9802551296vm 1214791res
std::bad_alloc

rpetrovski commented 6 years ago

I've tried with the fasta from ftp://ftp.ensemblgenomes.org/pub/plants/release-38/fasta/hordeum_vulgare/dna/Hordeum_vulgare.Hv_IBSC_PGSB_v2.dna.toplevel.fa.gz. On a 40-thread box I had to use --memory-limit 80 before I stopped getting bad_alloc. This is normal, as the 1326-branch version needs more RAM to hold the hash table.

Having said that, the genome I've used will not be able to finish bam generation, since most contigs are longer than the 536,870,912 bases allowed by the bam specification. You will need to make sure your contigs are shorter. One possibility is to break them apart into smaller contigs, preferably at locations that have stretches of Ns.

$ cat IsaacIndex.20180223/sorted-reference.xml | grep Total
<TotalBases>558535432</TotalBases>
<TotalBases>768075024</TotalBases>
<TotalBases>699711114</TotalBases>
<TotalBases>647060158</TotalBases>
<TotalBases>670030160</TotalBases>
<TotalBases>583380513</TotalBases>
<TotalBases>657224000</TotalBases>
<TotalBases>249774706</TotalBases>
<TotalBases>115974\
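
The suggestion to break contigs at stretches of Ns can be sketched in Python as follows. This is an illustration under stated assumptions, not a tool from the Isaac distribution: the function name, the min_run threshold, and the fallback of cutting at exactly max_len when no suitable N run exists are all hypothetical choices.

```python
import re

BAM_MAX_CONTIG = 2**29  # 536,870,912 bases, the limit quoted above


def split_at_n_runs(seq, max_len=BAM_MAX_CONTIG, min_run=10):
    """Return half-open (start, end) pieces of seq, each at most max_len long.

    A cut is placed in the middle of the right-most run of at least min_run
    'N' bases that still keeps the piece within max_len. If no such run
    exists, the piece is cut at exactly max_len (a hard cut).
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    # Midpoints of qualifying N runs, in ascending order.
    mids = [(m.start() + m.end()) // 2 for m in pattern.finditer(seq)]

    pieces = []
    start = 0
    while len(seq) - start > max_len:
        limit = start + max_len
        candidates = [c for c in mids if start < c <= limit]
        cut = candidates[-1] if candidates else limit
        pieces.append((start, cut))
        start = cut
    pieces.append((start, len(seq)))
    return pieces


# Toy example with a tiny limit so the behaviour is easy to follow.
toy = "A" * 8 + "N" * 4 + "C" * 8 + "N" * 4 + "G" * 8
print(split_at_n_runs(toy, max_len=15, min_run=3))
# [(0, 10), (10, 22), (22, 32)]
```

Returning coordinates rather than substrings keeps the original offsets, so alignments against the pieces can be mapped back to the full contig later.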

lw3259111 commented 6 years ago

@rpetrovski Thanks, the software now runs successfully, but when I align my reads I hit another problem. The error occurs in both the old and the newer version:

ERROR: ***** Internal Program Error - assertion (header.ID1 == 31U) failed in void isaac::bgzf::validateHeader(const isaac::bgzf::Header&):/home/software/Isaac42/Isaac4-SAAC01326_branch/src/c++/lib/bgzf/BgzfReader.cpp(34): got 24

rpetrovski commented 6 years ago

Looks like your fastq file is corrupt. Are you able to unpack it fully with gunzip?
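
For anyone without gunzip at hand, roughly the same check can be sketched in Python. A gzip member starts with the bytes 31 and 139 (0x1f 0x8b), and the first of these is the value the failed assertion expected in header.ID1. The function names below are hypothetical, and the sketch only tests gzip validity; it does not reproduce Isaac's own BGZF validation.

```python
import gzip
import zlib


def has_gzip_magic(path):
    """True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


def can_fully_decompress(path, chunk_size=1 << 20):
    """True if the whole file decompresses without an error.

    Streams the data in chunks so large fastq.gz files are never held
    in memory at once.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Wrong magic bytes, truncated file, or corrupted compressed data.
        return False
    return True
```

Calling has_gzip_magic and then can_fully_decompress on each input fastq.gz separates a file that is not gzip at all from one that is truncated or damaged further in.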

lw3259111 commented 6 years ago

@rpetrovski Thanks for your help. Now I can assemble my reads with a big genome.