Illumina / Isaac3

Aligner for sequencing data

[help] the correct way to use paired fastq.gz as input. #8

Closed LuyiTian closed 7 years ago

LuyiTian commented 7 years ago

Hi,

I started an alignment using Isaac3, but it keeps giving me this error:

Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::InvalidOptionException> std::exception::what: Could not find any fastq lanes in: "/datadir"

I use Docker to run Isaac3, and the command looks like this:

    docker run --rm \
        -v ..../test_illumina_pipeline/WGC087349D:/out_dir \
        -v .../ftp.broadinstitute.org/b37/issac3_index:/ref \
        -v .../WGC087349D:/datadir \
        -w /out_dir \
        isaac3:03.16.12.05 isaac-align \
        -b /datadir \
        -m 40 \
        --base-calls-format fastq-gz \
        -r /ref \
        --enable-numa \
        -j 4 \
        -o /out_dir

More specifically, this script contains the command I used: https://github.com/LuyiTian/NGS_docker/blob/master/pipeline/pipe_issac3.py

My fastq.gz files (XXX_combined_R1.fastq.gz, XXX_combined_R2.fastq.gz) do not follow the specified naming format, so I symlinked them to lane1_read1.fastq.gz and lane1_read2.fastq.gz in the same folder, but I still got the error. I am not sure this is the right approach, because my fastq files contain all lanes.
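For reference, the renaming described above can be sketched in Python. This is only an illustration: the `laneN_readR.fastq.gz` pattern is inferred from the two target names mentioned in this post, the `_R1`/`_R2` suffix convention is taken from the example file names, and the helper names `isaac_fastq_name` and `plan_renames` are made up for this sketch. Placing everything in lane 1 is an assumption for files that combine all lanes.

```python
import re

# Assumption: the mate number is encoded as "_R1" / "_R2" right before ".fastq.gz",
# as in XXX_combined_R1.fastq.gz and XXX_combined_R2.fastq.gz.
MATE_PATTERN = re.compile(r"_R([12])\.fastq\.gz$")


def isaac_fastq_name(lane, read):
    """Build a name of the form lane<N>_read<R>.fastq.gz (pattern inferred from this thread)."""
    return f"lane{lane}_read{read}.fastq.gz"


def plan_renames(file_names, lane=1):
    """Map each input file name to its lane/read name without touching the disk."""
    plan = {}
    for name in file_names:
        match = MATE_PATTERN.search(name)
        if match is None:
            raise ValueError(f"cannot tell R1 from R2 in: {name}")
        target = isaac_fastq_name(lane, int(match.group(1)))
        if target in plan.values():
            raise ValueError(f"two inputs would both become {target}")
        plan[name] = target
    return plan


if __name__ == "__main__":
    inputs = ["XXX_combined_R1.fastq.gz", "XXX_combined_R2.fastq.gz"]
    for source, target in plan_renames(inputs).items():
        print(f"{source} -> {target}")
```

The function only returns a plan, so it can be reviewed before any file is actually renamed or linked.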

Another question is about the Isaac3 index. It takes about 1T of space to store the index, and most of it is in the /Temp folder. Can I safely delete that folder after the index is built? I also think it is worth mentioning in the Readme that building a human reference requires at least 1T of space. It took more than 2 days to build on our 32-core server (with -j=1).

Kind Regards, Luyi

rpetrovski commented 7 years ago

The command line seems OK for the fastq part. Can you please check that the symlinks are indeed where you think they are?
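One possible way to run that check from the host is sketched below. It assumes the data directory is the only thing bind-mounted into the container, so a symlink whose target is an absolute host path, or lies outside that directory, may not resolve inside the container. The helper name `find_risky_symlinks` is hypothetical and not part of Isaac3 or Docker.

```python
import os


def find_risky_symlinks(directory):
    """List symlinks in `directory` that may break when only `directory` is mounted.

    Returns (link_name, target, reason) tuples.
    """
    root = os.path.realpath(directory)
    risky = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.islink(path):
            continue
        target = os.readlink(path)          # the raw link text, as stored on disk
        resolved = os.path.realpath(path)   # where the link ends up on the host
        if os.path.isabs(target):
            # An absolute host path usually does not exist inside the container.
            risky.append((name, target, "absolute target"))
        elif os.path.commonpath([root, resolved]) != root:
            risky.append((name, target, "target outside directory"))
        elif not os.path.exists(resolved):
            risky.append((name, target, "dangling"))
    return risky


if __name__ == "__main__":
    import sys

    for name, target, reason in find_risky_symlinks(sys.argv[1]):
        print(f"{name} -> {target} ({reason})")
```

Relative links that stay inside the mounted directory are not flagged, since they resolve the same way on the host and in the container.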

The reference cannot possibly work the way you specify it. -r is supposed to point at the sorted-reference.xml file.

You can delete Temp safely if you have successfully produced sorted-reference.xml.

Roman.


LuyiTian commented 7 years ago

Hi,

I solved that issue. I think it was caused by the symlinks, because Docker creates a virtual path. I used mv to rename the files directly, and this time it works with the command:

    docker run --rm \
        -v .../test_illumina_pipeline/WGC087349D:/out_dir \
        -v /data/database/pub/ftp.broadinstitute.org/b37:/ref \
        -v .../WGC087349D:/datadir \
        -w /out_dir \
        isaac3:03.16.12.05 isaac-align \
        -b /datadir \
        -m40 \
        --base-calls-format fastq-gz \
        --lane-number-max 1 \
        -r /ref/issac3_index/sorted-reference.xml \
        -j 4 \
        -o /out_dir

But another problem occurred. The program runs for several minutes and then reports an error:

With --enable-numa:

    2017-01-16 03:13:34 [7f3d16c25780] STAT: MatchSelectorStats::MatchSelectorStats 19244539904vm 4546052res
    2017-01-16 03:13:34 [7f3d16c25780] Allocating 4 tile stats.
    2017-01-16 03:13:34 [7f3d16c25780] Allocating 4 tile stats done. Total size is 458880 bytes.
    2017-01-16 03:13:34 [7f3d16c25780] Allocating 4 tile barcode stats.
    2017-01-16 03:13:34 [7f3d16c25780] Allocating 4 tile barcode stats done. Total size is 1184 bytes.
    2017-01-16 03:13:34 [7f3d16c25780] STAT: MatchSelectorStats::MatchSelectorStats constructed19244539904vm 4546052res
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 40000 bytes on node -1 for type j
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 40000 bytes on node -1 for type j
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 40000 bytes on node -1 for type j
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 40000 bytes on node -1 for type j
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 480 bytes on node -1 for type N5isaac9alignment16FragmentMetadataE
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 480 bytes on node -1 for type N5isaac9alignment16FragmentMetadataE
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 480 bytes on node -1 for type N5isaac9alignment16FragmentMetadataE
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 480 bytes on node -1 for type N5isaac9alignment16FragmentMetadataE
    2017-01-16 03:13:34 [7f3d16c25780] Constructed the match selector
    2017-01-16 03:13:34 [7f3d16c25780] maskWidth_:0
    2017-01-16 03:13:34 [7f3d16c25780] oligo::KmerTraits::KMERBITS - maskWidth:32
    2017-01-16 03:13:34 [7f3d16c25780] mask:0
    2017-01-16 03:13:34 [7f3d16c25780] msbMask:AAAAAAAAAAAAAAAA
    2017-01-16 03:13:34 [7f3d16c25780] numaAllocate allocated 17179869184 bytes on node 0 for type j
    2017-01-16 03:13:42 [7f3d16c25780] Constructing ReferenceHasher: for 16-mers
    2017-01-16 03:16:12 [7f3d16c25780] found 2864779471 positions on 4 threads
    mbind: Input/output error

Without --enable-numa:

    2017-01-16 03:21:00 [7fdaa075f780] Aligner: adding base-calls path "/datadir"
    2017-01-16 03:21:00 [7fdaa075f780] STAT: MatchSelectorStats::MatchSelectorStats 9930416128vm 2273977res
    2017-01-16 03:21:00 [7fdaa075f780] Allocating 4 tile stats.
    2017-01-16 03:21:00 [7fdaa075f780] Allocating 4 tile stats done. Total size is 458880 bytes.
    2017-01-16 03:21:00 [7fdaa075f780] Allocating 4 tile barcode stats.
    2017-01-16 03:21:00 [7fdaa075f780] Allocating 4 tile barcode stats done. Total size is 1184 bytes.
    2017-01-16 03:21:00 [7fdaa075f780] STAT: MatchSelectorStats::MatchSelectorStats constructed9930874880vm 2274109res
    2017-01-16 03:21:00 [7fdaa075f780] Constructed the match selector
    2017-01-16 03:21:00 [7fdaa075f780] maskWidth_:0
    2017-01-16 03:21:00 [7fdaa075f780] oligo::KmerTraits::KMERBITS - maskWidth:32
    2017-01-16 03:21:00 [7fdaa075f780] mask:0
    2017-01-16 03:21:00 [7fdaa075f780] msbMask:AAAAAAAAAAAAAAAA
    2017-01-16 03:21:08 [7fdaa075f780] Constructing ReferenceHasher: for 16-mers
    2017-01-16 03:23:41 [7fdaa075f780] found 2864779471 positions on 4 threads
    std::bad_alloc

It fails at the same point in both cases. But my server has 250G of RAM and 1.8T of disk space, and Docker runs as root, so I am confused about why it fails to allocate the memory. Do you have any suggestions?

LuyiTian commented 7 years ago

The error is solved by changing -j 4 to -j 1. So does -m40 control the RAM used per thread, so that adding threads increases it? Even with -j 1 -m40, the program still uses ~55G of RAM, which is pretty large compared with other aligners.

rpetrovski commented 7 years ago

iSAAC-03 needs quite a bit of RAM for the linear reference, hash table and genome k-uniqueness annotation. Then there is per-thread RAM as well. You should be safe with -m70 on 32 threads.

Given your hardware, I don't see a need to limit the threading in iSAAC. If you don't specify -j, it will use whatever the system has.

--enable-numa will create a hash table and reference replica on each NUMA node to improve the locality of memory access. This means you will need a bigger -m.

iSAAC considers 1M == 1024K, so 1024*1024*1024*40 gives 42949672960. -m is essentially a ulimit -v call from inside iSAAC, so iSAAC cannot possibly go over that. Where do you see the 55G number?

Roman.
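The arithmetic in the reply above can be reproduced with a few lines of Python. This is only a rough sketch: the 17179869184-byte figure is the single large allocation copied from the --enable-numa log earlier in the thread, the NUMA node counts are hypothetical (the thread does not say how many nodes the server has), and the estimate ignores the reference replica, the k-uniqueness annotation and per-thread RAM, so real usage will be higher.

```python
GIB = 1024 ** 3  # 1M == 1024K in iSAAC terms, so one -m unit is taken as 1024**3 bytes


def m_option_to_bytes(m_value):
    """Convert an isaac-align -m value to the byte limit it implies."""
    return m_value * GIB


def replicated_allocation_fits(allocation_bytes, numa_nodes, m_value):
    """True if one copy of the allocation per NUMA node stays under the -m limit."""
    return allocation_bytes * numa_nodes <= m_option_to_bytes(m_value)


LARGE_ALLOCATION = 17179869184  # bytes, copied from the numaAllocate line in the log

if __name__ == "__main__":
    print(m_option_to_bytes(40))    # 42949672960
    print(LARGE_ALLOCATION / GIB)   # 16.0
    # Hypothetical node counts, for illustration only.
    for nodes in (1, 2, 4):
        print(nodes, replicated_allocation_fits(LARGE_ALLOCATION, nodes, 40))
```

Under these assumptions, a single copy of that allocation already takes 16 of the 40 units allowed by -m40, which is consistent with the advice that --enable-numa needs a bigger -m.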