Illumina / isaac2

Aligner for sequencing data

contigs.xml does not support seed length 32 #14

Closed lindenb closed 8 years ago

lindenb commented 8 years ago

Hi all, I've indexed GRCh38

isaac-sort-reference -g GRCh38.fa -o path/GRCh38 -j 15

Then, when I try to align a pair of FASTQ files, I get the following error:

isaac-align  -r path/GRCh38/Temp/contigs.xml -b my/path/FASTQ -f fastq-gz  -o OUT -m 15 -j 15 --seed-length 32

2016-05-24 12:05:50     [2b2ba31566a0]  Discovered data read: ReadMetadata(1, 100 [1, 100], 0id, 0off,1frc)
2016-05-24 12:05:50     [2b2ba31566a0]  Discovered data read: ReadMetadata(2, 100 [102, 201], 1id, 100off,102frc)
2016-05-24 12:05:50     [2b2ba31566a0]  constructed extremity seed SeedMetadata(0, 32, 0, 0)
2016-05-24 12:05:50     [2b2ba31566a0]  constructed extremity seed SeedMetadata(68, 32, 0, 1)
2016-05-24 12:05:50     [2b2ba31566a0]  constructed SeedMetadata(32, 32, 0, 2)
2016-05-24 12:05:50     [2b2ba31566a0]  constructed overlapping SeedMetadata(16, 32, 0, 3)
2016-05-24 12:05:50     [2b2ba31566a0]  constructed extremity seed SeedMetadata(0, 32, 1, 4)
2016-05-24 12:05:50     [2b2ba31566a0]  constructed extremity seed SeedMetadata(68, 32, 1, 5)
2016-05-24 12:05:50     [2b2ba31566a0]  constructed SeedMetadata(32, 32, 1, 6)
2016-05-24 12:05:50     [2b2ba31566a0]  constructed overlapping SeedMetadata(16, 32, 1, 7)
2016-05-24 12:05:50     [2b2ba31566a0]  Generated 'none' barcode: BarcodeMetadata(7,1,default,none,(0), 4294967295)
2016-05-24 12:05:50     [2b2ba31566a0]  align: Setting memory limit to 16106127360 bytes.
Error: 2016-May-24 12:05:51: Invalid argument: /ccc/work/cont007/fg0019/lindenbp/packages/isaac/isaac2-iSAAC-02.16.03.09/src/c++/lib/workflow/AlignWorkflow.cpp(232): Throw in function static isaac::reference::SortedReferenceMetadataList isaac::workflow::AlignWorkflow::loadSortedReferenceXml(unsigned int, const ReferenceMetadataList&)
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::PreConditionException>
std::exception::what: Sorted reference /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/contigs.xml does not support seed length 32
: Sorted reference /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/contigs.xml does not support seed length 32
Makefile:6: recipe for target '/ccc/scratch/cont007/fg0019/lindenbp/20160524.ISAAC2/B00GG8X/BAM/B00GG8X.bam' failed
make: *** [/ccc/scratch/cont007/fg0019/lindenbp/20160524.ISAAC2/B00GG8X/BAM/B00GG8X.bam] Error 1

I tried removing '--seed-length 32' and changing the seed length to 16, 32, or 64, but I always get the same kind of error.

$ head /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/contigs.xml
<?xml version="1.0"?>
<SortedReference>
  <FormatVersion>6</FormatVersion>
  <SoftwareVersion>iSAAC-02.16.03.09</SoftwareVersion>
  <Contigs>
    <Contig Position="0">
      <Index>0</Index>
      <KaryotypeIndex>0</KaryotypeIndex>
      <Name>chr1</Name>
      <Sequence>
(...)

In the simple example I found, there is no 'seed' option. What am I doing wrong? Thanks.

rpetrovski commented 8 years ago

You've pointed isaac-align at the wrong XML file. Please use:

isaac-align -r path/GRCh38/sorted-reference.xml -b my/path/FASTQ -f fastq-gz -o OUT -m 15 -j 15 --seed-length 32
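
A small guard can catch this mistake before alignment starts. The following is only a sketch: the function name `reference_xml` is hypothetical, and it assumes nothing beyond the `sorted-reference.xml` file name given in the reply above.

```shell
#!/bin/sh
# Sketch: print the path of the top-level sorted-reference.xml, or fail
# if it is missing or empty (e.g. because reference sorting did not finish).
# reference_xml is a hypothetical helper, not part of iSAAC.
reference_xml() {
    ref_dir=$1
    xml="$ref_dir/sorted-reference.xml"
    if [ -s "$xml" ]; then
        printf '%s\n' "$xml"
    else
        echo "missing or empty: $xml (did isaac-sort-reference finish?)" >&2
        return 1
    fi
}

# Usage, with the same options as in the thread:
# REF_XML=$(reference_xml path/GRCh38) && \
#   isaac-align -r "$REF_XML" -b my/path/FASTQ -f fastq-gz -o OUT -m 15 -j 15 --seed-length 32
```

If the function fails, the alignment command is never run, which points the user back at the reference-sorting step instead of a confusing seed-length error.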

lindenb commented 8 years ago

Thank you for the quick answer.

There was no contigs.xml under path/GRCh38....

Ah, I see! There is a make error in my isaac-sort-reference logs (recipe failed):

(...)
/ccc/work/cont007/fg0019/lindenbp/packages/isaac/bin/../share/iSAAC-02.16.03.09/makefiles/reference/../../../../share/iSAAC-02.16.03.09/makefiles/common/../../../../libexec/iSAAC-02.16.03.09/sortReference -r Temp/contigs.xml --mask-width 0 --mask 0 \
    --seed-length 68 \
    --output-file /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-68-0.dat \
    --repeat-threshold 0 >/ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-68-0.xml.tmp && mv /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-68-0.xml.tmp /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-68-0.xml
/ccc/work/cont007/fg0019/lindenbp/packages/isaac/bin/../share/iSAAC-02.16.03.09/makefiles/reference/../../../../share/iSAAC-02.16.03.09/makefiles/common/../../../../libexec/iSAAC-02.16.03.09/sortReference -r Temp/contigs.xml --mask-width 0 --mask 0 \
    --seed-length 72 \
    --output-file /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-72-0.dat \
    --repeat-threshold 0 >/ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-72-0.xml.tmp && mv /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-72-0.xml.tmp /ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-72-0.xml
/ccc/work/cont007/fg0019/lindenbp/packages/isaac/bin/../share/iSAAC-02.16.03.09/makefiles/reference/../../../../share/iSAAC-02.16.03.09/makefiles/reference/FindNeighbors.mk:54: recipe for target '/ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-48-0.xml' failed
/ccc/work/cont007/fg0019/lindenbp/packages/isaac/bin/../share/iSAAC-02.16.03.09/makefiles/reference/../../../../share/iSAAC-02.16.03.09/makefiles/reference/FindNeighbors.mk:54: recipe for target '/ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-40-0.xml' failed
/ccc/work/cont007/fg0019/lindenbp/packages/isaac/bin/../share/iSAAC-02.16.03.09/makefiles/reference/../../../../share/iSAAC-02.16.03.09/makefiles/reference/FindNeighbors.mk:54: recipe for target '/ccc/work/cont007/fg0019/lindenbp/packages/GRCh38/Temp/neighbor-positions-36-0.xml' failed
(...)
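
Long make logs like this one are easier to triage when the failed targets are pulled out on their own. A minimal sketch, assuming the log was saved to a file and that make reports failures in the "recipe for target '...' failed" form seen above; `failed_targets` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the unique targets that make reported as failed in a log file.
# Relies only on the "recipe for target '...' failed" message format shown
# in the log excerpt above.
failed_targets() {
    sed -n "s/.*recipe for target '\(.*\)' failed.*/\1/p" "$1" | sort -u
}

# Usage (sort-reference.log is a hypothetical file name):
# failed_targets sort-reference.log
```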

rpetrovski commented 8 years ago

Looks like you did not get very far with the reference sorting. The likely reasons for reference sorting to fail are: a) running out of memory for long seeds; b) running out of disk space.

What sort of hardware are you trying to run this on?
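
Both suspected causes (memory and disk space) can be checked quickly before re-running the sort. This is a rough sketch for Linux only: it reads `/proc/meminfo` and uses `df`, the helper names are hypothetical, and it reports what the node has, which may be more than what a batch scheduler actually grants the job.

```shell
#!/bin/sh
# Sketch: report free disk space and total RAM in whole GiB.
# avail_disk_gb and total_mem_gb are hypothetical helper names.

# Free space on the filesystem holding the given directory.
avail_disk_gb() {
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1048576 }'
}

# Total physical memory of the node (Linux-specific).
total_mem_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' /proc/meminfo
}

# Usage:
# echo "disk free: $(avail_disk_gb path/GRCh38) GiB, RAM: $(total_mem_gb) GiB"
```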

lindenb commented 8 years ago

I'm using the French Curie supercomputer ( http://www-hpc.cea.fr/en/complexe/tgcc-curie.htm ) under Slurm. I don't know much about its architecture, but I suppose it should be OK.

I'm currently trying to re-index my reference on another, larger partition (faster and with more disk space). I'll get back to you.

rpetrovski commented 8 years ago

Please make sure the job is allowed at least 150G of RAM with the defaults. There are ways to reduce this, but if you have the memory, the defaults are the fastest way.
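
Under Slurm, the memory allowance is normally requested in the job script. The sketch below generates such a script with standard `sbatch` directives and the `isaac-sort-reference` command from the top of this thread. The helper name `write_sort_job`, the job name, and the output file name are hypothetical, and a given site (Curie included) may require its own submission wrapper or extra partition options.

```shell
#!/bin/sh
# Sketch: emit a Slurm batch script that requests 150G of RAM for reference
# sorting. write_sort_job is a hypothetical helper; $1 is the FASTA file and
# $2 is the output directory.
write_sort_job() {
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=isaac-sort-ref
#SBATCH --mem=150G
#SBATCH --cpus-per-task=15
isaac-sort-reference -g $1 -o $2 -j 15
EOF
}

# Usage (sort.sbatch is a hypothetical file name):
# write_sort_job GRCh38.fa path/GRCh38 > sort.sbatch && sbatch sort.sbatch
```

Keeping `--cpus-per-task` equal to the `-j` value avoids asking the tool for more threads than the scheduler allocates.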

On a separate topic: iSAAC-02 is EOL. Would you consider using iSAAC-03 instead?

lindenb commented 8 years ago

Thanks, if sort-reference/isaac2 fails, I'll switch to isaac3.

rpetrovski commented 8 years ago

Isaac2 is EOL. Please use Isaac3.