benedictpaten / marginAlign

UCSC Nanopore
MIT License

"make test" failed with 6 Errors all for testMarginAlign* #28

Closed: danarte closed this issue 7 years ago

danarte commented 7 years ago

It seems all the tests for alignment failed with a similar error:

======================================================================
ERROR: testMarginAlignBwa (__main__.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests/tests.py", line 129, in testMarginAlignBwa
    self.runMarginAlign(self.readFastqFile1, self.referenceFastaFile1, "--bwa")
  File "./tests/tests.py", line 88, in runMarginAlign
    readAlignmentStats = self.validateSam(self.outputSamFile, readFastqFile, referenceFastaFile)
  File "./tests/tests.py", line 60, in validateSam
    referenceFastaFile, globalAlignment=True)
  File "/genomicslab/datasets/volume2/shomron_home_p2/artemd/tools/marginAlign/src/margin/utils.py", line 379, in getReadAlignmentStats
    refSequences[sam.getrname(aR.rname)], aR, globalAlignment), samIterator(sam))
  File "/genomicslab/datasets/volume2/shomron_home_p2/artemd/tools/marginAlign/src/margin/utils.py", line 379, in <lambda>
    refSequences[sam.getrname(aR.rname)], aR, globalAlignment), samIterator(sam))
  File "/genomicslab/datasets/volume2/shomron_home_p2/artemd/tools/marginAlign/src/margin/utils.py", line 308, in __init__
    for aP in AlignedPair.iterator(alignedRead, self.refSeq, self.readSeq):
  File "/genomicslab/datasets/volume2/shomron_home_p2/artemd/tools/marginAlign/src/margin/utils.py", line 274, in iterator
    if aP.getReadBase().upper() != alignedSegment.query_alignment_sequence[readPos].upper():
IndexError: string index out of range

I have no idea what might cause the error.

Information about the system:

Version of bwa: "0.7.15-r1142-dirty"
Version of lastal: "lastal 801"
List of python packages installed: ['atlas==0.27.0', 'avro==1.8.1', 'biopython==1.64', 'bx-python==0.7.3', 'certifi==2016.2.28', 'cffi==1.7.0', 'click==6.6', 'cmd2==0.6.8', 'cnvkit==0.7.11', 'conversion==0.0.3', 'crossmap==0.2.3', 'cryptography==1.4', 'cycler==0.10.0', 'cython==0.21.1', 'docopt==0.6.2', 'ecdsa==0.13', 'enum34==1.1.6', 'fabric==1.12.0', 'flask==0.10.1', 'functools32==3.2.3.post2', 'future==0.15.2', 'gdc-client==1.0.1', 'h5py==2.6.0', 'htseq==0.6.1', 'idna==2.1', 'intervaltree==2.0.4', 'ipaddress==1.0.16', 'iso8601==0.1.11', 'itsdangerous==0.24', 'jinja2==2.8', 'jsonschema==2.5.1', 'lifelines==0.9.2', 'lxml==3.5.0b1', 'macs2==2.0.10.20131216', 'markupsafe==0.23', 'matplotlib==1.5.1', 'misopy==0.5.3', 'mpi4py==2.0.0', 'multiqc==0.8', 'myriad==0.1.3', 'mysql-python==1.2.5', 'nanonet==2.0.0', 'nanoraw==0.2', 'ndg-httpsclient==0.4.2', 'numpy==1.11.2', 'pandas==0.18.1', 'paramiko==1.17.2', 'parcel==0.1.13', 'pathoscope==2.0.6', 'pbalign==0.3.0', 'pbcommand==0.3.29', 'pbcore==1.2.10', 'pillow==3.2.0', 'pip==9.0.1', 'plotly==1.12.5', 'poreseq==0.1', 'poretools==0.6.0', 'progressbar==2.3', 'pyasn1==0.1.9', 'pycparser==2.14', 'pycrypto==2.6.1', 'pyfaidx==0.4.7.1', 'pyinstaller==3.2', 'pyopenssl==16.0.0', 'pyparsing==2.1.5', 'pysam==0.9.1.4', 'pysqlite==2.6.3', 'python-dateutil==2.5.3', 'pytz==2016.6.1', 'pyvcf==0.6.8', 'pyyaml==3.11', 'reportlab==3.3.0', 'requests==2.5.1', 'rpy2==2.8.3', 'rseqc==2.6.4', 'scipy==0.18.1', 'seaborn==0.7.1', 'setuptools==19.2', 'simplejson==3.10.0', 'singledispatch==3.4.0.3', 'six==1.10.0', 'sortedcontainers==1.5.3', 'tabulate==0.7.5', 'termcolor==1.1.0', 'theano==0.8.2', 'threadpool==1.2.7', 'tqdm==4.8.4', 'urllib3==1.16', 'virtualenv==15.0.2', 'werkzeug==0.11.10', 'xmlbuilder==1.0']

mitenjain commented 7 years ago

Hello,

This seems like a pysam version error. marginAlign uses pysam==0.8.2.1 (other requirements here: https://github.com/benedictpaten/marginAlign/blob/master/requirements.txt). Are you using virtualenv?

Please let us know whether using virtualenv or installing the appropriate requirements helps fix the issue. We are in the process of containerizing all of the requirements, which will be done in a few days.

Sorry for the hassle. Best regards, Miten
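
A quick way to spot this kind of mismatch is to compare pinned requirements against what is installed. The following is a minimal sketch, not part of marginAlign: `parse_pins` and `find_mismatches` are hypothetical helper names, and the inputs are the pysam pin quoted above and two entries from the reported package list.

```python
def parse_pins(lines):
    """Map lowercase package name -> pinned version for 'name==version' lines."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned requirements
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(required, installed):
    """Return {name: (required, installed_or_None)} for every pin not met."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# The pysam pin quoted in this thread vs. the versions from the reported list.
required = parse_pins(["pysam==0.8.2.1"])
installed = parse_pins(["pysam==0.9.1.4", "numpy==1.11.2"])
print(find_mismatches(required, installed))
```

In practice the two inputs could be the lines of requirements.txt and the output of `pip freeze`, run inside the environment that actually executes marginAlign.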

danarte commented 7 years ago

I was able to resolve the issue with pysam and now when I run the tests everything is OK.

But now I have this problem when I try to align some test files:

~/tools/marginAlignWorkingCopy/marginAlign fastq/failpass.poretools.all.fastq ../LambdaRefGenome.fa test.sam --jobTree ./testTree
The job seems to have left a log file, indicating failure: /genomicslab/nobackup/volume2/artemd/minion/third_lambda_run/reads/DiffExtractionAndAligners/testTree/jobs/job
Reporting file: /genomicslab/nobackup/volume2/artemd/minion/third_lambda_run/reads/DiffExtractionAndAligners/testTree/jobs/log.txt
log.txt:        ---JOBTREE SLAVE OUTPUT LOG---
log.txt:        Traceback (most recent call last):
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/submodules/jobTree/src/jobTreeSlave.py", line 271, in main
log.txt:            defaultMemory=defaultMemory, defaultCpu=defaultCpu, depth=depth)
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/submodules/jobTree/scriptTree/stack.py", line 153, in execute
log.txt:            self.target.run()
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/submodules/jobTree/scriptTree/target.py", line 197, in run
log.txt:            func(*((self,) + tuple(self.args)), **self.kwargs)
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlignLib.py", line 275, in realignSamFileTargetFn
log.txt:            chainSamFile(samFile, tempSamFile, readFastqFile, referenceFastaFile, chainFn)
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlignLib.py", line 188, in chainSamFile
log.txt:            refSeq, readSeq), refSeq, readSeq))
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlignLib.py", line 77, in mergeChainedAlignedSegments
log.txt:            assert pPos <= len(refSequence)
log.txt:        AssertionError
log.txt:        Exiting the slave because of a failed job on host compute-0-12.local
log.txt:        Due to failure we are reducing the remaining retry count of job /genomicslab/nobackup/volume2/artemd/minion/third_lambda_run/reads/DiffExtractionAndAligners/testTree/jobs/job to 0
log.txt:        We have set the default memory of the failed job to 2147483648 bytes
Job: /genomicslab/nobackup/volume2/artemd/minion/third_lambda_run/reads/DiffExtractionAndAligners/testTree/jobs/job is completely failed
Traceback (most recent call last):
  File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlign.py", line 91, in <module>
    main()
  File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlign.py", line 87, in main
    raise RuntimeError("Got failed jobs")
RuntimeError: Got failed jobs

mitenjain commented 7 years ago

What is probably happening is an index error caused by the lack of a terminal space/newline character. Could you insert a newline at the end of the reference fasta sequence and try again?
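
The suggested fix can also be scripted. This is a minimal sketch using only the standard library; `ensure_trailing_newline` is a hypothetical helper name, and the demo writes a throwaway file rather than touching a real reference.

```python
import os
import tempfile


def ensure_trailing_newline(path):
    """Append a newline if the file's last byte is not one. Return True if changed."""
    with open(path, "rb+") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # empty file: nothing to fix
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) == b"\n":
            return False  # already terminated
        handle.seek(0, os.SEEK_END)
        handle.write(b"\n")
        return True


# Demo on a temporary FASTA file that lacks the final newline.
with tempfile.NamedTemporaryFile(suffix=".fa", delete=False) as tmp:
    tmp.write(b">ref\nACGT")
demo_path = tmp.name
print(ensure_trailing_newline(demo_path))  # first call appends the newline
print(ensure_trailing_newline(demo_path))  # second call finds nothing to do
os.remove(demo_path)
```

The function modifies the file in place, so it is safest to run it on a copy of the reference.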

danarte commented 7 years ago

Strange. The fasta really didn't have a terminating newline character, but when I added one and ran marginAlign with default settings, it still failed with the same error. When I added --bwa, the software ran without errors. Then I ran with --bwa on the original reference (without a newline) and got this error:

~/tools/marginAlignWorkingCopy/marginAlign fastq/failpass.poretools.all.fastq ../LambdaRefGenome.fa test.sam --bwa --jobTree ./testTree
The job seems to have left a log file, indicating failure: /genomicslab/nobackup/volume2/artemd/minion/third_lambda_run/reads/DiffExtractionAndAligners/testTree/jobs/job
Reporting file: /genomicslab/nobackup/volume2/artemd/minion/third_lambda_run/reads/DiffExtractionAndAligners/testTree/jobs/log.txt
log.txt:        ---JOBTREE SLAVE OUTPUT LOG---
log.txt:        [bwa_index] Pack FASTA... 0.00 sec
log.txt:        [bwa_index] Construct BWT for the packed sequence...
log.txt:        [bwa_index] 0.01 seconds elapse.
log.txt:        [bwa_index] Update BWT... 0.00 sec
log.txt:        [bwa_index] Pack forward-only FASTA... 0.00 sec
log.txt:        [bwa_index] Construct SA from BWT and Occ... 0.01 sec
log.txt:        [main] Version: 0.7.12-r1044
log.txt:        [main] CMD: bwa index /tmp/tmpUoNP_F/localTempDir/ref.fa
log.txt:        [main] Real time: 0.294 sec; CPU: 0.042 sec
log.txt:        [M::bwa_idx_load_from_disk] read 0 ALT contigs
log.txt:        [M::process] read 1656 sequences (2994902 bp)...
log.txt:        [M::mem_process_seqs] Processed 1656 reads in 15.952 CPU sec, 15.951 real sec
log.txt:        [main] Version: 0.7.12-r1044
log.txt:        [main] CMD: bwa mem -x ont2d /tmp/tmpUoNP_F/localTempDir/ref.fa fastq/failpass.poretools.all.fastq
log.txt:        [main] Real time: 16.077 sec; CPU: 15.990 sec
log.txt:        Traceback (most recent call last):
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/submodules/jobTree/src/jobTreeSlave.py", line 271, in main
log.txt:            defaultMemory=defaultMemory, defaultCpu=defaultCpu, depth=depth)
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/submodules/jobTree/scriptTree/stack.py", line 153, in execute
log.txt:            self.target.run()
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/submodules/jobTree/scriptTree/target.py", line 197, in run
log.txt:            func(*((self,) + tuple(self.args)), **self.kwargs)
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlignLib.py", line 275, in realignSamFileTargetFn
log.txt:            chainSamFile(samFile, tempSamFile, readFastqFile, referenceFastaFile, chainFn)
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlignLib.py", line 188, in chainSamFile
log.txt:            refSeq, readSeq), refSeq, readSeq))
log.txt:          File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlignLib.py", line 77, in mergeChainedAlignedSegments
log.txt:            assert pPos <= len(refSequence)
log.txt:        AssertionError
log.txt:        Exiting the slave because of a failed job on host compute-0-13.local
log.txt:        Due to failure we are reducing the remaining retry count of job /genomicslab/nobackup/volume2/artemd/minion/third_lambda_run/reads/DiffExtractionAndAligners/testTree/jobs/job to 0
log.txt:        We have set the default memory of the failed job to 2147483648 bytes
Job: /genomicslab/nobackup/volume2/artemd/minion/third_lambda_run/reads/DiffExtractionAndAligners/testTree/jobs/job is completely failed
Traceback (most recent call last):
  File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlign.py", line 91, in <module>
    main()
  File "/groups/nshomron/artemd/tools/marginAlignWorkingCopy/src/margin/marginAlign.py", line 87, in main
    raise RuntimeError("Got failed jobs")
RuntimeError: Got failed jobs

I was able to map with bwa and last separately using the original fasta reference (without the terminating newline), so I don't understand what is causing the problem...
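
One way to narrow this down is to check whether any alignment in the aligner's SAM output extends past the end of its reference sequence. The sketch below assumes, without having seen the code, that the failing assertion `pPos <= len(refSequence)` guards reference coordinates. It is not marginAlign's implementation: `reference_end` and `out_of_bounds` are hypothetical helpers that parse plain SAM text, and the reference name and lengths in the demo are made up.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_end(pos, cigar):
    """Return the 1-based last reference position covered by an alignment."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def out_of_bounds(sam_lines, ref_lengths):
    """Yield (read, ref, end, ref_length) for alignments running past the reference."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped read
        end = reference_end(pos, cigar)
        if end > ref_lengths[rname]:
            yield name, rname, end, ref_lengths[rname]


# Made-up example: a 10 bp reference and two alignments, one of which overruns it.
ref_lengths = {"lambda": 10}
sam_lines = [
    "@SQ\tSN:lambda\tLN:10",
    "r1\t0\tlambda\t3\t255\t4M1I4M\t*\t0\t0\tACGTACGTA\t*",
    "r2\t0\tlambda\t5\t255\t8M\t*\t0\t0\tACGTACGT\t*",
]
print(list(out_of_bounds(sam_lines, ref_lengths)))
```

If real output contains such records, the reference used for mapping and the one passed to marginAlign may differ in length or naming.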

mitenjain commented 7 years ago

This is strange. I suspect there is still some odd behavior involving the fasta and pysam. Are you using virtualenv for marginAlign?

Could you send a few of your reads and reference so I can reproduce this at our end? My email is miten@soe.ucsc.edu

Apologies for the hassle.
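
One environment detail is visible in the thread itself: the job log reports bwa 0.7.12-r1044, while the system information in the first comment lists 0.7.15-r1142-dirty. This may mean the compute node resolves a different installation than the login shell, though the thread does not confirm it. The sketch below pulls the version lines out of a log for comparison; `tool_versions` is a hypothetical helper and the log text is an excerpt of the output above.

```python
import re

VERSION_LINE = re.compile(r"\[main\] Version:\s*(\S+)")


def tool_versions(log_text):
    """Return the sorted unique version strings from '[main] Version:' log lines."""
    return sorted(set(VERSION_LINE.findall(log_text)))


# Excerpt of the jobTree log quoted in this thread.
log_text = """log.txt:        [main] Version: 0.7.12-r1044
log.txt:        [main] CMD: bwa index /tmp/tmpUoNP_F/localTempDir/ref.fa
log.txt:        [main] Version: 0.7.12-r1044
"""
reported = "0.7.15-r1142-dirty"  # version listed in the first comment
seen = tool_versions(log_text)
print(seen, reported in seen)
```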

mitenjain commented 7 years ago

Hi Artem, I recommend installing your own pip and virtualenv on the server, since that gives you a lot of control. Let us know if you still have issues once your IT guys have installed the proper python and dependencies on the server. Best regards, Miten and Benedict