bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

TypeError: sequence item 0: expected str instance, NoneType found #234

Open moldach opened 3 years ago

moldach commented 3 years ago

This is related to #232, which I was getting on my laptop (the "Bio.Alphabet has been removed from Biopython" error). You had suggested trying version 2.2.8, so I'm pulling the latest master from GitHub to try it.

I got this same error on an HPC; however, I got further with the buildout installation here, so I'm opening this ticket (to be clear, this is a different HPC from the one we are troubleshooting in #228 & #229).

Install using buildout method

$ export MAVIS_ALIGNER='bwa mem'
$ export MAVIS_ALIGNER_REFERENCE=~/MAVIS/reference_inputs/hg19.fa
$ git clone https://github.com/bcgsc/mavis.git
$ cd mavis
$ pip install zc.buildout
$ python bootstrap.py
$ bin/buildout

Try MAVIS

$  ~/bin/mavis/bin/mavis setup tests/data/pipeline_config.cfg -o output_dir
Traceback (most recent call last):
  File "/home/moldach/bin/mavis/bin/mavis", line 18, in <module>
    import mavis.main
  File "/home/moldach/bin/mavis/mavis/__init__.py", line 6, in <module>
    __version__ = pkg_resources.require('mavis')[0].version
  File "/home/moldach/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 899, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/moldach/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 790, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (networkx 2.4 (/home/moldach/anaconda3/lib/python3.8/site-packages), Requirement.parse('networkx==1.11.0'), {'mavis'})

Seems to be an issue with networkx==1.11.0 so let's install that.

Install

$ pip install networkx==1.11.0
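
A quick way to check for this kind of pin mismatch before launching MAVIS is to compare the installed version with the pinned one. The following is a minimal sketch, not part of MAVIS: the helper names are hypothetical, and the comparison only handles plain numeric versions (it treats 1.11 and 1.11.0 as equal, as pip does), so it is no substitute for a full requirement resolver.

```python
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.11.0' into (1, 11); trailing zeros are dropped so 1.11 == 1.11.0."""
    parts = [int(piece) for piece in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies_pin(installed: str, pinned: str) -> bool:
    """Return True if the installed version matches an '==' pin (numeric versions only)."""
    return release_tuple(installed) == release_tuple(pinned)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("networkx")
    try:
        if found is None:
            print("networkx is not installed in this environment")
        elif satisfies_pin(found, "1.11.0"):
            print(f"networkx {found} satisfies the ==1.11.0 pin")
        else:
            print(f"networkx {found} conflicts with the ==1.11.0 pin")
    except ValueError:
        # Pre-release or otherwise non-numeric versions are outside this sketch.
        print(f"networkx {found} could not be compared by this simple check")
```

Running this inside the environment that will run MAVIS (for example, the activated venv) shows which networkx that interpreter actually sees.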

Run again

(base) [moldach@fc1 MAVIS]$ ~/bin/mavis/bin/mavis setup tests/data/pipeline_config.cfg -o output_dir
                      MAVIS: 2.2.7
                      hostname: fc1
[2020-11-20 00:26:35] arguments
                        command = 'setup'
                        config = '/home/moldach/MAVIS/tests/data/pipeline_config.cfg'
                        log = None
                        log_level = 'INFO'
                        output = 'output_dir'
                        skip_stage = []
                      creating output directory: 'output_dir/converted_inputs'
                      setting up the directory structure for mock-A36971 as /home/moldach/MAVIS/output_dir/mock-A36971_diseased_genome
                      converting input command: ['convert_tool_output', '/home/moldach/MAVIS/tests/data/mock_sv_events.tsv', '/home/moldach/MAVIS/tests/data/mock_sv_events.tsv', 'mavis', False]
                      reading: /home/moldach/MAVIS/tests/data/mock_sv_events.tsv
                      generated 28 breakpoint pairs
                      reading: /home/moldach/MAVIS/tests/data/mock_sv_events.tsv
                      generated 28 breakpoint pairs
                      collapsed 56 to 28 calls
                      writing: output_dir/converted_inputs/mock_converted.tab
                      creating output directory: '/home/moldach/MAVIS/output_dir/mock-A36971_diseased_genome/cluster'
[2020-11-20 00:26:35] clustering 
[2020-11-20 00:26:35] writing: /home/moldach/MAVIS/output_dir/mock-A36971_diseased_genome/cluster/MC_mock-A36971_batch-apA5XLjNBrtNqLnrXMcyNu.log
                      creating output directory: '/home/moldach/MAVIS/output_dir/mock-A36971_diseased_genome/validate'
                      creating output directory: '/home/moldach/MAVIS/output_dir/mock-A36971_diseased_genome/validate/batch-apA5XLjNBrtNqLnrXMcyNu-1'
                      creating output directory: '/home/moldach/MAVIS/output_dir/mock-A36971_diseased_genome/validate/batch-apA5XLjNBrtNqLnrXMcyNu-2'
[2020-11-20 00:26:35] writing: /home/moldach/MAVIS/output_dir/mock-A36971_diseased_genome/validate/submit.sh
Traceback (most recent call last):
  File "/home/moldach/bin/mavis/bin/mavis", line 21, in <module>
    sys.exit(mavis.main.main())
  File "/home/moldach/bin/mavis/mavis/main.py", line 600, in main
    raise err
  File "/home/moldach/bin/mavis/mavis/main.py", line 582, in main
    pipeline = _pipeline.Pipeline.build(config)
  File "/home/moldach/bin/mavis/mavis/schedule/pipeline.py", line 432, in build
    pipeline.write_submission_script(
  File "/home/moldach/bin/mavis/mavis/schedule/pipeline.py", line 277, in write_submission_script
    fh.write(' \\\n\t'.join(commands) + '\n\n')
TypeError: sequence item 0: expected str instance, NoneType found

creisle commented 3 years ago

Are you using a virtualenv? Version conflicts happen quite often when using the system install. I would change the install steps to the following:

$ export MAVIS_ALIGNER='bwa mem'
$ export MAVIS_ALIGNER_REFERENCE=~/MAVIS/reference_inputs/hg19.fa
$ git clone https://github.com/bcgsc/mavis.git
$ cd mavis
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install zc.buildout
$ python bootstrap.py
$ bin/buildout

Trying this on my system, I did not encounter any issues:

(venv) [creisle04: mavis]$ python3 -m venv venv
(venv) [creisle04: mavis]$ source venv/bin/activate
(venv) [creisle04: mavis]$ pip install zc.buildout
Looking in indexes: https://pypi.bcgsc.ca/gsc/packages/
Collecting zc.buildout
  Downloading https://pypi.bcgsc.ca/root/pypi/+f/088/0f131cd5f3c5f/zc.buildout-2.13.3-py2.py3-none-any.whl (153kB)
    100% |████████████████████████████████| 163kB 36.5MB/s 
Requirement already satisfied: setuptools>=8.0 in ./venv/lib/python3.7/site-packages (from zc.buildout) (40.6.2)
Installing collected packages: zc.buildout
Successfully installed zc.buildout-2.13.3
You are using pip version 18.1, however version 20.3b1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
(venv) [creisle04: mavis]$ python bootstrap.py
ez_setup.py is deprecated and when using it setuptools will be pinned to 33.1.1 since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools
Creating directory '/projects/dat/workspace/creisle/tmp/mavis/eggs'.
Creating directory '/projects/dat/workspace/creisle/tmp/mavis/bin'.
Creating directory '/projects/dat/workspace/creisle/tmp/mavis/parts'.
Creating directory '/projects/dat/workspace/creisle/tmp/mavis/develop-eggs'.
Generated script '/projects/dat/workspace/creisle/tmp/mavis/bin/buildout'.
(venv) [creisle04: mavis]$ /projects/dat/workspace/creisle/tmp/mavis/bin/buildout
Develop: '/projects/dat/workspace/creisle/tmp/mavis/.'
warning: no files found matching '*.pl' under directory 'tools'
no previously-included directories found matching 'docs/build'
no previously-included directories found matching 'docs/source/auto'
WARNING: Aligner is required. Missing executable: blat
Getting distribution for 'zc.recipe.egg>=2.0.6'.
Got zc.recipe.egg 2.0.7.
Installing mavis.
Getting distribution for 'svgwrite'.
Got svgwrite 1.4.
Getting distribution for 'shortuuid>=0.5.0'.
Got shortuuid 1.0.1.
Getting distribution for 'pyvcf==0.6.8'.
zip_safe flag not set; analyzing archive contents...
vcf.test.__pycache__.test_vcf.cpython-37: module references __file__
Got PyVCF 0.6.8.
Getting distribution for 'pysam<=0.15.2,>=0.9'.
Got pysam 0.15.2.
Getting distribution for 'numpy>=1.13.1'.
Got numpy 1.19.4.
Getting distribution for 'networkx==1.11.0'.
Got networkx 1.11.
Getting distribution for 'colour'.
Got colour 0.1.5.
Getting distribution for 'braceexpand==0.1.2'.
zip_safe flag not set; analyzing archive contents...
Got braceexpand 0.1.2.
Getting distribution for 'biopython<1.78,>=1.70'.
Got biopython 1.77.
Getting distribution for 'Shapely>=1.6.4.post1'.
Got Shapely 1.7.1.
Getting distribution for 'Distance>=0.1.3'.
notice: no C support available
Warning: 'classifiers' should be a list, got type 'tuple'
zip_safe flag not set; analyzing archive contents...
Got Distance 0.1.3.
Getting distribution for 'decorator>=3.4.0'.
Got decorator 4.4.2.
Generated script '/projects/dat/workspace/creisle/tmp/mavis/bin/calculate_ref_alt_counts'.
Generated script '/projects/dat/workspace/creisle/tmp/mavis/bin/mavis'.
(venv) [creisle04: mavis]$ /projects/dat/workspace/creisle/tmp/mavis/bin/mavis
usage: mavis [-h] [-v]
             {annotate,validate,setup,schedule,cluster,pairing,summary,config,convert,overlay}
             ...
mavis: error: the following arguments are required: command

I do see the sequence error. It occurs when mavis is not on the PATH, so it doesn't know how to build the subcommands. You can fix it by adding mavis to the PATH first:

(venv) [creisle04: mavis]$ export PATH=/projects/dat/workspace/creisle/tmp/mavis/bin:$PATH
(venv) [creisle04: mavis]$ ./bin/mavis setup tests/data/pipeline_config.cfg -o output_dir
                      MAVIS: 2.2.8
                      hostname: creisle04.phage.bcgsc.ca
[2020-11-20 09:59:36] arguments
                        command = 'setup'
                        config = '/projects/dat/workspace/creisle/tmp/mavis/tests/data/pipeline_config.cfg'
                        log = None
                        log_level = 'INFO'
                        output = 'output_dir'
                        skip_stage = []
                      creating output directory: 'output_dir/converted_inputs'
                      setting up the directory structure for mock-A36971 as /projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/cluster'
[2020-11-20 09:59:36] clustering 
[2020-11-20 09:59:36] writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/cluster/MC_mock-A36971_batch-65sqBUfUCy6HXUrsuzXAo6.log
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/validate'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/validate/batch-65sqBUfUCy6HXUrsuzXAo6-1'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/validate/batch-65sqBUfUCy6HXUrsuzXAo6-2'
[2020-11-20 09:59:36] writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/validate/submit.sh
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/annotate'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/annotate/batch-65sqBUfUCy6HXUrsuzXAo6-1'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/annotate/batch-65sqBUfUCy6HXUrsuzXAo6-2'
[2020-11-20 09:59:36] writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A36971_diseased_genome/annotate/submit.sh
                      setting up the directory structure for mock-A47933 as /projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/cluster'
[2020-11-20 09:59:36] clustering 
[2020-11-20 09:59:36] writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/cluster/MC_mock-A47933_batch-65sqBUfUCy6HXUrsuzXAo6.log
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/validate'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/validate/batch-65sqBUfUCy6HXUrsuzXAo6-1'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/validate/batch-65sqBUfUCy6HXUrsuzXAo6-2'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/validate/batch-65sqBUfUCy6HXUrsuzXAo6-3'
[2020-11-20 09:59:36] writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/validate/submit.sh
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/annotate'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/annotate/batch-65sqBUfUCy6HXUrsuzXAo6-1'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/annotate/batch-65sqBUfUCy6HXUrsuzXAo6-2'
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/annotate/batch-65sqBUfUCy6HXUrsuzXAo6-3'
[2020-11-20 09:59:36] writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/mock-A47933_diseased_transcriptome/annotate/submit.sh
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/pairing'
[2020-11-20 09:59:36] writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/pairing/submit.sh
                      creating output directory: '/projects/dat/workspace/creisle/tmp/mavis/output_dir/summary'
[2020-11-20 09:59:36] writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/summary/submit.sh
                      writing: /projects/dat/workspace/creisle/tmp/mavis/output_dir/build.cfg
                      run time (hh/mm/ss): 0:00:00
                      run time (s): 0

I will add the documentation flag here, though, because I think this case could definitely have a more helpful error message.
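
As a rough illustration of what a more helpful message could look like: the traceback shows str.join failing because the first item in the command list is None. The sketch below wraps the same join in a check that reports the likely cause. It assumes the executable is located with shutil.which; the function names and the error text are hypothetical and are not MAVIS's actual code.

```python
import shutil
from typing import List, Optional


def find_executable(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def join_commands(commands: List[Optional[str]]) -> str:
    """Join command parts as in the traceback, but fail with a clear message."""
    if any(part is None for part in commands):
        raise EnvironmentError(
            "could not locate an executable needed for the submission script; "
            "is the mavis bin directory on your PATH?"
        )
    return " \\\n\t".join(commands) + "\n\n"


if __name__ == "__main__":
    # find_executable returns None when mavis is not on PATH,
    # which is the situation that triggers the TypeError in the raw join.
    parts = [find_executable("mavis"), "validate"]
    try:
        print(join_commands(parts))
    except EnvironmentError as err:
        print(f"error: {err}")
```

The point of the check is only to turn an opaque TypeError into a message that names the PATH as the thing to look at.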

moldach commented 3 years ago

Hi, I was using a venv, but I think the PATH was only one of the issues here.

$ export MAVIS_ALIGNER='bwa mem'
$ export MAVIS_ALIGNER_REFERENCE=~/MAVIS/reference_inputs/hg19.fa
$ git clone https://github.com/bcgsc/mavis.git
$ cd mavis
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install zc.buildout
$ python bootstrap.py
$ bin/buildout

running

I will start an interactive session with salloc, source venv, export PATH and try to run the example:

$ salloc --time=1:0:0 --mem=12000
$ export PATH=~/bin/mavis/bin:$PATH
$ export MAVIS_ALIGNER='bwa mem'
$ export MAVIS_ALIGNER_REFERENCE=~/MAVIS/reference_inputs/hg19.fa

$ export MAVIS_SCHEDULER=SLURM
$ export MAVIS_CONCURRENCY_LIMIT=2
$ ./bin/mavis setup tests/data/pipeline_config.cfg -o output_dir
$ ./bin/mavis schedule -o output_dir
$ ./bin/mavis schedule -o output_dir --submit

After a while I check to see if they ran and I see failures.

Looking in one of the logs, I see the failure is caused by blat:

Traceback (most recent call last):
  File "/home/moldach/bin/mavis/bin/mavis", line 24, in <module>
    sys.exit(mavis.main.main())
  File "/home/moldach/bin/mavis/mavis/main.py", line 465, in main
    args.aligner_version = get_aligner_version(args.aligner)
  File "/home/moldach/bin/mavis/mavis/align.py", line 174, in get_aligner_version
    raise ValueError("unable to parse blat version number from:'{}'".format(proc))
ValueError: unable to parse blat version number from:'/bin/sh: blat: command not found'

This is odd, since export MAVIS_ALIGNER='bwa mem' was run before creating the venv and doing the installation.
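
The "command not found" message above means the shell could not find a blat executable when the job ran. A small pre-flight check along these lines can catch that before jobs are submitted. This is a sketch only: the function names are made up for illustration, and it assumes the aligner setting is a command string such as 'bwa mem' whose first word is the executable.

```python
import shlex
import shutil
from typing import Optional


def aligner_executable(aligner: str) -> str:
    """Extract the executable name from an aligner command string ('bwa mem' -> 'bwa')."""
    return shlex.split(aligner)[0]


def locate_aligner(aligner: str) -> Optional[str]:
    """Return the path of the aligner executable on PATH, or None if it is not found."""
    return shutil.which(aligner_executable(aligner))


if __name__ == "__main__":
    # Check both aligners mentioned in this thread.
    for aligner in ("blat", "bwa mem"):
        path = locate_aligner(aligner)
        status = path if path else "NOT FOUND on PATH"
        print(f"{aligner!r}: {status}")
```

Note that on a cluster the check is only meaningful if it runs in the same environment as the submitted jobs, since the PATH on a compute node may differ from the login node.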

calchoo commented 3 years ago

The test pipeline_config.cfg has the aligner set to blat so mavis won't use the environment variable in this case.

Can you try it again using blat as the aligner?
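
The precedence described here (a value in the config file wins over the environment variable) can be sketched as follows. This is an illustration, not MAVIS's actual implementation: the section name validate, the option name aligner, the fallback default, and the resolve_aligner helper are all assumptions.

```python
import configparser
from typing import Mapping

DEFAULT_ALIGNER = "blat"  # assumed default, for illustration only


def resolve_aligner(config_text: str, environ: Mapping[str, str]) -> str:
    """Pick the aligner: config file first, then MAVIS_ALIGNER, then the default."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # An explicit setting in the config file takes priority over the environment.
    if parser.has_option("validate", "aligner"):
        return parser.get("validate", "aligner")
    return environ.get("MAVIS_ALIGNER", DEFAULT_ALIGNER)


if __name__ == "__main__":
    env = {"MAVIS_ALIGNER": "bwa mem"}
    with_setting = "[validate]\naligner = blat\n"
    without_setting = "[validate]\n"
    print(resolve_aligner(with_setting, env))     # config value is used
    print(resolve_aligner(without_setting, env))  # falls back to the environment
```

Under this model, exporting MAVIS_ALIGNER only has an effect when the config file does not set the aligner itself.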