bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

"Bio.Alphabet has been removed from Biopython" #232

Closed: moldach closed this issue 3 years ago

moldach commented 3 years ago

Unable to run MAVIS on an HPC, I've now resorted to trying to run it on my laptop.

However, I get an error about Bio.Alphabet with the recommended install.

Recommended

mtg@mtg-ThinkPad-P53:~$ export MAVIS_ALIGNER='bwa mem'
mtg@mtg-ThinkPad-P53:~$ export MAVIS_ALIGNER_REFERENCE=c_elegans.PRJNA13758.WS265.genomic.fa
mtg@mtg-ThinkPad-P53:~$ python3 -m venv ~/bin/mavis_venv
mtg@mtg-ThinkPad-P53:~$ source ~/bin/mavis_venv/bin/activate
(mavis_venv) mtg@mtg-ThinkPad-P53:~$ pip install mavis
(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~$ git clone https://github.com/bcgsc/mavis.git
Cloning into 'mavis'...
remote: Enumerating objects: 113, done.
remote: Counting objects: 100% (113/113), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 11236 (delta 52), reused 65 (delta 34), pack-reused 11123
Receiving objects: 100% (11236/11236), 20.08 MiB | 967.00 KiB/s, done.
Resolving deltas: 100% (8358/8358), done.
(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~$ git checkout v2.0.0
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~$ mv mavis/tests .
(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~$ rm -r mavis
rm: remove write-protected regular file 'mavis/.git/objects/pack/pack-f7d294de36fdd14bfc56116a0088255396fe0a65.pack'? y
rm: remove write-protected regular file 'mavis/.git/objects/pack/pack-f7d294de36fdd14bfc56116a0088255396fe0a65.idx'? y
(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~$ export MAVIS_SCHEDULER=LOCAL
(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~$ export MAVIS_CONCURRENCY_LIMIT=2

(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~$ mavis setup tests/data/pipeline_config.cfg -o output_dir
Traceback (most recent call last):
  File "/home/mtg/bin/mavis_venv/bin/mavis", line 7, in <module>
    from mavis.main import main
  File "/home/mtg/bin/mavis_venv/lib/python3.6/site-packages/mavis/main.py", line 12, in <module>
    from .align import get_aligner_version
  File "/home/mtg/bin/mavis_venv/lib/python3.6/site-packages/mavis/align.py", line 13, in <module>
    from .bam import cigar as _cigar
  File "/home/mtg/bin/mavis_venv/lib/python3.6/site-packages/mavis/bam/cigar.py", line 7, in <module>
    from ..constants import CIGAR, DNA_ALPHABET, GAP
  File "/home/mtg/bin/mavis_venv/lib/python3.6/site-packages/mavis/constants.py", line 8, in <module>
    from Bio.Alphabet import Gapped
  File "/home/mtg/bin/mavis_venv/lib/python3.6/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
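
The traceback ends at `from Bio.Alphabet import Gapped`, so the failure can be reproduced without running MAVIS at all. The following is a minimal sketch, not part of MAVIS, that reports whether a module imports cleanly in the active environment; `has_working_module` is a hypothetical helper name chosen for this example.

```python
import importlib


def has_working_module(name: str) -> bool:
    """Return True if `name` imports cleanly, False if the import raises ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Covers a missing package as well as a module that raises
        # ImportError on import, as Bio.Alphabet does in the traceback above.
        return False
    return True


if __name__ == "__main__":
    for module in ("Bio", "Bio.Alphabet"):
        print(module, "importable:", has_working_module(module))
```

If `Bio` is importable but `Bio.Alphabet` is not, the environment has a Biopython release that is too new for the installed MAVIS code.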

Alternate method (GitHub)

mtg@mtg-ThinkPad-P53:~/bin$ git clone https://github.com/bcgsc/mavis.git
Cloning into 'mavis'...
remote: Enumerating objects: 113, done.
remote: Counting objects: 100% (113/113), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 11236 (delta 52), reused 65 (delta 34), pack-reused 11123
Receiving objects: 100% (11236/11236), 20.08 MiB | 709.00 KiB/s, done.
Resolving deltas: 100% (8358/8358), done.
mtg@mtg-ThinkPad-P53:~/bin$ cd mavis
mtg@mtg-ThinkPad-P53:~/bin/mavis$ pip install zc.buildout
Collecting zc.buildout
  Downloading zc.buildout-2.13.3-py2.py3-none-any.whl (153 kB)
     |████████████████████████████████| 153 kB 987 kB/s 
Requirement already satisfied: setuptools>=8.0 in /home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages (from zc.buildout) (28.8.0)
Installing collected packages: zc.buildout
Successfully installed zc.buildout-2.13.3
WARNING: You are using pip version 20.1.1; however, version 20.2.4 is available.
You should consider upgrading via the '/home/mtg/.pyenv/versions/3.6.0/bin/python3.6 -m pip install --upgrade pip' command.
mtg@mtg-ThinkPad-P53:~/bin/mavis$ python bootstrap.py
ez_setup.py is deprecated and when using it setuptools will be pinned to 33.1.1 since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools

The required version of setuptools (>=33.1.1) is not available,
and can't be installed while this script is running. Please
install a more recent version first, using
'easy_install -U setuptools'.

(Currently using setuptools 28.8.0 (/home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages))
mtg@mtg-ThinkPad-P53:~/bin/mavis$ easy_install -U setuptools
Searching for setuptools
Reading https://pypi.python.org/simple/setuptools/
Downloading https://files.pythonhosted.org/packages/a7/e0/30642b9c2df516506d40b563b0cbd080c49c6b3f11a70b4c7a670f13a78b/setuptools-50.3.2.zip#sha256=ed0519d27a243843b05d82a5e9d01b0b083d9934eaa3d02779a23da18077bd3c
Best match: setuptools 50.3.2
Processing setuptools-50.3.2.zip
Writing /tmp/easy_install-infb81as/setuptools-50.3.2/setup.cfg
Running setuptools-50.3.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-infb81as/setuptools-50.3.2/egg-dist-tmp-tnl3gg29
warning: no files found matching '*.py' under directory 'tests'
warning: no files found matching 'Makefile' under directory 'docs'
warning: no previously-included files found matching 'pyproject.toml'
warning: install_lib: 'build/lib' does not exist -- no Python modules to install

zip_safe flag not set; analyzing archive contents...
Moving UNKNOWN-0.0.0-py3.6.egg to /home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages
Adding UNKNOWN 0.0.0 to easy-install.pth file
Installing easy_install script to /home/mtg/.pyenv/versions/3.6.0/bin
Installing easy_install-3.6 script to /home/mtg/.pyenv/versions/3.6.0/bin

Installed /home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages/UNKNOWN-0.0.0-py3.6.egg
Skipping dependencies for UNKNOWN 0.0.0
mtg@mtg-ThinkPad-P53:~/bin/mavis$ python bootstrap.py
ez_setup.py is deprecated and when using it setuptools will be pinned to 33.1.1 since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools

The required version of setuptools (>=33.1.1) is not available,
and can't be installed while this script is running. Please
install a more recent version first, using
'easy_install -U setuptools'.

(Currently using setuptools 28.8.0 (/home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages))

creisle commented 3 years ago

@moldach can you include the pip freeze list for the Python instance you are using? This will help debug dependency errors.

moldach commented 3 years ago

Sure thing:

(base) mtg@mtg-ThinkPad-P53:~/svviz2$ source ~/bin/mavis_venv/bin/activate
(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~/svviz2$ pip freeze
biopython==1.78
braceexpand==0.1.2
colour==0.1.5
decorator==4.4.2
Distance==0.1.3
mavis==2.2.7
networkx==1.11
numpy==1.19.4
pysam==0.15.2
PyVCF==0.6.8
Shapely==1.7.1
shortuuid==1.0.1
svgwrite==1.4

creisle commented 3 years ago

OK, so it looks like @calchoo addressed this in the last patch release.

See https://github.com/bcgsc/mavis/blob/develop/setup.py#L77

If you update to 2.2.8 and re-install, you should no longer see the issue.
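
The combination reported in this thread (mavis 2.2.7 with biopython 1.78) can be detected from `pip freeze` output before running anything. Below is a minimal sketch under stated assumptions: the thresholds (MAVIS older than 2.2.8, Biopython 1.78 or newer) are taken from this thread rather than from the MAVIS changelog, and `parse_freeze` and `alphabet_conflict` are hypothetical helper names.

```python
from typing import Dict, Tuple


def parse_freeze(text: str) -> Dict[str, Tuple[int, ...]]:
    """Map lower-cased package names to numeric version tuples from `pip freeze` output."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if not sep:
            continue  # skip blank lines and entries without an exact pin
        try:
            versions[name.lower()] = tuple(int(part) for part in version.split("."))
        except ValueError:
            continue  # skip versions with non-numeric parts, e.g. "1.0rc1"
    return versions


def alphabet_conflict(versions: Dict[str, Tuple[int, ...]]) -> bool:
    """True when MAVIS predates 2.2.8 and Biopython is 1.78 or newer (assumed thresholds)."""
    mavis = versions.get("mavis")
    biopython = versions.get("biopython")
    if mavis is None or biopython is None:
        return False
    return mavis < (2, 2, 8) and biopython >= (1, 78)


# Excerpt of the freeze list posted above.
FREEZE = """\
biopython==1.78
mavis==2.2.7
numpy==1.19.4
"""

print(alphabet_conflict(parse_freeze(FREEZE)))  # prints True
```

When the check returns True, upgrading MAVIS as suggested above is the fix this thread settled on.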

moldach commented 3 years ago

Hi @creisle

I assume you mean via the buildout method and not via pip as that still installs 2.2.7?

Trying to get 2.2.8 via the buildout method:

(base) mtg@mtg-ThinkPad-P53:~/bin$ git clone https://github.com/bcgsc/mavis.git
Cloning into 'mavis'...
remote: Enumerating objects: 143, done.
remote: Counting objects: 100% (143/143), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 11266 (delta 61), reused 86 (delta 43), pack-reused 11123
Receiving objects: 100% (11266/11266), 20.10 MiB | 1.65 MiB/s, done.
Resolving deltas: 100% (8367/8367), done.
(base) mtg@mtg-ThinkPad-P53:~/bin$ cd mavis
(base) mtg@mtg-ThinkPad-P53:~/bin/mavis$ pip install zc.buildout
Requirement already satisfied: zc.buildout in /home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages (2.13.3)
Requirement already satisfied: setuptools>=8.0 in /home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages (from zc.buildout) (28.8.0)
WARNING: You are using pip version 20.1.1; however, version 20.2.4 is available.
You should consider upgrading via the '/home/mtg/.pyenv/versions/3.6.0/bin/python3.6 -m pip install --upgrade pip' command.
(base) mtg@mtg-ThinkPad-P53:~/bin/mavis$ python bootstrap.py
ez_setup.py is deprecated and when using it setuptools will be pinned to 33.1.1 since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools

The required version of setuptools (>=33.1.1) is not available,
and can't be installed while this script is running. Please
install a more recent version first, using
'easy_install -U setuptools'.

(Currently using setuptools 28.8.0 (/home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages))
(base) mtg@mtg-ThinkPad-P53:~/bin/mavis$ easy_install -U setuptools
Searching for setuptools
Reading https://pypi.python.org/simple/setuptools/
Downloading https://files.pythonhosted.org/packages/a7/e0/30642b9c2df516506d40b563b0cbd080c49c6b3f11a70b4c7a670f13a78b/setuptools-50.3.2.zip#sha256=ed0519d27a243843b05d82a5e9d01b0b083d9934eaa3d02779a23da18077bd3c
Best match: setuptools 50.3.2
Processing setuptools-50.3.2.zip
Writing /tmp/easy_install-9pghtk1g/setuptools-50.3.2/setup.cfg
Running setuptools-50.3.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-9pghtk1g/setuptools-50.3.2/egg-dist-tmp-n9q3ydk2
warning: no files found matching '*.py' under directory 'tests'
warning: no files found matching 'Makefile' under directory 'docs'
warning: no previously-included files found matching 'pyproject.toml'
warning: install_lib: 'build/lib' does not exist -- no Python modules to install

zip_safe flag not set; analyzing archive contents...
Removing /home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages/UNKNOWN-0.0.0-py3.6.egg
Moving UNKNOWN-0.0.0-py3.6.egg to /home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages
UNKNOWN 0.0.0 is already the active version in easy-install.pth
Installing easy_install script to /home/mtg/.pyenv/versions/3.6.0/bin
Installing easy_install-3.6 script to /home/mtg/.pyenv/versions/3.6.0/bin

Installed /home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages/UNKNOWN-0.0.0-py3.6.egg
Skipping dependencies for UNKNOWN 0.0.0
(base) mtg@mtg-ThinkPad-P53:~/bin/mavis$ python bootstrap.py
ez_setup.py is deprecated and when using it setuptools will be pinned to 33.1.1 since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools

The required version of setuptools (>=33.1.1) is not available,
and can't be installed while this script is running. Please
install a more recent version first, using
'easy_install -U setuptools'.

(Currently using setuptools 28.8.0 (/home/mtg/.pyenv/versions/3.6.0/lib/python3.6/site-packages))

It seems there is a problem at the python bootstrap.py step.

creisle commented 3 years ago

Oh! Sorry about that! It looks like the automated publish didn't run. I will look into why that didn't trigger, but in the meantime I've published it manually:

https://pypi.org/project/mavis/2.2.8/

moldach commented 3 years ago

$ export MAVIS_ALIGNER='bwa mem'
$ export MAVIS_ALIGNER_REFERENCE=/home/mtg/MAVIS/hg19.fa
$ export MAVIS_SCHEDULER=LOCAL
$ export MAVIS_CONCURRENCY_LIMIT=2
$ mavis schedule -o output_dir
$ mavis schedule -o output_dir --submit

But the jobs never submitted:

(mavis_venv) (base) mtg@mtg-ThinkPad-P53:~/MAVIS$ mavis schedule -o output_dir
                      MAVIS: 2.2.8
                      hostname: mtg-ThinkPad-P53
[2020-11-21 12:46:31] arguments
                        command = 'schedule'
                        log = None
                        log_level = 'INFO'
                        output = 'output_dir'
                        resubmit = False
                        submit = False
[2020-11-21 12:46:31] validate
                        MV_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-1 (6KoVdttMtWEbuQRyUVBpND) is UNKNOWN
                          missing log file: /home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/validate/batch-UtitsoCMXRxq4WxXsW773k-1/job-MV_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-1-6KoVdttMtWEbuQRyUVBpND.log
                        MV_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-2 (AoG7sTy3Kuv5x9JRZiBYTx) is UNKNOWN
                          missing log file: /home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/validate/batch-UtitsoCMXRxq4WxXsW773k-2/job-MV_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-2-AoG7sTy3Kuv5x9JRZiBYTx.log
                        MV_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-1 (isqWQZs9XMiTEwqgBVaX85) is UNKNOWN
                          missing log file: /home/mtg/MAVIS/output_dir/mock-A47933_diseased_transcriptome/validate/batch-UtitsoCMXRxq4WxXsW773k-1/job-MV_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-1-isqWQZs9XMiTEwqgBVaX85.log
                        MV_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-2 (bptyVThx58jAuCH4XwHycc) is UNKNOWN
                          missing log file: /home/mtg/MAVIS/output_dir/mock-A47933_diseased_transcriptome/validate/batch-UtitsoCMXRxq4WxXsW773k-2/job-MV_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-2-bptyVThx58jAuCH4XwHycc.log
                        MV_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-3 (5Hb7UuNYA6HeXaBC6sfDPz) is UNKNOWN
                          missing log file: /home/mtg/MAVIS/output_dir/mock-A47933_diseased_transcriptome/validate/batch-UtitsoCMXRxq4WxXsW773k-3/job-MV_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-3-5Hb7UuNYA6HeXaBC6sfDPz.log
[2020-11-21 12:46:31] annotate
                        MA_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-1 is NOT SUBMITTED
                        MA_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-2 is NOT SUBMITTED
                        MA_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-1 is NOT SUBMITTED
                        MA_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-2 is NOT SUBMITTED
                        MA_mock-A47933_batch-UtitsoCMXRxq4WxXsW773k-3 is NOT SUBMITTED
[2020-11-21 12:46:31] pairing
                        MP_batch-UtitsoCMXRxq4WxXsW773k is NOT SUBMITTED
[2020-11-21 12:46:31] summary
                        MS_batch-UtitsoCMXRxq4WxXsW773k is NOT SUBMITTED
                      rewriting: output_dir/build.cfg
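
A quick way to summarize a listing like this is to tally the job statuses. The sketch below is not part of MAVIS; the line pattern is inferred only from the output shown in this thread, and `tally_statuses` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches job lines such as "MA_... is NOT SUBMITTED" or
# "MV_... (6KoVdtt...) is UNKNOWN". The pattern is an assumption based on
# the output in this thread, not on MAVIS documentation.
STATUS_LINE = re.compile(
    r"^\s*(?P<job>M[A-Z]_\S+)(?: \(\S+\))? is (?P<status>[A-Z ]+?)\s*$"
)


def tally_statuses(log_text: str) -> Counter:
    """Count how many jobs are reported in each status."""
    counts = Counter()
    for line in log_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# Excerpt of the `mavis schedule` output above (log-file lines omitted).
SAMPLE = """\
[2020-11-21 12:46:31] validate
    MV_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-1 (6KoVdttMtWEbuQRyUVBpND) is UNKNOWN
    MV_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-2 (AoG7sTy3Kuv5x9JRZiBYTx) is UNKNOWN
[2020-11-21 12:46:31] annotate
    MA_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-1 is NOT SUBMITTED
    MA_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k-2 is NOT SUBMITTED
[2020-11-21 12:46:31] pairing
    MP_batch-UtitsoCMXRxq4WxXsW773k is NOT SUBMITTED
[2020-11-21 12:46:31] summary
    MS_batch-UtitsoCMXRxq4WxXsW773k is NOT SUBMITTED
"""

print(dict(tally_statuses(SAMPLE)))  # {'UNKNOWN': 2, 'NOT SUBMITTED': 4}
```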

There were two log files in output_dir:

                      MAVIS: 2.2.8
                      hostname: mtg-ThinkPad-P53
[2020-11-21 12:41:02] arguments
                        annotations = ['/home/mtg/MAVIS/tests/data/mock_annotations.json']
                        batch_id = 'batch-UtitsoCMXRxq4WxXsW773k'
                        cluster_initial_size_limit = 25
                        cluster_radius = 100
                        command = 'cluster'
                        disease_status = 'diseased'
                        inputs = ['output_dir/converted_inputs/mock_converted.tab']
                        library = 'mock-A36971'
                        limit_to_chr = [None]
                        log = '/home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster/MC_mock-A36971_batch-UtitsoCMXRxq4WxXsW773k.log'
                        log_level = 'INFO'
                        masking = ['/home/mtg/MAVIS/tests/data/mock_masking.tab']
                        max_files = 200
                        max_proximity = 5000
                        min_clusters_per_file = 2
                        output = '/home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster'
                        protocol = 'genome'
                        split_only = False
                        strand_specific = False
                        uninformative_filter = True
[2020-11-21 12:41:02] loading: ['/home/mtg/MAVIS/tests/data/mock_annotations.json']
[2020-11-21 12:41:02] loading: ['/home/mtg/MAVIS/tests/data/mock_masking.tab']
                      loading: /home/mtg/MAVIS/output_dir/converted_inputs/mock_converted.tab
                      loaded 28 breakpoint pairs
                      filtering by library and chr name
                      filtering from 28 using overlaps with regions filter
                      filtered from 28 down to 28 (removed 0)
                      filtering from 28 breakpoint pairs using informative filter
                      filtered from 28 down to 4 (removed 24)
                      writing: /home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster/filtered_pairs.tab
                      creating output directory: '/home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster'
                      computing clusters
                      computed 4 clusters
                      cluster input pairs distribution [(1, 4)]
                      cluster intervals lengths [(0, 8)]
                      writing: /home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster/cluster_assignment.tab
                      writing: /home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster/clusters.bed
                      writing: /home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster/batch-UtitsoCMXRxq4WxXsW773k-1.tab
                      writing: /home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster/batch-UtitsoCMXRxq4WxXsW773k-2.tab
                      complete: /home/mtg/MAVIS/output_dir/mock-A36971_diseased_genome/cluster/MAVIS-batch-UtitsoCMXRxq4WxXsW773k.COMPLETE
                      run time (hh/mm/ss): 0:00:00
                      run time (s): 0

calchoo commented 3 years ago

Similar to #234, can you try it again using blat as the aligner? The bwa mem setting is ignored when running the test pipeline config.
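
Because the test pipeline config does not use the `bwa mem` setting, it helps to confirm that the `blat` executable is discoverable before scheduling jobs. The following is a minimal sketch using only the Python standard library; `aligner_executable` and `find_aligner` are hypothetical helper names, and treating the first word of `MAVIS_ALIGNER` as the program name is an assumption made for this example.

```python
import os
import shutil
from typing import Optional


def aligner_executable(aligner: str) -> str:
    """Return the program name from an aligner setting, e.g. 'bwa mem' -> 'bwa'."""
    parts = aligner.split()
    if not parts:
        raise ValueError("aligner setting is empty")
    return parts[0]


def find_aligner(aligner: str) -> Optional[str]:
    """Return the full path of the aligner's executable, or None if it is not on PATH."""
    return shutil.which(aligner_executable(aligner))


if __name__ == "__main__":
    # Defaulting to "blat" when MAVIS_ALIGNER is unset is an assumption for this example.
    configured = os.environ.get("MAVIS_ALIGNER", "blat")
    # dict.fromkeys removes duplicates while keeping a stable order.
    for candidate in dict.fromkeys([configured, "blat"]):
        path = find_aligner(candidate)
        print(f"{candidate!r}: {path if path else 'not found on PATH'}")
```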

moldach commented 3 years ago

Okay, so the official BLAT installation instructions were crap, but I managed to figure it out from this link (in case anyone stumbles on this in the future).

Seems to work now for the mini-tutorial (test pipeline config).

I'm going to try it on the full-tutorial now (https://github.com/bcgsc/mavis/issues/228)

moldach commented 3 years ago

Didn't work out on my laptop (memory issue) but the issue with Bio.Alphabet has been dealt with so I'm closing this issue.