gagneurlab / MMSplice_MTSplice

Tissue-specific variant effect predictions on splicing
MIT License

Error with running MMSplice - keras decode('utf-8') #46

Closed: pj-sullivan closed this issue 2 years ago

pj-sullivan commented 3 years ago

Description

I usually run MMSplice with the VEP docker (as I couldn't get MMSplice running locally the first time), but have been having issues with the VEP server lately and would rather just run MMSplice without VEP, ideally using docker.

What I Did

I installed MMSplice both locally and with docker and came across the same error message when running this code:

# Import
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_save, predict_all_table
from mmsplice.utils import max_varEff

# example files
gtf = 'tests/data/test.gtf'
vcf = 'tests/data/test.vcf.gz'
fasta = 'tests/data/hg19.nochr.chr17.fa'
csv = 'pred.csv'

# Specify model
model = MMSplice()

dl = SplicingVCFDataloader(gtf, fasta, vcf, encode=False, tissue_specific=False)

# Or predict and return as df
predictions = predict_all_table(model, dl, pathogenicity=True, splicing_efficiency=True)

# Summarize with maximum effect size
predictionsMax = max_varEff(predictions)

# Predict and save the results to the CSV file defined above
predict_save(model, dl, csv, pathogenicity=True, splicing_efficiency=True)
Traceback (most recent call last):
  File "run_local_mmsplice.py", line 13, in <module>
    model = MMSplice()
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 62, in __init__
    custom_objects=custom_objects)
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

I have since removed the .decode('utf-8') and .decode('utf8') calls from that file, in case they were the sole issue, but now I get another error, which probably stems from removing those calls.

Traceback (most recent call last):
  File "run_local_mmsplice.py", line 18, in <module>
    predictions = predict_all_table(model, dl, pathogenicity=True, splicing_efficiency=True)
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 299, in predict_all_table
    natural_scale=natural_scale, ref_psi_version=ref_psi_version)
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 255, in predict_on_dataloader
    natural_scale=natural_scale, ref_psi_version=ref_psi_version)
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 255, in concat
    sort=sort,
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 301, in __init__
    objs = list(objs)
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 215, in _predict_on_dataloader
    batch, dataloader.optional_metadata)
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 130, in _predict_batch
    batch['inputs']['seq'])
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 95, in predict_modular_scores_on_batch
    self.acceptor_intronM.predict(batch['acceptor_intron']),
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/training.py", line 1149, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_5 to have 3 dimensions, but got array with shape (512, 1)

Any advice would be appreciated!
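For readers puzzling over the second traceback: the ValueError says the Keras model expects a 3-dimensional input but received an array of shape (512, 1). The sketch below only illustrates what that difference in dimensions looks like. It assumes, without confirmation from this thread, that the model takes one-hot encoded DNA laid out as (batch, length, 4); `one_hot` and `shape_of` are hypothetical helpers written for this example and are not part of MMSplice or Keras.

```python
# Illustration only: the (batch, length, 4) one-hot layout is an assumption,
# not something confirmed in this thread.
BASES = "ACGT"


def one_hot(seq):
    """Encode one DNA string as rows of 4 floats (unknown bases become all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def shape_of(nested):
    """Return the dimensions of a regular nested list, treating strings as scalars."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(dims)


# A batch of raw sequence strings has only two dimensions: (batch, 1).
raw_batch = [["ACGT"], ["TTGA"]]

# After one-hot encoding, the same batch has three: (batch, length, 4).
encoded_batch = [one_hot(row[0]) for row in raw_batch]

print(shape_of(raw_batch))      # (2, 1)
print(shape_of(encoded_batch))  # (2, 4, 4)
```

If the dataloader is handing un-encoded sequences to the model, a shape like (512, 1) is what one would expect to see, but that is a guess about the cause, not a diagnosis.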

s6juncheng commented 3 years ago

Hi @pj-sullivan, thanks for using MMSplice. Which keras version are you using? How did you install MMSplice, with pip install?

s6juncheng commented 3 years ago

It looks like an issue with the new h5py version; see https://stackoverflow.com/questions/53740577/does-any-one-got-attributeerror-str-object-has-no-attribute-decode-whi and https://stackoverflow.com/questions/64767814/coremltools-error-while-converting-str-object-has-no-attribute-decode
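As background to the linked answers: the traceback shows Keras calling `.decode('utf-8')` on the model config it reads from the HDF5 file. That works when the value arrives as `bytes`, but fails with exactly this AttributeError when it arrives as `str`, which is what newer h5py releases return. A minimal sketch of the difference follows; `as_text` is a hypothetical helper for illustration and is not part of Keras or h5py.

```python
import json


def as_text(value, encoding="utf-8"):
    """Return a str whether the attribute was read as bytes or as str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Older h5py: string attributes come back as bytes, so .decode() works.
config_as_bytes = b'{"class_name": "Model"}'

# Newer h5py: the same attribute comes back as str, which has no .decode().
config_as_str = '{"class_name": "Model"}'

for raw in (config_as_bytes, config_as_str):
    config = json.loads(as_text(raw))
    print(type(raw).__name__, "->", config["class_name"])
```

This only shows why the call fails. Editing the installed Keras sources is fragile, so pinning h5py to a compatible version is the less invasive fix.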

pj-sullivan commented 3 years ago

Yes, installed using pip install, and keras version is 2.2.4.

Thank you! I used the recommended pip install 'h5py==2.10.0' --force-reinstall and the first issue is solved. Unfortunately, I am still getting the second error.
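A quick way to check whether an environment is affected before loading the model is to inspect the installed h5py version. The sketch below assumes the behavior change arrived with h5py 3.0, which is how the linked answers describe it; `needs_h5py_pin` is a hypothetical helper name.

```python
def needs_h5py_pin(version):
    """Return True if this h5py version is 3.x or newer (assumed cutoff)."""
    major = int(version.split(".")[0])
    return major >= 3


try:
    import h5py
except ImportError:
    h5py = None  # h5py is not installed in this environment

if h5py is None:
    print("h5py is not installed")
elif needs_h5py_pin(h5py.__version__):
    print("h5py %s found; consider: pip install 'h5py==2.10.0' --force-reinstall"
          % h5py.__version__)
else:
    print("h5py %s found; no pin needed" % h5py.__version__)
```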

s6juncheng commented 3 years ago

Hi @pj-sullivan, does the second error also happen in docker? Sorry, I was not precise enough: did you install with pip install mmsplice or from the repo? The error is a bit confusing and I'm not able to reproduce it. What happens if you run this notebook? https://github.com/gagneurlab/MMSplice_MTSplice/blob/master/notebooks/example.ipynb

I would suggest starting a new conda env with python 3.6 and installing MMSplice there with pip install mmsplice. You also need to install:

conda install cyvcf2 -y
conda install cython -y

We are using this docker setup for unit testing: https://github.com/gagneurlab/MMSplice_MTSplice/blob/master/.circleci/config.yml