davek44 / Basset

Convolutional neural network analysis for predicting DNA sequence activity.
MIT License

Not python 3 friendly #15

Open lzamparo opened 8 years ago

lzamparo commented 8 years ago

Hi,

Love the package, I'm keen to try it out for myself. Just wanted to point out that it doesn't seem to play well with Python 3. Some errors are easily fixed by running 2to3 on the relevant .py files, but some are not. For example, try running install_data.py using Anaconda Python 3. I get the following:

[zamparol@gpu-1-14 Basset]$ python install_data.py -r
[edited for brevity]
Traceback (most recent call last):
  File "/cbio/cllab/nobackup/zamparol/Basset/src/seq_hdf5.py", line 130, in <module>
    main()
  File "/cbio/cllab/nobackup/zamparol/Basset/src/seq_hdf5.py", line 46, in main
    seqs, targets = dna_io.load_data_1hot(fasta_file, targets_file, extend_len=options.extend_length, mean_norm=False, whiten=False, permute=False, sort=False)
  File "/cbio/cllab/nobackup/zamparol/Basset/src/dna_io.py", line 293, in load_data_1hot
    seq_vecs = hash_sequences_1hot(fasta_file, extend_len)
  File "/cbio/cllab/nobackup/zamparol/Basset/src/dna_io.py", line 267, in hash_sequences_1hot
    seq_vecs[header] = dna_one_hot(seq, seq_len)
  File "/cbio/cllab/nobackup/zamparol/Basset/src/dna_io.py", line 137, in dna_one_hot
    seq = seq[seq_trim:seq_trim+seq_len]
TypeError: slice indices must be integers or None or have an __index__ method

The same script seems to succeed using Anaconda Python 2.7.1 (though I can't be sure, since the seq_hdf5.py step takes a while to complete). I'll use that for my purposes, but maybe you should update the README to say explicitly that Python 2 is required?
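
For context, a likely cause (an assumption, since the line that computes seq_trim is not shown in the traceback) is that the trim offset comes from a `/` division, which returns a float under Python 3. The sketch below uses two hypothetical helpers, center_trim and center_trim_true_division, which are not part of Basset, to show the difference between floor division and true division when the result is used as a slice index.

```python
# Illustrative only: these helpers are hypothetical and merely mimic the kind of
# centre-trim arithmetic that could precede the failing slice in dna_one_hot.

def center_trim(seq, seq_len):
    """Return the central seq_len characters of seq."""
    # Floor division keeps the offset an int under both Python 2 and Python 3.
    seq_trim = (len(seq) - seq_len) // 2
    return seq[seq_trim:seq_trim + seq_len]


def center_trim_true_division(seq, seq_len):
    """Same arithmetic, but with '/' as a Python 2 era script might write it."""
    # Under Python 3, '/' yields a float, which is not a valid slice index,
    # so the slice below raises TypeError.
    seq_trim = (len(seq) - seq_len) / 2
    return seq[seq_trim:seq_trim + seq_len]


if __name__ == "__main__":
    print(center_trim("ACGTACGT", 4))
    try:
        center_trim_true_division("ACGTACGT", 4)
    except TypeError as err:
        print("TypeError:", err)
```

If that guess is right, 2to3 would not catch it, because it leaves `/` untouched; each such division would need to be changed to `//` by hand.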

onceupon commented 7 years ago

I hit the same problem using the Docker image, with Python 2.7.6. It seems to be solved by editing Basset/src/seq_hdf5.py, lines 81-89, and wrapping test_count, train_count and valid_count in int():

    train_count = seqs.shape[0] - test_count - valid_count
    train_count = int(batch_round(train_count, options.batch_size))
    print >> sys.stderr, '%d training sequences ' % train_count

    test_count = int(batch_round(test_count, options.batch_size))
    print >> sys.stderr, '%d test sequences ' % test_count

    valid_count = int(batch_round(valid_count, options.batch_size))
    print >> sys.stderr, '%d validation sequences ' % valid_count
CesarArenas-Mena commented 5 years ago

I guess the error at line 92 of the current seq_hdf5.py, which I get while running Python 2.7.5, is related. Onceupon, could you open a pull request?

[ca445@cbsugpu01 bassetfiles]$ python /usr/local/Basset-0.1.0/src/seq_hdf5.py -r -c -v 3000 -t 3000 learn_cd4.fa lt.txt learn_cd4.h5
85261 training sequences
3000 test sequences
3000 validation sequences
Traceback (most recent call last):
  File "/usr/local/Basset-0.1.0/src/seq_hdf5.py", line 130, in <module>
    main()
  File "/usr/local/Basset-0.1.0/src/seq_hdf5.py", line 92, in main
    train_seqs, train_targets = seqs[i:i+train_count,:], targets[i:i+train_count,:]
TypeError: slice indices must be integers or None or have an __index__ method
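
The counts print cleanly with %d even when they are floats, which may be why the log above looks fine right up to the slice. As a rough sketch of the int() coercion suggested earlier in the thread, the function below computes the three split sizes as plain ints before any slicing. The name split_counts and its inner round_down helper are hypothetical; round_down only stands in for batch_round, whose actual behaviour is not shown here, and plain lists are used instead of the real arrays.

```python
# Hypothetical sketch: split_counts and round_down are not Basset functions.

def split_counts(n_total, test_count, valid_count, batch_size=None):
    """Return (train, test, valid) counts as plain ints."""

    def round_down(count):
        # Assumed stand-in for batch_round: trim to a multiple of batch_size.
        count = int(count)
        if batch_size:
            count -= count % batch_size
        return count

    # Coerce first, so float-valued options (e.g. 3000.0) cannot leak into slices.
    test_count = int(test_count)
    valid_count = int(valid_count)
    train_count = round_down(n_total - test_count - valid_count)
    return train_count, round_down(test_count), round_down(valid_count)


if __name__ == "__main__":
    rows = list(range(10))
    train_n, test_n, valid_n = split_counts(len(rows), 3.0, 3.0)
    i = 0
    train = rows[i:i + train_n]  # int indices, so the slice is valid
    print(train_n, test_n, valid_n, train)
```

Coercing once, where the counts are computed, keeps every later slice safe without having to wrap each index expression separately.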