BioinformaticsLabAtMUN / Promotech

Machine-learning-based general bacterial promoter prediction tool.
GNU General Public License v3.0

ValueError: could not broadcast input array from shape (40) into shape (160) #5

Closed by PengfanZhang 2 years ago

PengfanZhang commented 2 years ago

Hi,

I'm running Promotech on my bacterial genomes, but it fails with the following error:

 CONVERTING DATA
 41% (1955608 of 4684263) |############################                                         | Elapsed Time: 0:12:29 ETA:   0:17:30
Traceback (most recent call last):
  File "/netscratch/dep_psl/grp_rgo/pzhang/tools/promotech/Promotech/promotech.py", line 83, in <module>
    data_type        = args.model,
  File "/netscratch/dep_psl/grp_rgo/pzhang/tools/promotech/Promotech/genome/process_genome.py", line 92, in parseGenome40NTSequences
    X = dataConverter( seqs=cutted_seqs, data_type=data_type, tokenizer_path=tokenizer_path, print_fn=print_fn, log_file=log_file )
  File "/netscratch/dep_psl/grp_rgo/pzhang/tools/promotech/Promotech/sequences/../core/utils.py", line 232, in dataConverter
    data_df = fastaToHotEncodingSequences( seqs )
  File "/netscratch/dep_psl/grp_rgo/pzhang/tools/promotech/Promotech/sequences/../core/utils.py", line 174, in fastaToHotEncodingSequences
    data = getHotFeatures(seqs)
  File "/netscratch/dep_psl/grp_rgo/pzhang/tools/promotech/Promotech/sequences/../core/utils.py", line 168, in getHotFeatures
    data[i, :] = sequenceToBinary(seq)
ValueError: could not broadcast input array from shape (40) into shape (160)

Hope you can help me figure out this problem.

BioinformaticsLabAtMUN commented 2 years ago

Hello,

Do your sequences contain any characters other than A, C, G, and T? That is a very common cause of this error.
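
One way to check is to scan the input FASTA file for characters outside A, C, G, and T before running the prediction. The following is a minimal sketch, not part of Promotech: `read_fasta` and `count_invalid_bases` are hypothetical helper names, and the sketch assumes a plain-text FASTA input and treats lowercase bases as equivalent to uppercase.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Only these four bases are treated as valid; everything else
# (N, R, Y, gaps, ...) is reported as ambiguous or invalid.
VALID_BASES = frozenset("ACGT")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_invalid_bases(lines: Iterable[str]) -> Dict[str, Counter]:
    """Map each record ID to a Counter of its characters outside A, C, G, T.

    Records that contain only valid bases are left out of the result.
    """
    report = {}
    for name, seq in read_fasta(lines):
        bad = Counter(ch for ch in seq.upper() if ch not in VALID_BASES)
        if bad:
            report[name] = bad
    return report
```

Passing an open file handle (for example `count_invalid_bases(open("genome.fasta"))`, where `genome.fasta` is a placeholder name) returns an empty dictionary when every record contains only A, C, G, and T. Any record that does appear in the result holds characters that the one-hot encoding step may not handle.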

PengfanZhang commented 2 years ago

Yes. I found that ambiguous bases were the cause of this problem.

Thanks!

BioinformaticsLabAtMUN commented 2 years ago

Glad the issue has been resolved.