althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License

zip argument #1 must support iteration error + other question #17

Open willhuynh11 opened 2 years ago

willhuynh11 commented 2 years ago

Hey Martin,

Thanks for making this tool, I'm finding it very useful for my current project.

I have a profile HMM database obtained from CONJScan that I want to use to scan through a FASTA file containing multiple sequences. I am running into some issues for which I can't seem to figure out a workaround.

To preface my issue, let me explain what I am trying to do: using the CONJScan database and Python, I iterate over the profile HMMs in a for-loop. In each iteration, I use one profile HMM to scan through a FASTA file containing multiple sequences. Then, at the end of each iteration, I output a uniquely named graphic via dna_features_viewer that visualizes my alignments.

There are two problems I am encountering:

  1. Occasionally, I get an error saying that zip argument #1 must support iteration. It refers to the line for ax, hit in zip(axes, hits):, where the first argument of zip(axes, hits) is not iterable. I am not sure why this happens because, aside from the ad-hoc loop I wrote to go through each profile HMM in my database, everything mimics the example provided on the readthedocs.io page.

  2. At the end of the process, I have multiple hits from different HMM profiles on the same FASTA sequence. However, I would like to visualize them together, rather than separately. I am unsure whether I am using the tool incorrectly or whether this is currently unsupported.

    Copied below is my code; please excuse the messiness, I am still testing things out.


import pyhmmer
import os
from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt

directory = 'profiles'
#iterate over profiles in folder
#this is to iterate over a folder containing many profile Hmm (CONJScan database)
for hmmprofile in os.listdir(directory):
    f = os.path.join(directory, hmmprofile)
    if os.path.isfile(f):
        try:
            with pyhmmer.plan7.HMMFile(f) as hmm_file:
                hmm = next(hmm_file) 
            with pyhmmer.easel.SequenceFile("test.fasta", digital=True) as seq_file: #test.fasta contains many sequences in amino acid format
                sequences = list(seq_file)
            pipeline = pyhmmer.plan7.Pipeline(hmm.alphabet)
            hits = pipeline.search_hmm(hmm, sequences)
            ali = hits[0].domains[0].alignment
            hmm_name = (ali.hmm_name.decode()) #storing the name of the hmm profile in the event that a search succeeds
            # create an index so we can retrieve a Sequence from its name
            seq_index = { seq.name:seq for seq in sequences }

            fig, axes = plt.subplots(nrows=len(hits), figsize=(30, 30), sharex=True)
            try:
                for ax, hit in zip(axes, hits):
                    # add one feature per domain
                    features = [
                        GraphicFeature(start=d.alignment.target_from-1, end=d.alignment.target_to, color='#00FF00', label=hmm_name) #using the hmm_name to create labels for the graphic feature
                        for d in hit.domains
                    ]
                    length = len(seq_index[hit.name])
                    desc = seq_index[hit.name].description.decode()

                    # render the feature records
                    record = GraphicRecord(sequence_length=length, features=features)
                    record.plot(ax=ax)
                    ax.set_title(desc)
                    try:
                        ax.figure.tight_layout()
                        ax.figure.savefig(desc + hmm_name + ".png") #using both the descriptor + hmm_name to create a unique result and saving the graphic as a png
                    except Exception as e:
                        # print(e)
                        continue
            except Exception as e:
                # print(e)
                continue
        except Exception as e:
            # print(e)
            continue

Any advice you can provide would help immensely. Thank you.

althonos commented 2 years ago

Hi @willhuynh11 !

The error you're getting, zip argument #1 must support iteration, is quite transparent: it means that the first argument to zip, here axes, is not iterable. I cannot test this right now, but I suppose axes may be None when hits is empty; in modern versions of matplotlib, passing nrows=0 to subplots raises an error, but you could be using a version that just returns None there.
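
One way to guard against a non-iterable axes, as a minimal sketch rather than a tested fix: normalize whatever subplots returned into a list before passing it to zip. The helper name as_axes_list is hypothetical (it is not part of matplotlib or pyhmmer), and FakeAxes is only a stand-in so the snippet runs without matplotlib installed. Besides the None case, the helper also covers a second situation: with the default squeeze=True, plt.subplots returns a single Axes object rather than an array when nrows=1 and ncols=1, and a single Axes is not iterable either.

```python
def as_axes_list(axes):
    """Return `axes` as a flat list, whatever shape plt.subplots handed back.

    Hypothetical helper for illustration; not part of matplotlib or pyhmmer.
    """
    if axes is None:
        # Nothing to draw on (the suspected empty-hits case).
        return []
    try:
        # A 1-D array of Axes (nrows > 1, ncols == 1) is already iterable.
        return list(axes)
    except TypeError:
        # A single Axes object (nrows == 1 with squeeze=True) is not iterable.
        return [axes]


class FakeAxes:
    """Stand-in for a matplotlib Axes so the sketch runs on its own."""


single = FakeAxes()
several = [FakeAxes(), FakeAxes()]
hits = ["hit_a", "hit_b"]  # placeholder values standing in for pyhmmer hits

# Each call now yields something zip() can consume.
print(as_axes_list(None))
print(as_axes_list(single) == [single])
for ax, hit in zip(as_axes_list(several), hits):
    print(type(ax).__name__, hit)
```

In the loop from the question, this would amount to writing zip(as_axes_list(axes), hits), ideally after skipping the profile early when len(hits) == 0. Passing squeeze=False to plt.subplots is an alternative: it always returns a 2-D array, so the single column would be axes[:, 0].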