epi2me-labs / pychopper

cDNA read preprocessing

primer fasta/fastq format issue #38

Closed mertz1611 closed 1 year ago

mertz1611 commented 1 year ago

I installed a fresh copy of pychopper with conda a few days ago and started getting an error with input that had worked previously. The traceback goes through the readfq function, which opens primers.fa but expects FASTQ format. I worked around it by converting my primers.fa to FASTQ format.

Using kit: primers.fa
Configurations to consider: "+:SSP,-VNP|-:VNP,-SSP"
Traceback (most recent call last):
  File "/home/rm957/anaconda3/envs/pychopper/bin/pychopper", line 10, in <module>
    sys.exit(main())
  File "/home/rm957/anaconda3/envs/pychopper/lib/python3.8/site-packages/pychopper/scripts/pychopper.py", line 394, in main
    all_primers = seu.get_primers(args.b)
  File "/home/rm957/anaconda3/envs/pychopper/lib/python3.8/site-packages/pychopper/seq_utils.py", line 128, in get_primers
    for primer in readfq(primers):
  File "/home/rm957/anaconda3/envs/pychopper/lib/python3.8/site-packages/pychopper/seq_utils.py", line 70, in readfq
    probs = [10 ** (q / -10) for q in fx.get_quality_array()]
TypeError: 'NoneType' object is not iterable

brunaeus commented 1 year ago

I had the same issue, strangely enough only when using the edlib backend. Apparently the get_primers function in seq_utils.py calls readfq, the function that reads the FASTQ files and performs the quality-filter computations. It tries to extract quality scores (fx.get_quality_array()) from the FASTA primer sequences. FASTA records have no quality scores, so that call returns None, and iterating over None raises the TypeError. My workaround, probably not the most elegant but it works, was to rewrite the get_primers function (line 125 in seq_utils) from:

def get_primers(primers):
    "Load primers from fasta file"
    all_primers = {}
    for primer in readfq(primers):
        all_primers[primer.Name] = primer.Seq
        all_primers['-' + primer.Name] = reverse_complement(primer.Seq)
    return all_primers

to:

def get_primers(primers):
    "Load primers from fasta file"
    all_primers = {}
    # FastxFile is pysam's FASTA/FASTQ reader (from pysam import FastxFile)
    with FastxFile(primers) as fh:
        for primer in fh:
            all_primers[primer.name] = primer.sequence
            all_primers['-' + primer.name] = reverse_complement(primer.sequence)
        print(all_primers)
        return all_primers

I'm also printing the primer sequences, which I found to be a nice confirmation of the primers being used.
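
For anyone who wants to see the failure mode in isolation, here is a minimal pure-Python sketch of the two ideas above: guarding the quality computation against records without qualities, and loading primers from FASTA together with their reverse complements. It is an illustration only and not pychopper's actual code. The names error_probs and load_primers are hypothetical, and the small FASTA parser stands in for pysam's FastxFile so the example runs without extra dependencies.

```python
# Illustrative sketch; error_probs and load_primers are hypothetical names,
# not part of pychopper. The parser below stands in for pysam's FastxFile.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def error_probs(quals):
    """Convert Phred scores to error probabilities.

    FASTA records carry no qualities, so None is treated as "no scores"
    instead of being iterated (which raises the TypeError in the traceback).
    """
    if quals is None:
        return []
    return [10 ** (q / -10) for q in quals]


def load_primers(lines):
    """Parse FASTA lines into {name: seq, '-' + name: reverse complement}."""
    forward = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            forward[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so append to the current record.
            forward[name] += line
    primers = dict(forward)
    for primer_name, seq in forward.items():
        primers["-" + primer_name] = reverse_complement(seq)
    return primers


if __name__ == "__main__":
    example = [">SSP", "ACGT", "TT", ">VNP example primer", "AAGG"]
    for primer_name, seq in load_primers(example).items():
        print(primer_name, seq)
```

The sequences in the example are made up for demonstration and are not the real SSP or VNP primers.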

nrhorner commented 1 year ago

Hi @mertz1611 and @brunaeus

I'm sorry that you are having problems with Pychopper. Thanks for raising this issue. I'll take a look at this ASAP.

nrhorner commented 1 year ago

Hi @mertz1611 @brunaeus

Please could you try v2.7.8. This release should have fixed your issues.

nrhorner commented 1 year ago

Closing due to lack of response