chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com
603 stars 243 forks source link

issue with reduce #123

Closed zlye closed 5 years ago

zlye commented 5 years ago

Hi - I am having getting an error with the "reduce function" (line 46):

gene_seq = reduce(operator.add, seq_exons)

TypeError: reduce() of empty sequence with no initial value

Do you know what the problem might be? i've checked that my file formats match the examples provided... I am guessing this is somewhat trivial, I am not all that familiar with bio-python and the GFF parser yet. Thanks for your help

chapmanb commented 5 years ago

Thanks for the report and sorry about the issues. From the error message, it seems like you might have some GFF files without exons which those small scripts weren't checking for. I pushed a fix for provide an empty sequence here instead of failing, which I hope will resolve the issue for your files.

zlye commented 5 years ago

Hi Brad, Thanks for your speedy response. I am still getting the same errors with the change: To be clear we are talking about this script: https://github.com/chapmanb/bcbb/blob/master/biopython/glimmergff_to_proteins.py right? here is the error: or record in sequences:

File "biopython_gff2fasta_update.py", line 42, in protein_recs

gene_seq = reduce(operator.add, seq_exons, "")

NameError: name 'reduce' is not defined

I am using GFF files generated by maker... so there are exons and CDS in the gff file? should this make a difference? when I filter to a CDS only gff file I get the same error Here is an example header of my gff:

OsatB_bas.ctg325 . contig 1 218791 . . . ID=OsatB_bas.ctg325;Name=OsatB_bas.ctg325

OsatB_bas.ctg325 maker gene 28985 37323 . + . ID=basmati38815;Name=basmati38815;Alias=maker-OsatB_bas.ctg325-snap-gene-0.44;

OsatB_bas.ctg325 maker mRNA 28985 37323 . + . ID=basmati38815-RA;Parent=basmati38815;Name=basmati38815-RA;Alias=maker-OsatB_bas.ctg325-snap-gene-0.44-mRNA-1;_AED=0.00;_QI=140|1|1|1|0.92|0.93|15|296|439;_eAED=0.00;

OsatB_bas.ctg325 maker exon 28985 29287 . + . ID=basmati38815-RA:exon:582;Parent=basmati38815-RA;

OsatB_bas.ctg325 maker exon 31512 31639 . + . ID=basmati38815-RA:exon:583;Parent=basmati38815-RA;

OsatB_bas.ctg325 maker exon 31720 31852 . + . ID=basmati38815-RA:exon:584;Parent=basmati38815-RA;

OsatB_bas.ctg325 maker exon 32146 32302 . + . ID=basmati38815-RA:exon:585;Parent=basmati38815-RA;

OsatB_bas.ctg325 maker exon 32382 32460 . + . ID=basmati38815-RA:exon:586;Parent=basmati38815-RA;

OsatB_bas.ctg325 maker exon 32687 32727 . + . ID=basmati38815-RA:exon:587;Parent=basmati38815-RA;

On Sun, Mar 10, 2019 at 5:10 AM Brad Chapman notifications@github.com wrote:

Thanks for the report and sorry about the issues. From the error message, it seems like you might have some GFF files without exons which those small scripts weren't checking for. I pushed a fix for provide an empty sequence here instead of failing, which I hope will resolve the issue for your files.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_chapmanb_bcbb_issues_123-23issuecomment-2D471270288&d=DwMFaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=163oVAfqYW8rLwmeRfPScQ&m=sPeIoyOHpgYJbNbfBxpQTkoABBFW-DUuosc_Xs3Jswo&s=za3NF76kQZioTry5EbN9mD7sayqpWntftD3-T1QVqgk&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AotMU015FG4v2EA-5FmyZ-5FcHhwM3hEo1Dyks5vVMwSgaJpZM4bm4Ns&d=DwMFaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=163oVAfqYW8rLwmeRfPScQ&m=sPeIoyOHpgYJbNbfBxpQTkoABBFW-DUuosc_Xs3Jswo&s=e21AIS4NvoUbkIpWfVtjI6biaDWpZD0LiV24ScUg66c&e= .

chapmanb commented 5 years ago

Thank you for the additional report and sorry about the continued problems. It looks like your re-run might have switched to use Python 3 rather than Python 2, where reduce needs an extra import. I've adjusted the scripts so reduce should hopefully work on both versions now. Please let us know if you run into any other issues.

zlye commented 5 years ago

Hi Brad,

File "biopython_gff2fasta_update.py", line 59, in

main(*sys.argv[1:])

File "biopython_gff2fasta_update.py", line 30, in main

SeqIO.write(protein_recs(glimmer_file, ref_recs), out_handle, "fasta")

File "/home/xxx/pyenv/py3.6.3/lib/python3.6/site-packages/Bio/SeqIO/init.py", line 528, in write

for record in sequences:

File "biopython_gff2fasta_update.py", line 46, in protein_recs

protein_seq = gene_seq.translate()

TypeError: translate() takes exactly one argument (0 given)

And this is the protein_recs function in the python script - i of course added "from functools import reduce

32 def protein_recs(glimmer_file, ref_recs):

33 """Generate protein records from GlimmerHMM gene predictions.

34 """

35 with open(glimmer_file) as in_handle:

36 for rec in glimmer_predictions(in_handle, ref_recs):

37 for feature in rec.features:

38 seq_exons = []

39 for cds in feature.sub_features:

40 seq_exons.append(rec.seq[

41 cds.location.nofuzzy_start:

42 cds.location.nofuzzy_end])

43 gene_seq = reduce(operator.add, seq_exons, "")

44 if feature.strand == -1:

45 gene_seq = gene_seq.reverse_complement()

46 protein_seq = gene_seq.translate()

47 yield SeqRecord(protein_seq, feature.qualifiers["ID"][0], "", "")

Is there anything else that might be the problem? I tried as well with the original script and I am still getting errors. Thanks again, I really appreciate your help!

-Zoe

On Tue, Mar 12, 2019 at 6:03 AM Brad Chapman notifications@github.com wrote:

Thank you for the additional report and sorry about the continued problems. It looks like your re-run might have switched to use Python 3 rather than Python 2, where reduce needs an extra import. I've adjusted the scripts so reduce should hopefully work on both versions now. Please let us know if you run into any other issues.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_chapmanb_bcbb_issues_123-23issuecomment-2D471934568&d=DwMFaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=163oVAfqYW8rLwmeRfPScQ&m=EVu4vDR7-PYoNFmhb-PppP_TnT2ZVtPWGiCivRqTfzk&s=TS9ttkH_1B4Uw6mQ1OvK_J52Vo5Uqvi8xcwwYKpexY8&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AotMU0xsNslxAMOlWBwU8m-5FtasuJ6sXxks5vV3togaJpZM4bm4Ns&d=DwMFaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=163oVAfqYW8rLwmeRfPScQ&m=EVu4vDR7-PYoNFmhb-PppP_TnT2ZVtPWGiCivRqTfzk&s=gAT82uPSdJb8grQ1W2yGKeJJHpLdzTmeKniilESVCwQ&e= .

-- Zoe Lye P.h.D. candidate 12 Waverly Place New York University New York, NY 10003

chapmanb commented 5 years ago

Sorry about the continued issues. I haven't used this code in quite a while so there may be some issues we'll have to work through. I pushed a fix which I hope will resolve your problem -- it looks like you were somehow getting a string instead of biopython Bio.Seq object so I made it explicit to try and avoid that. Hope this gets things working cleanly for you.