ParkinsonLab / Metatranscriptome-Workshop

Metatranscriptomics Tutorial

TypeError: unhashable type: 'SeqRecord' : rRNA_seqs.add(sequence) #7

Closed. ulises1229 closed this issue 3 years ago

ulises1229 commented 6 years ago

Hi there,

I am trying to run the complete pipeline on my own data, but I get an error at Step 6 (Rereplication). The error is TypeError: unhashable type: 'SeqRecord', raised in 2_Infernal_Filter.py, line 32. I saw a similar issue with the same error, https://github.com/ParkinsonLab/Metatranscriptome-Workshop/issues/2, but I could not find a solution there.

I am running Python 2.7.14 with Biopython 1.71 installed.

I printed the first SeqRecord that was saved into the list sequences and it seems correct to me:

    ID: NB501110:47:HJLFHAFXX:1:11101:23551:2507
    Name: NB501110:47:HJLFHAFXX:1:11101:23551:2507
    Description: NB501110:47:HJLFHAFXX:1:11101:23551:2507
    Number of features: 0
    Per letter annotation for: phred_quality
    Seq('GTGCTACAATGGACAGAACAAAGGGCAGCGAAACCGCGAGGTTAAGCCAATCCC...GCG', SingleLetterAlphabet())

Nonetheless, when the script tries to insert a SeqRecord into the mRNA set, I get the error mentioned above.
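For readers hitting the same message, the behaviour can be reproduced without Biopython. The sketch below uses a hypothetical stand-in class, FakeRecord, that is deliberately made unhashable. It is an assumption for illustration only and is not Biopython's SeqRecord. It shows that adding such an object to a set fails with a TypeError, while adding its ID string works.

```python
class FakeRecord:
    """Hypothetical stand-in for a sequence record; NOT Biopython's SeqRecord."""

    # Setting __hash__ to None makes instances unhashable, which is the
    # condition that produces "TypeError: unhashable type" in set.add().
    __hash__ = None

    def __init__(self, record_id, seq):
        self.id = record_id
        self.seq = seq


def try_add(container, item):
    """Return True if item could be added to the set, False on TypeError."""
    try:
        container.add(item)
        return True
    except TypeError:
        return False


record = FakeRecord("read_1", "GTGCTACAATGG")

added_object = try_add(set(), record)   # the record itself is unhashable
added_id = try_add(set(), record.id)    # its ID is a plain string, so hashable
```

A set needs hashable members, so the usual options are to store the records in a list or to store only their IDs in the set.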

The FASTQ and .infernalout files are available at the following link: https://www.dropbox.com/sh/g2jz771e0pjpbaz/AADOjhRfMzSKU_WvnO2_cY8la?dl=0

I would appreciate any help.

Best,

billytaj commented 6 years ago

Hi, like the other ticket, can you verify that the other steps worked? FYI: we've got a new pipeline in the works. This one's rather outdated, and only really meant as a self-contained tutorial, using our pre-computed files to showcase the typical workflow of metatranscriptome analysis.

ulises1229 commented 6 years ago

Hi Billy,

Thanks for your quick response. Yes, I verified that the previous steps worked; at least, I got no errors while running them.

Do you know where I can find the new pipeline that you mentioned?

Best,

ulises1229 commented 5 years ago

I solved the issue by using a list instead of a set in the python script.

Best,

pwang16 commented 5 years ago

Hi Billy,

Good to know you have got a new pipeline in the works. Could you please let me know when and where we can find it? Have you published it? Thanks very much! Best wishes, Peng

mghanbari commented 5 years ago

@ulises1229 I faced the same problem. Can you elaborate on how you solved the issue? Regards, Mahdi

billytaj commented 4 years ago

Sorry for the extremely late reply. I've patched it, and I'll push the fix soon.

ardagulay commented 3 years ago

I changed the sets to lists and add to append. It worked.

    #!/usr/bin/env python

    import sys
    import os
    import os.path
    import shutil
    import subprocess
    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord

    sequence_file = sys.argv[1]
    sequences = list(SeqIO.parse(sequence_file, "fastq"))
    Infernal_out = sys.argv[2]
    mRNA_file = sys.argv[3]
    rRNA_file = sys.argv[4]

    Infernal_rRNA_IDs = set()

    mRNA_seqs = list()
    rRNA_seqs = list()

    with open(Infernal_out, "r") as infile_read:
        for line in infile_read:
            if not line.startswith("#") and len(line) > 10:
                Infernal_rRNA_IDs.add(line[:line.find(" ")].strip())

    mRNA_seqs = list()

    for sequence in sequences:
        if sequence.id in Infernal_rRNA_IDs:
            rRNA_seqs.append(sequence)
        else:
            mRNA_seqs.append(sequence)

    with open(mRNA_file, "w") as out:
        SeqIO.write(list(mRNA_seqs), out, "fastq")

    with open(rRNA_file, "w") as out:
        SeqIO.write(list(rRNA_seqs), out, "fastq")

    print (str(len(rRNA_seqs)) + " reads were aligned to the rRNA database")
    print (str(len(mRNA_seqs)) + " reads were not aligned to the rRNA database")
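The ID-collection step in the script above can also be written as a small standalone function. This is only a sketch: the name read_hit_ids and the example lines are made up for illustration, and it assumes, as the script does, that the read ID is the first field of each non-comment line. It uses split() rather than line[:line.find(" ")], because find() returns -1 when a line contains no space and the slice would then drop the last character.

```python
def read_hit_ids(lines):
    """Collect the first whitespace-delimited field of every non-comment line."""
    ids = set()
    for line in lines:
        # Skip comment lines and very short lines, as the script above does.
        if line.startswith("#") or len(line) <= 10:
            continue
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


# Made-up example lines, for illustration only.
example = [
    "# target name   accession ...\n",
    "read_0001   -   SSU_rRNA_bacteria ...\n",
    "read_0002   -   LSU_rRNA_archaea ...\n",
    "read_0001   -   SSU_rRNA_archaea ...\n",
]
hit_ids = read_hit_ids(example)
```

Because the IDs are plain strings, keeping them in a set is fine and makes the later membership test fast; only the records themselves need to go in lists.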

billytaj commented 3 years ago

Note: this solution only works for the particular dataset used in the tutorial. If your own data, unrelated to the tutorial, contains duplicate read IDs, this will mess things up.
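One way to check whether this caveat applies to your own data is to count the read IDs before filtering. The sketch below is an assumption about how such a check might look, not part of the tutorial scripts; the function name duplicated_ids and the example IDs are hypothetical. In a real run the IDs could be taken from the id attribute of each record returned by SeqIO.parse, as in the script earlier in this thread.

```python
from collections import Counter


def duplicated_ids(read_ids):
    """Return the IDs that occur more than once, in sorted order."""
    counts = Counter(read_ids)
    return sorted(read_id for read_id, n in counts.items() if n > 1)


# Made-up IDs, for illustration only.
ids = ["read_1", "read_2", "read_1", "read_3", "read_2", "read_1"]
dupes = duplicated_ids(ids)
```

An empty result means every read ID is unique, so the duplicate-ID concern does not arise for that file.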

Felix-Matheri commented 3 years ago

@billytaj any news on the new pipeline please?