adriantich / DnoisE

Distance denoise by Entropy
GNU General Public License v3.0

DnoiSE does not complete: "KeyError: 'count'" #8

Closed SanniH closed 3 years ago

SanniH commented 3 years ago

Hi!

I tried to run DnoisE.py on my own fasta file of dereplicated, non-chimeric ESVs, but unfortunately I got an error after about 45 min, and was hoping you might be able to help?

The error I get:

Traceback (most recent call last):
  File "DnoisE/src/DnoisE.py", line 467, in
    denoised_ratio_d[i]['sequence'].upper() + "\n")
KeyError: 'count'

I run DnoisE on a remote server through a shell script, and my fasta is formatted as requested in your paper and the repo. There was no mention of whether all bases must be uppercase, so my file has a mix of upper and lower case.

$ head UTILA_DSE.fasta

uniq1;size=62441;
TTATTCTACATACCCTGCTAGTGCTTATTTATCAACTGATTTAATAATCTTTTCATTACATTTAGCCGGTGCTAGTTCTATATTGTCTTCAATAAATTTTATTATAACAGTTTTTATGTTGCCTATAAattcttctttttctttttttcaatatcctttatttatagtagctcaaattactgtttcttttttATTATTAATATCTTTACCTGTTTTAGCCGCTGCTATTACTATGTTACTTTTTGATCGTAATTTCAACACTTCttttttttCCAATTATTTGGGTGGTGATGCTCTTCTTTATCAACATTTATTT
uniq2;size=24836;
TTTGAGTAGTGTTCAAGCTCATTCAGGTCCTTCTGTGGATTTGGCTATTTTTAGCCTTCATTTGTCCGGGGCAGCATCTATTATGGGTTCGATTAATTTCATTACAACAATTATTAATATGCGACCGGGAGGAATGGGAATGCATCGTTTGCCGCTATTTGTATGGGCAGTTTTGCTAACCGCAATTCTATTGTTGCTTTCTCTTCCTGTTTTGGCTGGGGGTATTACTATGTTGTTGACTGACCGAAATTTTAACACTACCTTTTTTGATCCCGCTGGAGGAGGAGACCCTGTTCTTTATCAACACCTATTT
...

My file was generated using PEAR for PE merging and VSEARCH for length filtering and dereplication. The resulting fasta contains ~44K unique sequences, each on a single line, with lengths ranging from 303 to 323 (COI Leray fragment). I attached the fasta I used here.

The command I used for running DnoisE was this:

$ python3 DnoisE/src/DnoisE.py -i UTILA_DSE.fasta -o Utila -c 20

Any help would be appreciated as I am very keen to see how this compares to my previously generated data using dada2!

UTILA_DSE.zip https://github.com/adriantich/DnoisE/files/6324683/UTILA_DSE.zip

adriantich commented 3 years ago

Hi!

Thanks for contacting me.

Since the abundance field can be named size, count, or reads depending on the input format, I designed DnoisE to run whatever the name is. However, I now see that this was not done for the fasta_output line, which still expects 'count'. I'll fix it now and tell you when it's done! I'll also modify it to accept uppercase, lowercase, and mixed-case sequences.
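For readers following along, here is a minimal sketch of the kind of problem described above. It is not DnoisE's actual code: the names ABUNDANCE_KEYS, parse_header, read_fasta, and format_record are hypothetical, and it assumes standard FASTA headers that begin with ">" (the marker does not appear in the excerpt in this thread). The idea is to remember which abundance key the header used and reuse it when writing the output, instead of looking up a hard-coded 'count', and to uppercase sequences on read.

```python
import re

# Hypothetical list of abundance field names; the thread mentions size/count/reads.
ABUNDANCE_KEYS = ("size", "count", "reads")


def parse_header(header):
    """Split a header such as '>uniq1;size=62441;' into (id, key, abundance)."""
    header = header.lstrip(">").strip()
    seq_id = header.split(";", 1)[0]
    for key in ABUNDANCE_KEYS:
        match = re.search(r"(?:^|[;\s])" + key + r"=(\d+)", header)
        if match:
            return seq_id, key, int(match.group(1))
    raise ValueError("no abundance field found in header: " + header)


def read_fasta(lines):
    """Build one dict per record, storing the abundance under its original key."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id, key, abundance = parse_header(line)
            records.append(
                {"id": seq_id, "abundance_key": key, key: abundance, "sequence": ""}
            )
        elif records:
            # Normalise case so mixed upper/lower input is treated uniformly.
            records[-1]["sequence"] += line.upper()
    return records


def format_record(record):
    """Write a record back out using the key that was actually read."""
    key = record["abundance_key"]  # record["count"] would raise KeyError for 'size' input
    return ">{};{}={};\n{}\n".format(
        record["id"], key, record[key], record["sequence"]
    )
```

With a header using size=, the record holds a 'size' entry and no 'count' entry, so any output line that reads record["count"] directly fails with the KeyError shown in the traceback, while format_record above does not.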

Thanks again!

Adrià


adriantich commented 3 years ago

Hi SanniH,

The problem is fixed. You can update using git pull. Please tell me if it works well!

A.

SanniH commented 3 years ago

Hi Adria,

I'll set it running now and report back if there are any more issues :) Thanks for the quick response!

Sanni