Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

Error in .Call2("new_XStringSet_from_CHARACTER", class(x0), elementType(x0), : key 51 (char '3') not in lookup table #35

Closed heleendelbeke closed 4 years ago

heleendelbeke commented 4 years ago

Hi all,

Even in these horrific times of COVID-19 pandemic we try to focus on our research.

The DADA2 pipeline tutorial was very useful, but now I am really struggling with the phyloseq part. I just removed all Eukaryota and NA from my data set:

ps_EPSO1_noEuks_noUnk <- subset_taxa(ps_EPSO1, Kingdom!="Eukaryota" & Phylum!=" ")

print(ps_EPSO1_noEuks_noUnk)
phyloseq-class experiment-level object
otu_table()   OTU Table:      [ 3455 taxa and 40 samples ]
sample_data() Sample Data:    [ 40 samples by 13 sample variables ]
tax_table()   Taxonomy Table: [ 3455 taxa by 6 taxonomic ranks ]
refseq()      DNAStringSet:   [ 3455 reference sequences ]

But on my next step I get an error:

dna <- Biostrings::DNAStringSet(taxa_names(ps_EPSO1_noEuks_noUnk))
Error in .Call2("new_XStringSet_from_CHARACTER", class(x0), elementType(x0), :
  key 51 (char '3') not in lookup table

If I look at my ps_EPSO1_noEuks_noUnk, my phy_tree is NULL. I have no idea why.

Some help would be much appreciated!

BW H.

hpages commented 4 years ago

Hi,

Biostrings::DNAStringSet() expects the input sequences to contain letters that belong to the DNA alphabet (DNA_ALPHABET) or it will fail:

> Biostrings::DNAStringSet(c("TTCGAT", "AGAxG"))
Error in .Call2("new_XStringSet_from_CHARACTER", class(x0), elementType(x0),  : 
  key 120 (char 'x') not in lookup table

So the error message you're getting in your example is basically telling you that taxa_names(ps_EPSO1_noEuks_noUnk) cannot be turned into a DNAStringSet object because it contains the character 3. In other words, Biostrings::DNAStringSet() is behaving as expected, so I don't see an issue with the Biostrings package itself here. The real question is: why do you end up with the character 3 in sequences that are expected to contain only DNA letters?
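To track down where such a character comes from, it can help to list which elements of the input contain characters outside the DNA alphabet before calling DNAStringSet(). The following is only a sketch: `seqs` and `find_bad_chars()` are hypothetical names, and the alphabet is hard-coded as an assumption so the snippet runs in base R without Biostrings (compare it with Biostrings::DNA_ALPHABET in your own session).

```r
# Hypothetical input: a character vector that is supposed to hold DNA sequences.
seqs <- c("TTCGAT", "AGAxG", "ACGT3", "acgtn")

# Letters assumed to be accepted by DNAStringSet(); hard-coded here so the
# snippet needs no packages (compare with Biostrings::DNA_ALPHABET).
dna_letters <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                 "V", "H", "D", "B", "N", "-", "+", ".")

# For each string, return the distinct characters that are not DNA letters.
# The comparison ignores case; the original characters are reported.
find_bad_chars <- function(x, alphabet = dna_letters) {
  lapply(strsplit(x, "", fixed = TRUE), function(ch) {
    unique(ch[!toupper(ch) %in% alphabet])
  })
}

bad <- find_bad_chars(seqs)
offending <- which(lengths(bad) > 0)  # indices of elements that would fail
print(offending)
print(bad[offending])
```

If the offending elements turn out to be labels rather than sequences, the vector passed to DNAStringSet() is probably not the one that holds the sequences.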

dada2 and phyloseq are both Bioconductor packages. If you need help with these packages, please ask on the Bioconductor support site: https://support.bioconductor.org/. Make sure to read our Posting Guide before you post your question. In particular, make sure to tag your question appropriately and to provide a minimal reproducible example. Thanks!

The other H.

hpages commented 4 years ago

Hi,

Did you make progress on this? Were you able to ask your question on the Bioconductor support site? Did my earlier comment (above) help?

Should we consider this issue closed?

Thanks!

heleendelbeke commented 4 years ago

Hi, Thank you for guiding me. I was able to solve the problem. This issue can be closed. BW Heleen

hpages commented 4 years ago

Excellent!

Cheers, H.

elcega commented 3 years ago

@heleendelbeke Can you please post how you solved this? I came across the same error, but in my case it was key 48 (char '0') not in lookup table

mtmorgan commented 3 years ago

Here's a simple reproduction of the error

> Biostrings::DNAStringSet("ACT0")
Error in .Call2("new_XString_from_CHARACTER", class(x0), string, start,  :
  key 48 (char '0') not in lookup table

Note that the DNA string contains the (number) 0, but 0 is not in the DNA alphabet. So whatever your workflow, somehow you are trying to represent as a DNA sequence some sequence that is not DNA.
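Judging from the messages quoted in this thread, the "key" in the error appears to be the character code of the rejected character (51 for '3', 120 for 'x', 48 for '0', 112 for 'p'). A small base R sketch for converting between the two, in case the quoted character is hard to read (a space or another invisible character, for example):

```r
# Keys quoted in the error messages in this thread.
keys <- c(51L, 120L, 48L, 112L)

# Convert each key (character code) back to the character it stands for.
chars <- vapply(keys, intToUtf8, character(1))
print(setNames(chars, keys))

# Going the other way: the code of a given character.
code_of_three <- utf8ToInt("3")
print(code_of_three)
```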

hpages commented 3 years ago

@heleendelbeke If you're still having problems with this, we could try to help more, but it's almost impossible to do so without knowing a bit more about the context of when and how this happens. Ideally you would show us a short, self-contained code snippet that we can just copy and paste into our R session to reproduce the problem. Thanks!

christina-hayes commented 2 years ago

Hi everyone,

I have a similar error: Error in .Call2("new_XStringSet_from_CHARACTER", class(x0), elementType(x0), : key 112 (char 'p') not in lookup table

But I have checked all of my sequences and there is no "p" character that I can find. I am fairly new to this, so any suggestions would be greatly appreciated!

Thanks, C

hpages commented 2 years ago

As I said earlier to @heleendelbeke , it's almost impossible to help you without knowing a little bit more about the context of when/how this happens. If you want us to help, you need to show us some short code snippet that is self-contained so we can just copy-and-paste it in our R session to reproduce the problem. Thanks!

ankeetkumar commented 1 year ago

dna<-DNAString(x="ATGCTAGATC.ATGT")
translate(dna)
Error in .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], :
  not a base at pos 11

I understand the error, but how can I overcome this issue? I want to translate this string, maybe skipping this "dot", and translate the DNA in-frame.

Thanks.

hpages commented 1 year ago

Hi @ankeetkumar, note that you're posting a question about Biostrings::translate() in an issue that has nothing to do with Biostrings::translate() and that was closed about 1 year ago. Not the best way to get help.

Furthermore, if you need help with basic usage of Bioconductor packages, it's better to use our support site at https://support.bioconductor.org. GitHub issues are mostly for bug reports. Before you ask on our support site, make sure to read the posting guide. Also, please provide some context, e.g. where the . is coming from and what it is supposed to represent. If all you want is a wildcard at position 11, then use an N instead of a dot. You will also need to call translate() with if.fuzzy.codon="solve". Make sure to read the man page (?translate) where all this is discussed. Thanks!
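A minimal sketch of that suggestion, assuming the dot really is meant as an unknown base. The variable names are illustrative, and the Biostrings call is guarded so that the string handling still runs where the package is not installed:

```r
raw <- "ATGCTAGATC.ATGT"  # sequence from the question above

# Replacing the dot with N keeps the length, and so the reading frame, intact.
with_n <- chartr(".", "N", raw)

# Deleting the dot instead would shift the frame for everything after it.
without_dot <- gsub(".", "", raw, fixed = TRUE)

if (requireNamespace("Biostrings", quietly = TRUE)) {
  dna <- Biostrings::DNAString(with_n)
  # With if.fuzzy.codon = "solve", fuzzy codons that resolve to a single
  # amino acid are translated and the remaining ones become X.
  print(Biostrings::translate(dna, if.fuzzy.codon = "solve"))
}
```

Whether substituting N or deleting the character is appropriate depends on what the dot represents in the data.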

ankeetkumar commented 1 year ago

Understood. Thank you @hpages.