Arcadia-Science / sourmashconsumr

Working with the outputs of sourmash in R
https://arcadia-science.github.io/sourmashconsumr/

`n_unique_kmers` doesn't exist #71

Open bluegenes opened 1 year ago

bluegenes commented 1 year ago

Had an error shared with me (🎉):

Error in `dplyr::select()`:
! Can't subset columns that don't exist
x Column `n_unique_kmers` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.

I see that the `n_unique_kmers` column is added during `read_taxonomy_annotate`, so the error is likely caused by using `read_csv` rather than `read_taxonomy_annotate` to read the file.

Would it be worth changing this internal column to `n_unique_weighted_found` to avoid this error for sourmash v4.5+, since we have the column now? We figured this name more clearly described the column info, but I'm not sure we discussed it outside of the sourmash PR that added it.

Or, if you want to force folks to use `read_taxonomy_annotate` (I see you do a couple of other things in there), is there a way to catch the error and suggest the solution?
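One way the "catch the error and suggest the solution" idea could look is an up-front column check that fails with a pointer to `read_taxonomy_annotate`. This is only a minimal base R sketch: `check_abundance_column` is a hypothetical helper, not part of sourmashconsumr, and the message wording is an assumption.

```r
# Hypothetical helper (not part of sourmashconsumr): fail early with a hint
# when neither abundance column is present in the input data frame.
check_abundance_column <- function(df) {
  known <- c("n_unique_kmers", "n_unique_weighted_found")
  if (!any(known %in% colnames(df))) {
    stop(
      "Neither `n_unique_kmers` nor `n_unique_weighted_found` was found. ",
      "Was the file read with read_csv()? ",
      "Try sourmashconsumr::read_taxonomy_annotate() instead.",
      call. = FALSE
    )
  }
  invisible(df)
}

# A data frame read with a plain CSV reader lacks the derived column,
# so the check raises the informative error instead of a dplyr::select() failure.
raw <- data.frame(name = "genomeA", unique_intersect_bp = 5000)
msg <- tryCatch(
  {
    check_abundance_column(raw)
    "ok"
  },
  error = function(e) conditionMessage(e)
)
```

Running a check like this at the top of each downstream function would surface the hint before `dplyr::select()` is ever reached.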

thanks for the awesome software!

taylorreiter commented 1 year ago

Oh nice, thank you for reporting! I didn't know `n_unique_weighted_found` was added in v4.5+! Let me noodle on this for a couple of days and then I'll implement a fix. I will def switch to that naming scheme and only calculate that column if it isn't already in the output file. I still need to think about whether there are other things I can do to "catch" this. Thank you again!
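The "only calculate that column if it isn't already in the output file" approach might be sketched as follows in base R. `add_n_unique_weighted_found` is a hypothetical helper name, and the fallback formula, `(unique_intersect_bp / scaled) * average_abund`, is an assumption about how the value would be derived from older gather output; it should be checked against the package source and the sourmash code before being relied on.

```r
# Hypothetical helper: keep the column when sourmash already wrote it,
# otherwise derive it from columns present in older output files.
add_n_unique_weighted_found <- function(df) {
  if ("n_unique_weighted_found" %in% colnames(df)) {
    return(df)  # column already present; leave it untouched
  }
  # Assumed fallback formula (not confirmed by the thread).
  df$n_unique_weighted_found <-
    (df$unique_intersect_bp / df$scaled) * df$average_abund
  df
}

# Toy data standing in for output without the column ...
old_output <- data.frame(
  unique_intersect_bp = c(5000, 2000),
  scaled = 1000,
  average_abund = c(2, 1.5)
)
# ... and for output that already carries it (values are made up).
new_output <- cbind(old_output, n_unique_weighted_found = c(11, 4))

from_old <- add_n_unique_weighted_found(old_output)
from_new <- add_n_unique_weighted_found(new_output)
```

With this shape, downstream code can select `n_unique_weighted_found` regardless of which sourmash version produced the file.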

bluegenes commented 1 year ago

Here's where we calculate `n_unique_weighted_found`, in case it's helpful:

https://github.com/sourmash-bio/sourmash/blob/latest/src/sourmash/search.py#L496-L510