arendsee / phylostratr

An R framework for phylostratigraphy
GNU General Public License v3.0

Error: The focal species is not present in UniProt #29

Open CWYuan08 opened 1 year ago

CWYuan08 commented 1 year ago

Hi, I am trying to repeat an analysis that I managed to run in the past, but now, when I run it on our HPC, it no longer works. I get the error below no matter which focal_id I use.

Following the example on the main page, when I try:

focal_taxid <- "3702"
strata <- uniprot_strata(focal_taxid, from=2) %>%
  use_recommended_prokaryotes %>%
  add_taxa(c('4932', '9606')) %>%
  uniprot_fill_strata

I get:

The focal species is not present in UniProt. You may add it after retrieving uniprot sequences (i.e. with 'uniprot_fill_strata') with a command such as: strata_obj@data$faa[[focal_taxid]] <- '/path/to/your/focal-species.faa'
Error in integer(max(oldnodes)) : vector size cannot be infinite
In addition: Warning message:
In max(oldnodes) : no non-missing arguments to max; returning -Inf

I have installed phylostratr from GitHub, and blastp is installed as well.

Thank you very much!

Best, CW

wolfylair commented 1 year ago

Same issue here. I suspect UniProt is changing its API again. Is there any way to use a local database so that we can get around the issue?

arendsee commented 1 year ago

Oh bother. @wolfylair You may be right; they broke my API a few years ago. I guess you could hack a solution.

Here is the offending code:

#' Download sequence data for each species in a UniProt-based strata
#'
#' @param strata Strata object where all species are represented in UniProt.
#' @param ... Additional arguments for \code{uniprot_retrieve_proteome}
#' @return Strata object where 'data' slot is filled with protein FASTA files
#' @export
uniprot_fill_strata <- function(strata, ...){
  species <- strata@tree$tip.label
  strata@data$faa <- lapply(strata@tree$tip.label, uniprot_retrieve_proteome, ...)
  names(strata@data$faa) <- strata@tree$tip.label
  strata
}

I guess you could bypass UniProt and load your own sequences into R here. You'd have to replace uniprot_fill_strata with something that looks up the tip.labels (NCBI taxonomy IDs) in your local database.
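A minimal sketch of that workaround, assuming the proteomes have already been downloaded into one directory and named `<taxid>.faa`. The names `local_faa_paths` and `local_fill_strata`, the `dir` argument, and the file naming scheme are hypothetical and not part of phylostratr; only `strata@tree$tip.label` and `strata@data$faa` come from the code above.

```r
# Hypothetical helper (not part of phylostratr): map NCBI taxonomy IDs to
# local protein FASTA files, assuming each file is named "<taxid>.faa".
local_faa_paths <- function(taxids, dir) {
  paths <- file.path(dir, paste0(taxids, ".faa"))
  missing <- taxids[!file.exists(paths)]
  if (length(missing) > 0) {
    # Fail early and name the taxa that have no local proteome
    stop("No local proteome for taxon ID(s): ", paste(missing, collapse = ", "))
  }
  # Named list of file paths, mirroring the shape that uniprot_fill_strata
  # stores in strata@data$faa
  stats::setNames(as.list(paths), taxids)
}

# Hypothetical stand-in for uniprot_fill_strata that reads from a local
# directory instead of querying UniProt. It expects a phylostratr Strata object.
local_fill_strata <- function(strata, dir) {
  strata@data$faa <- local_faa_paths(strata@tree$tip.label, dir)
  strata
}
```

If this fits your setup, `local_fill_strata(strata, "/path/to/proteomes")` would take the place of `uniprot_fill_strata` at the end of the pipeline. This does not help if the tree-building step itself fails, since the lookup only replaces the sequence download.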

wolfylair commented 1 year ago

Thanks! Is there any chance of restoring normal use of phylostratr in this situation? It would be great if the package had another way to connect to UniProt.

I am currently re-running the previous focal species with different gene annotations, so I just replaced the focal species sequence directly in the strata object without modifying the trees. Will that affect the accuracy of the output?

Thanks again for your hard work and immediate response!

Best