Proteomicslab57357 / UniprotR

Retrieving Information of Proteins from Uniprot
GNU General Public License v3.0

Error when fetching IDs #39

Closed: ramirobarrantes closed this issue 2 weeks ago

ramirobarrantes commented 4 months ago

I am using UniprotR to convert some gene names to UniProt IDs as follows:

geneIds <- c("PRAMEF10", "DENND4B", "OR6Y1", "TEKT4", "IQSEC1", "OR5H15", "GNB4", "ISOC1", "SLC25A48", "LRFN2", "HERPUD2", "HECW1", "STEAP1", "MUC17", "VPS13B", "SVEP1", "AHNAK", "SLC6A12", "FGF23", "KRAS", "KMT2D", "HSPH1", "TMEM179", "AHNAK2", "PACS2", "HERC2", "RBL2", "TP53", "FOXO3B", "KCNJ12", "DNAH17", "APCDD1", "MUC16", "CYP2A13", "ELSPBP1", "ZNF470", "SEC14L3", "CFAP47", "PERM1", "ADAMTS8", "SMAD4")

uniprotTable <- ConvertID(geneIds,ID_from = "Gene_Name",ID_to = "UniProtKB-Swiss-Prot", taxId=9606)

But I get an error:

Error in read.table(text = content(r, encoding = "UTF-8"), sep = "\t", : incomplete final line found by readTableHeader on 'text'

I traced the error to the 27th gene name in the list (RBL2):

uniprotTable <- ConvertID(geneIds[27], ID_from = "Gene_Name", ID_to = "UniProtKB-Swiss-Prot", taxId = 9606)
Error in read.table(text = content(r, encoding = "UTF-8"), sep = "\t", : incomplete final line found by readTableHeader on 'text'

So what can I do? If I pass a large batch of IDs, a single bad ID makes the whole call fail. Is the only solution to convert them one by one? That would take a very long time in some cases!
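Converting one ID per request is not the only possible workaround. A minimal sketch, assuming a failing batch raises an ordinary R error as in the traceback above: send the whole batch first and, only when a batch fails, split it in half and retry each half. Most IDs still go out in large requests, and only the branch containing the bad ID gets narrowed down to single IDs. `convert_bisect()` and `fake_convert()` are hypothetical helpers, not part of UniprotR; the converter is passed in as an argument so the logic can be tried offline.

```r
# Hypothetical helper (not part of UniprotR): try the whole batch in one call;
# if that call errors, split the batch in half and retry each half recursively.
# `convert_fn` takes a character vector of IDs and returns a data frame, or
# raises an error (as ConvertID() did when RBL2 was in the batch).
convert_bisect <- function(ids, convert_fn) {
  if (length(ids) == 0) return(NULL)
  result <- tryCatch(convert_fn(ids), error = function(e) NULL)
  if (!is.null(result)) return(result)        # whole batch succeeded
  if (length(ids) == 1) {                      # narrowed down to one bad ID
    message("Skipping ID that failed to convert: ", ids)
    return(NULL)
  }
  half <- ceiling(length(ids) / 2)
  parts <- list(
    convert_bisect(ids[seq_len(half)], convert_fn),   # first half
    convert_bisect(ids[-seq_len(half)], convert_fn)   # second half
  )
  do.call(rbind, Filter(Negate(is.null), parts))       # NULL if both halves failed
}

# Offline demo with a stand-in converter that fails whenever "RBL2" is present
fake_convert <- function(ids) {
  if ("RBL2" %in% ids) stop("incomplete final line found by readTableHeader")
  data.frame(From = ids, stringsAsFactors = FALSE)
}
res <- convert_bisect(c("KRAS", "RBL2", "TP53", "SMAD4"), fake_convert)
print(res$From)
```

With the real package, the converter would presumably be something like `function(ids) ConvertID(ids, ID_from = "Gene_Name", ID_to = "UniProtKB-Swiss-Prot", taxId = 9606)`, reusing the arguments from the call above. With a single bad ID among n, this needs on the order of 2·log2(n) requests instead of n.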

MohmedSoudy commented 4 months ago

Hi @ramirobarrantes, thank you for using our package. After investigating, it seems that RBL2 is indeed the source of the problem, as you mentioned. In the current version, the only way to avoid this is to convert each ID separately, in parallel, so that one failing ID does not abort the rest, using the following script:

library(UniprotR)
library(parallel)

geneIds <- c("PRAMEF10", "DENND4B", "OR6Y1", "TEKT4", "IQSEC1", 
             "OR5H15", "GNB4", "ISOC1", "SLC25A48", "LRFN2", 
             "HERPUD2", "HECW1", "STEAP1", "MUC17", "VPS13B", 
             "SVEP1", "AHNAK", "SLC6A12", "FGF23", "KRAS", "KMT2D", 
             "HSPH1", "TMEM179", "AHNAK2", "PACS2", "HERC2", "RBL2",
             "TP53", "FOXO3B", "KCNJ12", "DNAH17", "APCDD1", "MUC16", 
             "CYP2A13", "ELSPBP1", "ZNF470", "SEC14L3", "CFAP47", "PERM1", "ADAMTS8", "SMAD4")

# Convert a single gene name; return NULL instead of stopping if the lookup fails
convert_with_error_handling <- function(id) {
  result <- try(ConvertID(id, ID_from = "Gene_Name", ID_to = "UniProtKB-Swiss-Prot", taxId = 9606),
                silent = TRUE)
  if (inherits(result, "try-error")) {
    return(NULL)  # Return NULL if there is an error (e.g. for RBL2)
  }
  return(result)
}

# One request per gene name. With mc.cores = 1 the calls run sequentially;
# mclapply() only forks extra workers when mc.cores > 1 (not supported on Windows)
results_list <- mclapply(geneIds, convert_with_error_handling, mc.cores = 1)

# Combine the results into one data frame, filtering out NULL values
uniprotTable <- do.call(rbind, Filter(Negate(is.null), results_list))

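One drawback of the script above is that IDs which fail are silently dropped. A small variant, sketched here under the same assumption that a failed lookup raises an error, keeps the error objects instead of `NULL` and reports which IDs could not be converted. `convert_each_with_log()` and `fake_convert()` are hypothetical names used only for illustration.

```r
# Hypothetical variant (not part of UniprotR): one call per ID, keeping the
# error object for each failure so the failed IDs can be reported afterwards.
convert_each_with_log <- function(ids, convert_fn) {
  results <- lapply(ids, function(id) {
    tryCatch(convert_fn(id), error = function(e) e)  # return the error instead of stopping
  })
  failed <- vapply(results, inherits, logical(1), what = "error")
  list(
    table  = do.call(rbind, results[!failed]),    # successful lookups only
    failed = ids[failed],                         # IDs that could not be converted
    errors = vapply(results[failed], conditionMessage, character(1))
  )
}

# Offline demo with a stand-in converter that fails for "RBL2"
fake_convert <- function(ids) {
  if ("RBL2" %in% ids) stop("incomplete final line found by readTableHeader")
  data.frame(From = ids, stringsAsFactors = FALSE)
}
out <- convert_each_with_log(c("KRAS", "RBL2", "TP53"), fake_convert)
print(out$failed)
```

Because the errors come back as values rather than through side effects, `lapply` could be swapped for `mclapply` without losing the list of failed IDs.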
ramirobarrantes commented 4 months ago

Thank you very much. The only issue I see with this is that, as it runs in parallel, one loses considerable speed. But thank you very much!!
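Note that `mc.cores = 1` in the maintainer's script makes `mclapply()` run the calls one after another, so nothing there actually runs in parallel. A minimal sketch of a genuinely parallel version, assuming a Unix-like system (`mclapply()` relies on forking, which is unavailable on Windows, where only `mc.cores = 1` is accepted). `convert_parallel()` and `fake_convert()` are hypothetical helpers, and the default of 4 workers is an arbitrary choice; keeping the worker count small is probably wise, since each worker sends its own requests to UniProt.

```r
library(parallel)

# Hypothetical helper (not part of UniprotR): one conversion per ID, spread
# over several forked worker processes with mclapply().
convert_parallel <- function(ids, convert_fn, cores = 4L) {
  available <- detectCores()
  if (is.na(available)) available <- 1L            # detectCores() can return NA
  if (.Platform$OS.type == "windows") cores <- 1L   # no forking on Windows
  cores <- max(1L, min(as.integer(cores), available, length(ids)))
  results <- mclapply(ids, function(id) {
    tryCatch(convert_fn(id), error = function(e) NULL)  # drop IDs that fail
  }, mc.cores = cores)
  do.call(rbind, Filter(Negate(is.null), results))
}

# Offline demo with a stand-in converter that fails for "RBL2"
fake_convert <- function(ids) {
  if ("RBL2" %in% ids) stop("incomplete final line found by readTableHeader")
  data.frame(From = ids, stringsAsFactors = FALSE)
}
res <- convert_parallel(c("KRAS", "RBL2", "TP53"), fake_convert, cores = 2L)
print(res$From)
```

`mclapply()` returns results in the same order as the input, so the combined table keeps the original ID order minus the failures.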

From: Mohmed Soudy. Date: Thursday, July 4, 2024 at 6:52 AM. Subject: Re: [Proteomicslab57357/UniprotR] Errror when fetching ids (Issue #39)
