bridgedb / BridgeDbR

Bioconductor R package for BridgeDb
https://doi.org/10.18129/B9.bioc.BridgeDbR

Issues filtering for datasource (in Webservice, in R) and in Cytoscape? #33

Open DeniseSl22 opened 1 year ago

DeniseSl22 commented 1 year ago

@tabbassidaloii and I have been checking the issues with the xref batch query.

Issue reported by @egonw:

https://webservice.bridgedb.org/Human/xrefs/S/O14494?dataSource=L

this seems to ignore the ?dataSource=L parameter.

Issue reproduced by @tabbassidaloii and @DeniseSl22. We believe the parameter is not ignored; rather, the SystemCodes are present in the mapping files but are not read in correctly by the BridgeDb libraries.
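One way to check whether the parameter is honoured is to fetch the same xrefs with and without ?dataSource=L and compare the result against a client-side filter. The sketch below is only an illustration: XrefFilter and its methods are hypothetical helpers (not part of BridgeDb), it assumes the webservice answers with tab-separated "identifier, data source name" rows and that system code L corresponds to Entrez Gene, and the rows in main are placeholders rather than real output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BridgeDb or BridgeDbR.
class XrefFilter {

    // Builds a query URL with the same shape as the one reported in this issue.
    static String xrefsUrl(String base, String organism, String sourceCode,
                           String id, String targetCode) {
        return base + "/" + organism + "/xrefs/" + sourceCode + "/" + id
                + "?dataSource=" + targetCode;
    }

    // Assumes each response line is "identifier<TAB>data source name" and
    // returns the identifiers whose data source name matches wantedName.
    static List<String> keepDataSource(String body, String wantedName) {
        List<String> kept = new ArrayList<>();
        for (String line : body.split("\\R")) {
            String[] cols = line.split("\t");
            if (cols.length == 2 && cols[1].trim().equals(wantedName)) {
                kept.add(cols[0].trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(xrefsUrl("https://webservice.bridgedb.org",
                "Human", "S", "O14494", "L"));
        // Placeholder rows for illustration only, not real webservice output.
        String body = "ID_A\tEntrez Gene\nID_B\tEnsembl\nID_C\tEntrez Gene\n";
        System.out.println(keepDataSource(body, "Entrez Gene"));
    }
}
```

If the filtered list from the unparameterised call differs from what the parameterised call returns, the filtering is going wrong on the server side rather than in the mapping file.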

@tabbassidaloii also tried different mapping files in R (91, 104, 105, 107) through the BridgeDbR package (v2.8.0, rJava_1.0-6; according to GitHub it uses the BridgeDb libraries BridgeDb 3.0.19 and Derby 10.15.2). All of these versions give the same issue when the target data source parameter is specified:

map(mapper, "H", "VGF", "L")
Error in .jcall("org/bridgedb/DataSource", "Lorg/bridgedb/DataSource;",  : 
  java.lang.IllegalArgumentException: No DataSource known for the Bioregistry.io prefix L

The first known "bioregistry" addition is mentioned in the BridgeDb 3.0.14 release. @egonw reset the Webservice back to BridgeDb 3.0.13 (which seemed to solve another issue).

We don't know if this issue is related to the other issues we're seeing for the GeneProtein_107 release (HGNC symbols cannot be searched in PV; this works in 104 but not in 105). Our suggestions:

egonw commented 1 year ago

@DeniseSl22, @tabbassidaloii, for the "BridgeDb back to 104" step, please update the JSON files accordingly in the https://github.com/bridgedb/data repository

tabbassidaloii commented 1 year ago

BridgeDb back to 104

The PR has been sent.

tabbassidaloii commented 1 year ago

Regarding using an older version of BridgeDb (point 3): I checked the dependencies for creating the gene/protein Derby files and noticed that I have not updated them. They even use an older version of BridgeDb (3.0.6), and this has been the same for all releases (v103 to v107), so that would not cause the problem. What do you think, @egonw @DeniseSl22? I have opened different versions of the Hs Derby files (v103, 104, and 107) in SQuirreL and their structures seem to be similar. What else can be checked?

DeniseSl22 commented 1 year ago

@tabbassidaloii: could you maybe run the script for Hs 107 again and make sure all the Java libraries are version 3.0.13 (check in pom.xml?). I could create a local version of PV 3.3 with new BridgeDb libraries (also 3.0.13) and see if that resolves the issues. If not, we might be looking at this from the wrong perspective (and might need to check Ensembl?), or we would have to go back to an older version of BridgeDb and then work forward from there to see what might be causing the issues. In the meantime, I can create a new metabolite mapping file (which uses BridgeDb 3.0.13) and see if I get the same issue in PV regarding the lookup of names.

tabbassidaloii commented 1 year ago

@DeniseSl22, I tried to reproduce the v104 file again (as it was correct), but the new Derby file has the same issue (it cannot be searched with gene symbols in PV 3). I am checking all the steps one by one (reviewing all the minor changes) to find the issue. I will also try what you suggested. I am documenting all the checks so we can make sure we don't miss anything.

DeniseSl22 commented 1 year ago

This is getting stranger and stranger... *sighs* Could you share the new 104 version that you just created with me? Then I can double-check whether I see the same behaviour. And maybe a zipped file of the source code for the GeneProtein generation?

tabbassidaloii commented 1 year ago

Indeed. I will share them on slack.

tabbassidaloii commented 1 year ago

The issue of not being able to search the database in PV using gene names was caused by a minor change we made a while ago to fix an error. We did not foresee the problem it could cause.

While generating the database for Zm (v52), we got the error below:

Attribute external_gene_name NOT FOUND

To solve this, we changed line 157 in QueryBioMart.java from geneId.setAttribute("name", "external_gene_id"); to geneId.setAttribute("name", "ensembl_gene_id");

As a result, a search was only possible using the Ensembl gene ID.

Now I have changed it to

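// Zm (zmays_eg_gene) has no external_gene_name attribute, so fall back to the Ensembl gene ID
if (config.getSpecies().equals("zmays_eg_gene")) {
   geneId.setAttribute("name", "ensembl_gene_id");
} else {
   // all other species can be searched by gene name
   geneId.setAttribute("name", "external_gene_name");
}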

This way, the databases for species that have the gene name (external_gene_name) attribute can be searched using gene names.
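For anyone who wants to try the selection logic outside the generator, here is a minimal, self-contained sketch. It assumes the intent described above: datasets that lack external_gene_name (only zmays_eg_gene is known from this thread) fall back to ensembl_gene_id, and everything else uses external_gene_name. GeneNameAttribute, WITHOUT_GENE_NAME, and forSpecies are hypothetical names and are not taken from QueryBioMart.java.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone version of the attribute choice; not the real generator code.
class GeneNameAttribute {

    // Datasets assumed to lack external_gene_name. Only zmays_eg_gene comes
    // from this thread; further entries would have to be verified in BioMart.
    static final Set<String> WITHOUT_GENE_NAME =
            new HashSet<>(Arrays.asList("zmays_eg_gene"));

    // Returns the BioMart attribute to use as the searchable "name".
    static String forSpecies(String species) {
        return WITHOUT_GENE_NAME.contains(species)
                ? "ensembl_gene_id"
                : "external_gene_name";
    }

    public static void main(String[] args) {
        System.out.println(forSpecies("zmays_eg_gene"));
        // hsapiens_gene_ensembl is used here only as an example of another dataset name.
        System.out.println(forSpecies("hsapiens_gene_ensembl"));
    }
}
```

Keeping the exceptions in one set means a further species without gene names only needs a new entry rather than another if/else branch.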

egonw commented 1 year ago

Thank you for debugging the issue!

mkutmon commented 1 year ago

Thanks, @tabbassidaloii!

egonw commented 1 month ago

See also https://github.com/bridgedb/BridgeDbWebservice/issues/29