bridgedb / datasources

Repository with the BridgeDb data source.
Creative Commons Zero v1.0 Universal

Which system codes for ChEMBL? #7

Open stain opened 9 years ago

stain commented 9 years ago

In commit ab02add1bee33b47e45bfdee7f89190681e9bcf2 @egonw added to org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt:

ChEMBL compound Cl  http://www.ebi.ac.uk/chembl/    https://www.ebi.ac.uk/chembl/compound/inspect/$id   CHEMBL308052    metabolite      1   urn:miriam:chembl.compound  ^CHEMBL\d+$ ChEMBL compound
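For anyone who wants to sanity-check a row like this, here is a minimal sketch. It assumes the `^CHEMBL\d+$` field is a pattern the example identifier must match, and that `$id` in the URL field is a literal placeholder. The class and method names are made up for illustration and are not part of BridgeDb.

```java
import java.util.regex.Pattern;

/** Sketch only: field meanings are inferred from the row above, not from BridgeDb documentation. */
final class DataSourceRowCheck {

    /** Returns true if the identifier matches the row's pattern in full. */
    static boolean matchesPattern(String regex, String identifier) {
        return Pattern.compile(regex).matcher(identifier).matches();
    }

    /** Substitutes the identifier for the literal "$id" placeholder in the link-out template. */
    static String linkOut(String urlTemplate, String identifier) {
        // String.replace treats "$id" as plain text, not as a regular expression.
        return urlTemplate.replace("$id", identifier);
    }

    public static void main(String[] args) {
        String regex = "^CHEMBL\\d+$";
        String template = "https://www.ebi.ac.uk/chembl/compound/inspect/$id";
        String example = "CHEMBL308052";
        System.out.println(matchesPattern(regex, example));
        System.out.println(linkOut(template, example));
    }
}
```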

Using system code Cl here clashes with the equivalent entry in org.bridgedb.rdf, which uses ChEMBLCompound - what was the reason for going with Cl?

See both IdentifiersOrgDataSource.ttl and IdentifiersOrgDataSource.txt.

This (luckily) causes the IdentifersOrgReaderTest test to fail with:

Caused by: java.lang.IllegalArgumentException: System code does not match for DataSource ChEMBL compound was Cl so it can not be changed to ChEMBLCompound
    at org.bridgedb.DataSource.findOrRegister(DataSource.java:640)
    at org.bridgedb.DataSource.register(DataSource.java:620)
    at org.bridgedb.rdf.BridgeDBRdfHandler.readDataSource(BridgeDBRdfHandler.java:131)
    at org.bridgedb.rdf.BridgeDBRdfHandler.getDataSource(BridgeDBRdfHandler.java:121)
    at org.bridgedb.rdf.BridgeDBRdfHandler.readAllDataSources(BridgeDBRdfHandler.java:113)
    at org.bridgedb.rdf.BridgeDBRdfHandler.doParseRdfInputStream(BridgeDBRdfHandler.java:92)
    ... 33 more
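To make the failure mode concrete, here is a rough sketch of a registry that refuses a second system code for a name it already knows. This is a hypothetical stand-in that only imitates the message in the trace above. It is not the actual code in org.bridgedb.DataSource, and the class name `SystemCodeRegistry` is invented.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry; it only imitates the behaviour reported in the stack trace. */
final class SystemCodeRegistry {
    private final Map<String, String> codeByName = new HashMap<>();

    /** Registers a name/code pair, rejecting a different code for an already known name. */
    String register(String fullName, String systemCode) {
        // putIfAbsent returns the previously stored code, or null on first registration.
        String existing = codeByName.putIfAbsent(fullName, systemCode);
        if (existing != null && !existing.equals(systemCode)) {
            throw new IllegalArgumentException("System code does not match for DataSource "
                    + fullName + " was " + existing
                    + " so it can not be changed to " + systemCode);
        }
        return codeByName.get(fullName);
    }
}
```

With this sketch, registering "ChEMBL compound" as Cl and then again as ChEMBLCompound throws on the second call, which is the situation the two resource files create when both are loaded.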

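The system codes used for ChEMBL within IdentifiersOrgDataSource.txt are not ideal:

  • ChEMBLCompound
  • ChemblId
  • ChemblMolecule
  • chembl.target
  • ChemblTarget (!)
  • Chembl16TargetComponent

These are very long, include a (wrong) version number, contain duplicates, and are inconsistent.

At Identifiers.org we find the names

  • chembl.compound (http://identifiers.org/chembl.compound/)
  • chembl.target (http://identifiers.org/chembl.target/)

(but nothing for molecules, assays or target component)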

Cc is already used by CCDS.

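After discussing this with @egonw I suggest modifying org/bridgedb/bio/datasources.txt to use system codes:

  • ChC (ChEMBL compound)
  • ChT (ChEMBL target)
  • ChTC (ChEMBL Target Component) -- or ChP for "protein"?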

CamelCasing here mimics other entries like EnMm (Ensembl Mouse).
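A quick way to check candidate codes such as ChC, ChT and ChTC against codes that are already taken could look like the sketch below. The set of taken codes in the usage note is only the handful mentioned in this thread (Cc, EnMm, Cl), not the full BridgeDb list, and `CodeClashCheck` is a made-up name. The comparison is exact and case-sensitive, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for spotting clashes between proposed and existing system codes. */
final class CodeClashCheck {

    /** Returns the proposed codes that are already taken, in the order they were proposed. */
    static List<String> clashes(Set<String> takenCodes, List<String> proposed) {
        List<String> result = new ArrayList<>();
        for (String code : proposed) {
            // Exact, case-sensitive comparison (an assumption of this sketch).
            if (takenCodes.contains(code)) {
                result.add(code);
            }
        }
        return result;
    }
}
```

Calling `CodeClashCheck.clashes(Set.of("Cc", "EnMm", "Cl"), List.of("ChC", "ChT", "ChTC"))` yields an empty list, whereas proposing Cc would be reported as a clash.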

Views?

Christian-B commented 9 years ago

The rule I always applied to datasource system codes was

  1. Use the existing BridgeDB code if it already exists! (Even if now deprecated.)
  2. Use the identifiers.org code if BridgeDB does not already have the DataSource.
  3. Make up a new one only if neither of the above applies. I intentionally used longer names here so as not to clash with possible future BridgeDB codes.
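The precedence in these three rules can be sketched as a small lookup. This is an illustration only: the two maps stand in for whatever tables a project keeps, keyed by data source name, and `SystemCodeChooser` is a hypothetical name rather than anything in BridgeDb.

```java
import java.util.Map;

/** Hypothetical helper; the maps stand in for whatever lookup tables a project keeps. */
final class SystemCodeChooser {

    /**
     * Applies the three rules in order and returns the code to use.
     * Keys of both maps are data source names, values are system codes.
     */
    static String choose(String dataSource,
                         Map<String, String> bridgeDbCodes,
                         Map<String, String> identifiersOrgCodes,
                         String newCode) {
        // Rule 1: an existing (even deprecated) BridgeDB code always wins.
        String existing = bridgeDbCodes.get(dataSource);
        if (existing != null) {
            return existing;
        }
        // Rule 2: otherwise fall back to the identifiers.org code.
        String fromIdentifiersOrg = identifiersOrgCodes.get(dataSource);
        if (fromIdentifiersOrg != null) {
            return fromIdentifiersOrg;
        }
        // Rule 3: only now accept a newly made-up code.
        return newCode;
    }
}
```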

So while we may not like the identifiers.org codes, I would still recommend using them until BridgeDB as a project selects a project-wide code.

As I am no longer part of the BridgeDB project, I have no input on which new codes should be approved project-wide, except of course that they should not clash with previously used (even deprecated) ones.

Christian



Christian-B commented 9 years ago

While there are definitely codes I would have created differently, I second Christian in support of using identifiers.org codes in order to avoid re-inventing the wheel. If anyone needs to create a new identifiers.org code, you can easily submit a ticket here:

http://sourceforge.net/p/identifiers-org/new-collection/new/

Anders


AlasdairGray commented 9 years ago

I also support the use of identifiers.org codes here.

Alasdair



stain commented 8 years ago

My proposed pull request bridgedb/BridgeDb#20 is raised as a discussion point to settle this according to what you said:

and adding the first two of these to datasources.txt of org.bridgedb.bio

For the http://linkedchemistry.info/ identifiers I can't find any direct equivalent in ChEMBL, so I've renamed the confusing "chemblTarget" and "chemblMolecule" etc. to linkedchemistry.chembl.id, linkedchemistry.chembl.target and linkedchemistry.chembl.molecule.

stain commented 8 years ago

See also bridgedb/BridgeDb#21 - I really struggle to do any kind of change on this.

egonw commented 8 years ago

@stain let's Skype chat in the coming week?