Open stain opened 9 years ago
While there are definitely codes I would have created differently, I second Christian in support of using identifiers.org codes in order to avoid re-inventing the wheel. If anyone needs to create a new identifiers.org code, you can easily submit a ticket here:
http://sourceforge.net/p/identifiers-org/new-collection/new/
Anders
----- Original Message -----
From: "Christian Brenninkmeijer" christian.brenninkmeijer@manchester.ac.uk
To: "bridgedb/BridgeDb" reply@reply.github.com
Cc: "EU openPHACTS project members based at the University of Manchester" OPENPHACTS-MCR@listserv.manchester.ac.uk, "bridgedb-discuss" bridgedb-discuss@googlegroups.com
Sent: Wednesday, September 9, 2015 5:20:05 AM
Subject: [bridgedb] RE: [BridgeDb] Which system codes for Chembl? (#16)
The rule I always applied to datasource system codes was:
- Use the existing BridgeDB code if one already exists! (Even if now deprecated)
- Use the identifiers.org code if BridgeDB does not already have the DataSource
- Make up a new one only if neither of the above applies. I intentionally used longer names here so as not to clash with possible future BridgeDB codes
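The three rules above can be read as a simple lookup order. The following is only a sketch of that order, not BridgeDb code: the function name `choose_system_code`, its parameters, and the two small lookup tables are hypothetical, and the sample entries are taken from codes mentioned elsewhere in this thread.

```python
from typing import Mapping, Optional


def choose_system_code(
    name: str,
    bridgedb_codes: Mapping[str, str],
    identifiers_org_codes: Mapping[str, str],
    fallback: Optional[str] = None,
) -> str:
    """Pick a system code for data source `name` in the order of the three rules."""
    # Rule 1: an existing BridgeDb code always wins, even a deprecated one.
    if name in bridgedb_codes:
        return bridgedb_codes[name]
    # Rule 2: otherwise reuse the identifiers.org code.
    if name in identifiers_org_codes:
        return identifiers_org_codes[name]
    # Rule 3: only now fall back to a newly made-up (deliberately long) code.
    if fallback is None:
        raise ValueError(f"no known code for {name!r}; a new one must be chosen")
    return fallback


# Illustrative tables only (assumption); the real data lives in the BridgeDb resources.
bridgedb = {"CCDS": "Cc"}
identifiers_org = {"ChEMBL target": "chembl.target"}

print(choose_system_code("CCDS", bridgedb, identifiers_org))
print(choose_system_code("ChEMBL target", bridgedb, identifiers_org))
print(choose_system_code("ChEMBL target component", bridgedb, identifiers_org,
                         fallback="Chembl16TargetComponent"))
```

Keeping the fallback explicit means a new code is never invented silently; the caller has to supply it.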
So while we may not like the identifiers.org codes, I would still recommend using these until BridgeDB as a project selects a project-wide code.
As I am no longer part of the BridgeDB project, I have no input on which new codes should be approved project-wide, except of course that they should not clash with previously used (even deprecated) ones.
Christian
From: Stian Soiland-Reyes [notifications@github.com]
Sent: Wednesday, September 09, 2015 1:09 PM
To: bridgedb/BridgeDb
Subject: [BridgeDb] Which system codes for Chembl? (#16)
In commit ab02add (https://github.com/bridgedb/BridgeDb/commit/ab02add1bee33b47e45bfdee7f89190681e9bcf2), @egonw (https://github.com/egonw) added to org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt (https://github.com/bridgedb/BridgeDb/blob/ab02add1bee33b47e45bfdee7f89190681e9bcf2/org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt#L11):
ChEMBL compound Cl http://www.ebi.ac.uk/chembl/ https://www.ebi.ac.uk/chembl/compound/inspect/$id CHEMBL308052 metabolite 1 urn:miriam:chembl.compound ^CHEMBL\d+$ ChEMBL compound
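The entry above carries a URL pattern containing `$id`, an example identifier, and a regular expression. As a small sketch of how these three fields relate, assuming `$id` is a placeholder for the identifier and the regular expression validates it (the function name `link_out` is hypothetical):

```python
import re

# Values copied from the datasources.txt entry above.
URL_PATTERN = "https://www.ebi.ac.uk/chembl/compound/inspect/$id"
ID_REGEX = re.compile(r"^CHEMBL\d+$")


def link_out(identifier: str) -> str:
    """Return the link-out URL for a ChEMBL compound identifier."""
    # Reject anything that does not look like CHEMBL followed by digits.
    if not ID_REGEX.match(identifier):
        raise ValueError(f"not a ChEMBL compound identifier: {identifier!r}")
    # Substitute the identifier into the placeholder of the URL pattern.
    return URL_PATTERN.replace("$id", identifier)


print(link_out("CHEMBL308052"))
```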
Using system code Cl here clashes with the equivalent entry in org.bridgedb.rdf, which uses ChEMBLCompound - what was the reason for going with Cl?
See both IdentifiersOrgDataSource.ttl (https://github.com/bridgedb/BridgeDb/blob/ab02add1bee33b47e45bfdee7f89190681e9bcf2/org.bridgedb.rdf/resources/IdentifiersOrgDataSource.ttl#L953) and IdentifiersOrgDataSource.txt (https://github.com/bridgedb/BridgeDb/blob/ab02add1bee33b47e45bfdee7f89190681e9bcf2/org.bridgedb.rdf/resources/IdentifiersOrgDataSource.txt#L84).
This (luckily) causes the IdentifersOrgReaderTest test to fail with:
Caused by: java.lang.IllegalArgumentException: System code does not match for DataSource ChEMBL compound was Cl so it can not be changed to ChEMBLCompound
    at org.bridgedb.DataSource.findOrRegister(DataSource.java:640)
    at org.bridgedb.DataSource.register(DataSource.java:620)
    at org.bridgedb.rdf.BridgeDBRdfHandler.readDataSource(BridgeDBRdfHandler.java:131)
    at org.bridgedb.rdf.BridgeDBRdfHandler.getDataSource(BridgeDBRdfHandler.java:121)
    at org.bridgedb.rdf.BridgeDBRdfHandler.readAllDataSources(BridgeDBRdfHandler.java:113)
    at org.bridgedb.rdf.BridgeDBRdfHandler.doParseRdfInputStream(BridgeDBRdfHandler.java:92)
    ... 33 more
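The failure above is what one would expect from a registry that fixes a data source's system code on first registration. The toy model below only illustrates that behaviour; it is not the actual `org.bridgedb.DataSource` implementation, and the class `DataSourceRegistry` is hypothetical.

```python
class DataSourceRegistry:
    """Toy registry: one system code per data source name, fixed once set."""

    def __init__(self) -> None:
        self._code_by_name = {}

    def register(self, name: str, system_code: str) -> str:
        existing = self._code_by_name.get(name)
        if existing is None:
            # First registration: remember the code for this name.
            self._code_by_name[name] = system_code
            return system_code
        if existing != system_code:
            # A second file tries to register the same name with another code.
            raise ValueError(
                f"System code does not match for DataSource {name} "
                f"was {existing} so it can not be changed to {system_code}"
            )
        return existing


registry = DataSourceRegistry()
registry.register("ChEMBL compound", "Cl")  # code from datasources.txt
try:
    registry.register("ChEMBL compound", "ChEMBLCompound")  # code from the RDF file
except ValueError as err:
    print(err)
```

Under this model, whichever file is loaded first decides the code, so the two resources have to agree for both to load.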
The system codes used for ChEMBL within IdentifiersOrgDataSource.txt (https://github.com/bridgedb/BridgeDb/blob/ab02add1bee33b47e45bfdee7f89190681e9bcf2/org.bridgedb.rdf/resources/IdentifiersOrgDataSource.txt#L84) are not ideal:
- ChEMBLCompound
- ChemblId
- ChemblMolecule
- chembl.target
- ChemblTarget (!)
- Chembl16TargetComponent
These are very long, include a (wrong) version number, contain duplicates, and are inconsistent.
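Two of these complaints can be checked mechanically. The sketch below flags codes that differ only in case or punctuation, and codes that embed digits such as a version number. The normalisation rule is an assumption made for illustration, not something BridgeDb applies.

```python
import re
from collections import defaultdict

# The six ChEMBL codes listed above.
codes = [
    "ChEMBLCompound", "ChemblId", "ChemblMolecule",
    "chembl.target", "ChemblTarget", "Chembl16TargetComponent",
]


def normalise(code: str) -> str:
    """Lower-case and drop punctuation so near-duplicates collapse together."""
    return re.sub(r"[^a-z0-9]", "", code.lower())


# Group the codes by their normalised form.
groups = defaultdict(list)
for code in codes:
    groups[normalise(code)].append(code)

near_duplicates = [names for names in groups.values() if len(names) > 1]
with_digits = [code for code in codes if re.search(r"\d", code)]

print(near_duplicates)
print(with_digits)
```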
At Identifiers.org we find the names:
- chembl.compound (http://identifiers.org/chembl.compound/)
- chembl.target (http://identifiers.org/chembl.target/)
(but nothing for molecules, assays or target component)
Cc is already used by CCDS.
After discussing this with @egonw (https://github.com/egonw) I suggest modifying org/bridgedb/bio/datasources.txt to use system codes:
- ChC (ChEMBL compound)
- ChT (ChEMBL target)
- ChTC (ChEMBL Target Component) -- or ChP for "protein"?
CamelCasing here mimics other entries like EnMm (Ensembl Mouse).
Views?
Reply to this email directly or view it on GitHub: https://github.com/bridgedb/BridgeDb/issues/16
I also support the use of identifiers.org codes here.
Alasdair
My proposed pull request bridgedb/BridgeDb#20 is raised as a discussion point to settle this according to what you said:
and adding the two first of these to datasource.txt of org.bridgedb.bio
For the http://linkedchemistry.info/ identifiers I can't find any direct equivalent in ChEMBL, so I've renamed the confusing "chemblTarget", "chemblMolecule" etc. to linkedchemistry.chembl.id, linkedchemistry.chembl.target and linkedchemistry.chembl.molecule.
See also bridgedb/BridgeDb#21 - I really struggle to do any kind of change on this.
@stain let's Skype chat in the coming week?