chembl / chembl_webresource_client

Official Python client for accessing ChEMBL API
https://www.ebi.ac.uk/chembl/api/data/docs

Extract ChEMBL version associated per compound #55

Open czodrowskilab opened 5 years ago

czodrowskilab commented 5 years ago

To whom it may concern,

is there a way to extract something like a publication/experimental date associated with a ChEMBL compound?

For benchmarking studies, we would like to use time-split validation. It would therefore be a tremendous help to get an association between a ChEMBL version (or a publication/experimental date) and a ChEMBL compound.

Best regards, Paul

dominiquesydow commented 4 years ago

Hi!

Thanks for the chembl_webresource_client - we are using it at @volkamerlab a lot!

My question is related to @czodrowskilab's question here (I think), so I'll add it here.

Is there a field for the date when a compound or bioactivity was added to ChEMBL?

I know of the document_year field, which I can fetch for bioactivities:

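from chembl_webresource_client.new_client import new_client

# Activity endpoint of the ChEMBL web services
bioactivities_api = new_client.activity
# Fetch only the activity ID and the publication year of the source document
bioactivities = bioactivities_api.filter(
    target_chembl_id='CHEMBL941'
).only(
    'activity_id',
    'document_year'
)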

Given the ChEMBL database schema, I guess it works as follows: from doc_id (field) in activities (table), go to doc_id (field) in docs (table) and extract year (field). This probably works similarly for compounds.
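The lookup described above can be sketched against a toy in-memory SQLite database. This is only an illustration under stated assumptions: the two tables carry just the fields named above plus activity_id, the rows are made up, and the helper name activity_years is hypothetical. It is not a query against a real ChEMBL dump.

```python
import sqlite3


def activity_years(conn):
    """Return (activity_id, year) pairs by joining activities to docs on doc_id."""
    query = """
        SELECT a.activity_id, d.year
        FROM activities AS a
        JOIN docs AS d ON a.doc_id = d.doc_id
        ORDER BY a.activity_id
    """
    return conn.execute(query).fetchall()


# Toy database: a drastically reduced stand-in for the real tables.
# All rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (
        doc_id INTEGER PRIMARY KEY,
        year   INTEGER
    );
    CREATE TABLE activities (
        activity_id INTEGER PRIMARY KEY,
        doc_id      INTEGER REFERENCES docs(doc_id)
    );
    INSERT INTO docs VALUES (1, 2008), (2, 2015);
    INSERT INTO activities VALUES (10, 1), (11, 2), (12, 2);
""")

rows = activity_years(conn)
for activity_id, year in rows:
    print(activity_id, year)
```

The same join should apply to a local copy of the database, provided the table and field names match the release in use.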

However, the document's year is not necessarily the date on which the entry was added to ChEMBL.

Thus, back to my question: What is the best way to filter compound or bioactivity entries in ChEMBL by their deposition date?

Thank you for your time.

dominiquesydow commented 4 years ago

I am realizing: the deposition date probably corresponds to the ChEMBL version in which the compound/bioactivity was added.

Is there a way to access the version? In the ChEMBL schema, the version table does not seem to be connected to any other table: https://www.ebi.ac.uk/chembl/db_schema

dominiquesydow commented 4 years ago

Update on this matter: The ChEMBL support team (https://www.ebi.ac.uk/support/) let me know that it is currently not possible to extract data from the web services (or the interface) for a previous ChEMBL release.

eloyfelix commented 4 years ago

Hi, sorry for not having replied to this sooner. As Dominique says, this is currently not possible, but we are working towards supporting it in future ChEMBL versions.

iwwwish commented 1 year ago

Dear ChEMBL team,

Thank you for chembl_webresource_client. Is there any update on whether the version could be directly/indirectly linked to an activity record?

Like @czodrowskilab, I am also interested in splitting the activity data for a target by the ChEMBL version in which it first appeared (to mimic a time split).

Thanks, Vishal

BZdrazil commented 1 year ago

We are working on implementing time stamps for deposited data sets, and are currently exploring how to deal with updates to documents. Tentatively, we'll release this with ChEMBL 34.

BZdrazil commented 1 year ago

However, time stamps for data sets are already available (and always have been) via the VERSION table, which includes CREATION_DATE for every ChEMBL release. By querying for the release in which a document was added, you'll get your time stamps. Once we've settled the question of updated documents, we plan to make that information more easily accessible, likely via a new table.
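Once each record is associated with a release and each release with a creation date, a time split could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name time_split, the record identifiers, the release labels, and the dates are invented, and how the record-to-release mapping is obtained is left open.

```python
from datetime import date


def time_split(records, release_dates, cutoff):
    """Split record IDs into (train, test) by the creation date of their release.

    records:       iterable of (record_id, release_label) pairs
    release_dates: dict mapping release_label -> datetime.date
    cutoff:        records from releases created before this date go to train
    """
    train, test = [], []
    for record_id, release in records:
        if release_dates[release] < cutoff:
            train.append(record_id)
        else:
            test.append(record_id)
    return train, test


# Invented placeholder data, not real ChEMBL releases or dates.
release_dates = {
    "release_A": date(2000, 1, 1),
    "release_B": date(2001, 1, 1),
}
records = [
    ("act_1", "release_A"),
    ("act_2", "release_B"),
    ("act_3", "release_A"),
]

train_ids, test_ids = time_split(records, release_dates, cutoff=date(2000, 6, 1))
print(train_ids, test_ids)
```

Splitting on the release date rather than on document_year keeps the split tied to when data entered the database, which is the question raised earlier in this thread.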