Open AlexanderHauser opened 7 years ago
Looks like DrugBank 5.0 uses a different schema for references. From https://www.drugbank.ca/releases/5-0-7/downloads/all-full-database, I see the following XML:
<references>
  <articles>
    <article>
      <pubmed-id>10505536</pubmed-id>
      <citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
    </article>
    <article>
      <pubmed-id>10912644</pubmed-id>
      <citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
    </article>
    <article>
      <pubmed-id>11055889</pubmed-id>
      <citation>Eriksson BI: New therapeutic options in deep vein thrombosis prophylaxis. Semin Hematol. 2000 Jul;37(3 Suppl 5):7-9.</citation>
    </article>
    <article>
      <pubmed-id>11467439</pubmed-id>
      <citation>Fabrizio MC: Use of ecarin clotting time (ECT) with lepirudin therapy in heparin-induced thrombocytopenia and cardiopulmonary bypass. J Extra Corpor Technol. 2001 May;33(2):117-25.</citation>
    </article>
    <article>
      <pubmed-id>11807012</pubmed-id>
      <citation>Szaba FM, Smiley ST: Roles for thrombin and fibrin(ogen) in cytokine/chemokine production and macrophage adhesion in vivo. Blood. 2002 Feb 1;99(3):1053-9.</citation>
    </article>
    <article>
      <pubmed-id>11752352</pubmed-id>
      <citation>Chen X, Ji ZL, Chen YZ: TTD: Therapeutic Target Database. Nucleic Acids Res. 2002 Jan 1;30(1):412-5.</citation>
    </article>
  </articles>
  <textbooks/>
  <links/>
</references>
So you will have to modify parse.ipynb. Perhaps you can create an XPath query to find all pubmed-id subelements of references, something like this (untested):
pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
row['pubmed_ids'] = '|'.join(x.text for x in pubmed_ids)
Let us know whether this works. Also pull requests to upgrade this repo to DrugBank 5.0 would be of interest.
Thanks for your quick response!
Your suggested XPath query seems to work; only 3 entries return None, which might be a database issue. I have no further upgrades to the repo for DrugBank 5.0 compatibility, so please go ahead with this (minor) change.
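For anyone who wants to reproduce the None case without the full database, here is a minimal sketch. It assumes the namespace URI is http://www.drugbank.ca (check the root tag of your own download), uses a made-up `<target>` element as a stand-in for `protein`, and adds a hypothetical empty `<pubmed-id/>` entry. The helper name `join_pubmed_ids` is mine, not something from parse.ipynb. One plausible reading of the None entries is that an element without text has `.text` set to `None`, which `str.join` rejects.

```python
import xml.etree.ElementTree as ET

# Assumption: this is the namespace URI declared by the DrugBank XML;
# compare it with the root tag of your own download.
ns = "{http://www.drugbank.ca}"

# Trimmed-down stand-in for one protein element. The empty <pubmed-id/>
# is a hypothetical entry added to reproduce the None case.
SAMPLE = """<target xmlns="http://www.drugbank.ca">
  <references>
    <articles>
      <article>
        <pubmed-id>10505536</pubmed-id>
      </article>
      <article>
        <pubmed-id/>
      </article>
      <article>
        <pubmed-id>10912644</pubmed-id>
      </article>
    </articles>
    <textbooks/>
    <links/>
  </references>
</target>"""


def join_pubmed_ids(protein):
    """Return the pipe-separated PubMed IDs found under <references>."""
    pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
    # An element with no text content has .text == None, so skip those.
    return "|".join(x.text for x in pubmed_ids if x.text is not None)


protein = ET.fromstring(SAMPLE)
found = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))

# Joining without the filter fails because str.join only accepts strings.
try:
    "|".join(x.text for x in found)
    naive_join_failed = False
except TypeError:
    naive_join_failed = True

joined = join_pubmed_ids(protein)
print(naive_join_failed, joined)
```

Filtering on `x.text is not None` keeps the row for the protein and simply omits the empty entries, which seems preferable to dropping the whole row.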
In case it helps anyone else, the following changes (based on the suggestion above) fixed the issue for me:
pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
row['pubmed_ids'] = '|'.join([x.text for x in pubmed_ids if x.text is not None])
This doesn't seem to catch anything on the latest DrugBank 5 release.
Is there a bugfix for this?