Closed anuragpassi closed 8 years ago
Okay, so you're running into trouble with pubchempy.get_compounds in the pubchem-map.ipynb notebook.
Can you check whether any pubchempy queries work on your setup? For example, does the following command succeed?
import pubchempy
inchi = "InChI=1S/C6H8O4/c1-9-5(7)3-4-6(8)10-2/h3-4H,1-2H3/b4-3+"
pubchempy.get_compounds(inchi, namespace='inchi')
Yes, they are working. However, even 1000 compounds give a timeout error.
Hmm, in pubchem-map.ipynb, I only request one compound at a time. Can you split your query into many smaller queries?
See the pubchempy docs about avoiding a timeout error: http://pubchempy.readthedocs.io/en/v1.0.3/guide/advanced.html#avoiding-timeouterror
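As a rough sketch of what splitting the query could look like: the helper below looks up one identifier at a time and retries after a pause when a lookup fails. The names `fetch_one_at_a_time`, `retries`, and `pause` are hypothetical and not part of pubchempy. `lookup` stands for whatever call you use, for example `lambda inchi: pubchempy.get_compounds(inchi, namespace='inchi')`.

```python
import time


def fetch_one_at_a_time(identifiers, lookup, retries=3, pause=1.0):
    """Call lookup(identifier) for each identifier, retrying failed calls.

    Returns (results, failed): a dict of successful lookups and a list of
    identifiers that still failed after all retries.
    """
    results = {}
    failed = []
    for identifier in identifiers:
        for attempt in range(retries):
            try:
                results[identifier] = lookup(identifier)
                break
            except Exception:  # broad on purpose: this is only a sketch
                time.sleep(pause * (attempt + 1))  # wait longer each time
        else:
            # The inner loop never hit `break`, so every attempt failed.
            failed.append(identifier)
    return results, failed
```

Collecting the failed identifiers separately means a single slow compound does not abort the whole run, and the failures can be retried later.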
Yes, that's what I am trying.
Hi,
Well, I ran the DrugBank mapping and got the data. However, when I tried to map the updated DrugBank with the SIDER STITCH IDs, I only got some 340 drugs. Somehow the DrugBank, PubChem and STITCH IDs are not mapping, and I am missing a lot of entries.
What can I do in this case?
Please advise.
Regards, Anurag
when I tried to map the updated drugbank with the SIDER stich ids i only got some 340 drugs
I recover more than 340 compounds when mapping to SIDER. Check out how I map the STITCH IDs to DrugBank in dhimmel/SIDER4.
So is the drugbank to pubchem mapping is recent(pubchem.tsv)???
So is the drugbank to pubchem mapping is recent(pubchem.tsv)???
@anuragpassi I don't understand what you're asking. Try to be more clear and describe exactly what you mean.
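I'm guessing you're asking whether the pubchem.tsv file (https://github.com/dhimmel/drugbank/blob/3e87872db5fca5ac427ce27464ab945c0ceb4ec6/data/mapping/pubchem.tsv) is recent. I got confused because there was no space between "recent" and "(pubchem.tsv)". The commit date for dhimmel/drugbank@3e87872db5fca5ac427ce27464ab945c0ceb4ec6 is Apr 13, 2015. Note that we used the UniChem connectivity search (https://thinklab.com/discussion/unifying-drug-vocabularies/40#5) for the DrugBank mapping in dhimmel/SIDER4.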
Oh. I thought that DrugBank was first mapped to get PubChem IDs, and then the PubChem IDs were mapped with STITCH IDs to get the DrugBank to side-effect relation.
Dear Daniel, I have tried your code for mapping DrugBank IDs to STITCH IDs and side effects. However, when I try to use the updated DrugBank data from the website, I do not get much data. I was wondering if you could run your program on the latest DrugBank data so that I can match your output with mine.
Please advise.
Regards, Anurag
I was wondering if you could run your program on latest drugbank data so that I can match your output with mine.
Sorry I won't have time for this in the near future. I may update my mapping in the future. The conversion from STITCH ID to pubchem_ids is trivial:
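def stitch_flat_to_pubchem(cid):
    # Flat STITCH IDs are offset by 100,000,000 from the PubChem ID
    assert cid.startswith('CID')
    return int(cid[3:]) - 100000000  # integer offset keeps the result an int (1e8 is a float)

def stitch_stereo_to_pubchem(cid):
    # Stereo-specific STITCH IDs hold the PubChem ID directly
    assert cid.startswith('CID')
    return int(cid[3:])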
To go from PubChem to DrugBank, you could rerun pubchem-map.ipynb (which maybe you have done), which maps by inchi. You can see the results from when I ran it at data/pubchem-mapping.tsv.
You could also use the mapping in data/mapping/pubchem.tsv, which is generated using UniChem's connectivity search. This mapping will be more fuzzy than the first method (small chemical differences are ignored).
If you're having lots of trouble with redoing the mapping, I'd suggest proceeding with either of the existing mappings.
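To see how many SIDER compounds an existing mapping covers, something like the following sketch might help. It assumes stereo-specific STITCH IDs of the form `CID` plus digits (as in the conversion functions above) and a PubChem-to-DrugBank lookup loaded as a plain dict keyed by integer PubChem IDs. The helper `map_stitch_to_drugbank` and the toy data are hypothetical, for illustration only.

```python
def stitch_stereo_to_pubchem(cid):
    """Convert a stereo-specific STITCH ID such as 'CID000002244' to 2244."""
    assert cid.startswith('CID')
    return int(cid[3:])


def map_stitch_to_drugbank(stitch_ids, pubchem_to_drugbank):
    """Split STITCH IDs into those with and without a DrugBank match."""
    mapped = {}
    unmapped = []
    for stitch_id in stitch_ids:
        pubchem_id = stitch_stereo_to_pubchem(stitch_id)
        drugbank_id = pubchem_to_drugbank.get(pubchem_id)
        if drugbank_id is None:
            unmapped.append(stitch_id)
        else:
            mapped[stitch_id] = drugbank_id
    return mapped, unmapped


# Toy data for illustration only, not read from the real mapping files.
pubchem_to_drugbank = {2244: 'DB00945'}
mapped, unmapped = map_stitch_to_drugbank(
    ['CID000002244', 'CID000009999'], pubchem_to_drugbank)
print(len(mapped), 'mapped;', len(unmapped), 'unmapped')
```

Inspecting the unmapped list can show whether entries are truly absent from the mapping or whether the IDs simply differ in type. If one side holds strings and the other integers, nothing matches, which is one possible cause of unexpectedly few mapped drugs.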
I actually did use the existing mappings too, but many drugs are missing. I do not know why. I will give the mappings another try.
Thank you
Also, can you guide me as to which script to use to parse the DrugBank data to get inchis and map to PubChem?
can you guide me as to which script to use to parse drugbank data
This mapping is accomplished by running two Python notebooks in the following order.
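- parse.ipynb https://github.com/dhimmel/drugbank/blob/55587651ee9417e4621707dac559d84c984cf5fa/parse.ipynb converts the XML download to TSV with an inchi column.
- pubchem-map.ipynb https://github.com/dhimmel/drugbank/blob/55587651ee9417e4621707dac559d84c984cf5fa/pubchem-map.ipynb maps DrugBank to PubChem using inchi.
Thank you. I'll try.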
Dear Daniel,
I have created these two files: Drugbank.tsv and pubchem-mapping.tsv. I am now running the SIDER4.0 code, but I get an error:
MergeError: No common columns to perform merge on
I believe that the pubchem_id values in drugbank.tsv are not mapping to pubchem-mapping.tsv. What can be the problem here? I am trying to run this code:
columns = [
    'stitch_id_flat',
    'stitch_id_sterio',
    'umls_cui_from_label',
    'meddra_type',
    'umls_cui_from_meddra',
    'side_effect_name',
]
se_df = pandas.read_table('download/meddra_all_se.tsv.gz', names=columns)
se_df['pubchem_id'] = se_df.stitch_id_sterio.map(stitch_stereo_to_pubchem)
se_df = drugbank_map_df.merge(se_df)  ### THIS IS WHERE I AM GETTING ERROR
se_df.head(2)
I am attaching the two input files with this email.
Please advise.
Regards,
Anurag
@anuragpassi the attached files don't show up on the GitHub issue. I recommend replying via the GitHub issue interface, so you can see exactly how your message will get displayed.
Not sure why you are getting the error. I recommend viewing the head of each dataframe and making sure they have common columns to merge on.
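One way to run that check, as a minimal sketch: the hypothetical helper `common_columns` lists the column names two tables share. It only relies on a `.columns` attribute, which pandas DataFrames provide. The stand-in tables and their column names below are placeholders, not the contents of the real files.

```python
from types import SimpleNamespace


def common_columns(left, right):
    """Return the column names shared by two tables, in left's order."""
    right_columns = set(right.columns)
    return [name for name in left.columns if name in right_columns]


# Stand-ins with made-up column names; with pandas, pass the DataFrames.
drugbank_map = SimpleNamespace(columns=['drugbank_id', 'pubchem_cid'])
side_effects = SimpleNamespace(columns=['pubchem_id', 'side_effect_name'])
print(common_columns(drugbank_map, side_effects))  # prints []
```

An empty list means a plain `merge` has no key to join on. In that case, renaming a column so the names match, or passing `left_on` and `right_on` to `DataFrame.merge`, are possible fixes.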
Hi, I am trying to reproduce the result from your pubchem-parse.py and somehow I get a timeout error. How can I resolve it?