Closed katieskinner98 closed 4 years ago
Thanks, Katie. Amazing job!
How many inhibitors and inhibitor-kinase relationships are there now, for the homepage?
I ended up with 1,432 inhibitors and 54,762 inhibitor_kinase relationships, and I also made sure that we had all the inhibitors that were given to us on QM+.
Great! Do you think the 30,000 inhibitors we had before was due to the same inhibitors being repeated under multiple aliases, or is it worth someone trying to identify and salvage inhibitors that only occurred in my original web scraping?
Or are they simply too ambiguous and "low quality data" to bother with, do you reckon?
Ummmm I'm not 100% sure really! I guess the issue is that in BindingDB they tend to use ChEMBL IDs as the names of inhibitors, or other names that don't correspond to ChEMBL, which is just so annoying! So I think it would take a lot of time for someone to go through and try to compare everything. I mean, I'm sure I'm missing a lot of inhibitors because of the inconsistencies between all the sources, so if someone has time they could comb through and do some checking.
I found a website with just under 300 inhibitors on it, and I did some random checking to see if I was missing some inhibitors, and I am, so I'm going to go through and add those tomorrow, but I think that's all I'll have time to do because I need to finalise my part of the documentation and all the other bits!
So strange no one has created a consistent database for kinase inhibitors, it seems like something that is so simple and essential but I really struggled to find something that was easy to scrape.
You've done an amazing job! Did you use the MRC website for the <300 inhibitors? I downloaded their stuff weeks ago but foolishly assumed when I found kidfammap with its 35,000 that I'd be sorted! I should have checked the data more thoroughly. When you've added inhibitors from the other source you found, do you want me to also add any kidfammap ones whose names can't be found in any of your alias columns? It might be a slow script again though...
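One way to keep the "names not found in any alias column" check from being slow is to collect every known alias into a set once and then test each candidate name against it, instead of looping over the whole table per name. The following is only a sketch under assumptions: the field names `Molecule_name`, `Synonyms` and `chEMBL_ID` come from the schema in this thread, but the `;` separator inside `Synonyms`, the helper names and the sample rows are hypothetical placeholders, not real data.

```python
def normalise(name):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def build_alias_index(rows, alias_fields, synonym_sep=";"):
    """Collect every alias found in the given fields into one set.

    Assumption: multi-valued fields such as Synonyms are separated by
    `synonym_sep`; adjust this to whatever the real CSV uses.
    """
    index = set()
    for row in rows:
        for field in alias_fields:
            value = row.get(field) or ""
            for alias in value.split(synonym_sep):
                alias = normalise(alias)
                if alias:
                    index.add(alias)
    return index


def find_missing(candidate_names, alias_index):
    """Return the candidate names that match no known alias."""
    return [n for n in candidate_names if normalise(n) not in alias_index]


# Placeholder rows standing in for the inhibitors table (not real data).
existing = [
    {"Molecule_name": "Example-A", "Synonyms": "Alias A1; Alias A2",
     "chEMBL_ID": "CHEMBL_PLACEHOLDER_1"},
    {"Molecule_name": "Example-B", "Synonyms": "",
     "chEMBL_ID": "CHEMBL_PLACEHOLDER_2"},
]
index = build_alias_index(existing, ["Molecule_name", "Synonyms", "chEMBL_ID"])
print(find_missing(["example-a", "Alias  A1", "Example-C"], index))
```

Each membership test against the set takes roughly constant time, so the cost grows with the number of candidate names rather than with candidates times table rows. Exact matching after normalisation will still miss aliases that differ by more than case or spacing.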
Haha thanks! Ummm no I used this website: http://www.icoa.fr/pkidb/, I haven't come across the MRC website actually. Yeah sure if you want to do it and have the time! Having extra kinases definitely isn't a bad thing!
This is the one I was talking about, not that we're going to bother with it now http://www.kinase-screen.mrc.ac.uk/kinase-inhibitors
Woo-hoo! Check out our inhibitor searching now! http://jacky-03.ehym3crjpy.eu-west-2.elasticbeanstalk.com/home
Omg it looks amazing, well done!!! I just finished adding the extra inhibitors, the new database is on my fork, it's the one with _v8 at the end! Also, for the homepage, we have 1,465 inhibitors, and 60,057 inhibitor-kinase relationships.
Well done! Taking a look now.
I think I will stick my inhibitor data to the end of yours, if you don't mind creating the DB one last time later today! Just running a slow loop now, probably a few more to write and run before it's ready.
Yeah sure go for it! Let me know when it's ready!
Just quickly, the files I used to populate the database are in the populating db folder called final_inhibitors_dataframe.csv and final_inhib_kin_dataframe.csv just in case you weren't using those! These are the ones I created after adding the other inhibitors from that website. I haven't uploaded them to the other folder yet but I will once my scripts are commented and in a state to upload!
Oh, darn, I wasn't using those! Thanks!
Haha it's okay, I just remembered that I didn't specify which one was the final file so it's my bad!
I think I officially give up on this now, the scripts take way too long.
I think you've got 60,057 inh-kin relationships and 1,465 inhibitors in the final dataframes. You'd said 54,762 and 1,432 before, was that based on the old tables @katieskinner98 ?
Haha I don't blame you! One of my scripts took 12 hours to run, it was awful!
And yeah I added 33 extra inhibitors this morning, and yep those are the figures that I gave you in my comment from earlier!
Haha, thanks! General comment to everyone: please forgive me when I question you on something you've already told me or considered carefully, I'm old and weary.
Hahaha no don't worry about it! I haven't slept in like 2 days so I get it.
Oh no, hope you get some sleep soon!
Hey guys!
I have made an updated schema, these are the two tables that have changed:
class Inhibitors(Base):
    __tablename__ = 'inhibitors'
    BindingDB_ID = Column(String(30))
    chEMBL_ID = Column(String(30))
    Ki_nM = Column(String(20))
    IC50_nM = Column(String(20))
    Kd_nM = Column(String(20))
    EC50_nM = Column(String(20))
    Molecule_name = Column(String(100))
    Molecule_type = Column(String(50))
    Molecular_formula = Column(String(50))
    Molecular_weight = Column(String(20))
    Synonyms = Column(String(1000))
    IN_ID = Column(String(20), primary_key=True)
    chEMBL_URL = Column(String(100))

class InhibKin(Base):
    __tablename__ = 'inhib_kin'
    UniProt_ID = Column(String(20), ForeignKey('human_kinases.UniProt_ID'))
    BindingDB_ID = Column(String(30))
    chEMBL_ID = Column(String(30))
    Molecule_name = Column(String(100), ForeignKey('inhibitors.Molecule_name'))
    IN_KI = Column(String(20), primary_key=True)

Everything under the class needs to be indented, it just wasn't working here for some reason.
And I also populated the database and uploaded it in the usual spot, and added '_v7' at the end, just so we have the original database still available if anything goes wrong!
Sorry for the delay, script took a bit longer to run than I thought! Let me know if the database is working and linked up correctly!
I wasn't sure if Molecule_name was the best to use as the ForeignKey, so if something else would be better then I can change it, just let me know!
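On the ForeignKey question: several database engines (SQLite and PostgreSQL, for instance) expect a foreign key to point at a primary key or a column with a unique constraint, and `Molecule_name` is not declared unique in the schema above, whereas `IN_ID` is the primary key. One option would therefore be to have `inhib_kin` reference `inhibitors.IN_ID`. The sketch below illustrates that idea with the standard-library `sqlite3` module rather than SQLAlchemy so it runs without extra dependencies. It is an assumption-laden illustration: the thread does not say which database backend is used, the tables are cut down to a few columns, the `IN_ID` column on `inhib_kin` is not in the posted schema, and all IDs and names are placeholders.

```python
import sqlite3


def make_db():
    """Create a cut-down in-memory version of the two tables.

    Assumption: inhib_kin carries an IN_ID column that references the
    inhibitors primary key; the posted schema references Molecule_name.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE inhibitors (
            IN_ID TEXT PRIMARY KEY,
            Molecule_name TEXT
        );
        CREATE TABLE inhib_kin (
            IN_KI TEXT PRIMARY KEY,
            UniProt_ID TEXT,
            IN_ID TEXT REFERENCES inhibitors(IN_ID)
        );
    """)
    # Placeholder inhibitor row (not real data).
    conn.execute("INSERT INTO inhibitors VALUES (?, ?)", ("IN0001", "Example-A"))
    conn.commit()
    return conn


def insert_link(conn, in_ki, uniprot_id, in_id):
    """Insert a relationship row; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO inhib_kin VALUES (?, ?, ?)",
                (in_ki, uniprot_id, in_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
# A link to a known inhibitor ID is accepted.
print(insert_link(conn, "IK0001", "UNIPROT_PLACEHOLDER", "IN0001"))
# A link to an ID that is not in the inhibitors table is rejected.
print(insert_link(conn, "IK0002", "UNIPROT_PLACEHOLDER", "IN9999"))
```

In the SQLAlchemy models the equivalent change would be a column declared with `ForeignKey('inhibitors.IN_ID')`. Keeping `Molecule_name` as the target could also work if every name is guaranteed unique and the column is declared with `unique=True`, which is worth checking against the final CSV files first.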
Once it's working we can update the diagrams etc!