celiaccb / Software-Development-Group-Project-2020

JACKY, a friendly tool for human kinase, phosphosites and kinase inhibitors information
http://jacky-03.ehym3crjpy.eu-west-2.elasticbeanstalk.com

Inhibitors info for the website! #91

Closed katieskinner98 closed 4 years ago

katieskinner98 commented 4 years ago

Hey guys!

I have made an updated schema, these are the two tables that have changed:

    class Inhibitors(Base):
        __tablename__ = 'inhibitors'
        BindingDB_ID = Column(String(30))
        chEMBL_ID = Column(String(30))
        Ki_nM = Column(String(20))
        IC50_nM = Column(String(20))
        Kd_nM = Column(String(20))
        EC50_nM = Column(String(20))
        Molecule_name = Column(String(100))
        Molecule_type = Column(String(50))
        Molecular_formula = Column(String(50))
        Molecular_weight = Column(String(20))
        Synonyms = Column(String(1000))
        IN_ID = Column(String(20), primary_key=True)
        chEMBL_URL = Column(String(100))

    class InhibKin(Base):
        __tablename__ = 'inhib_kin'
        UniProt_ID = Column(String(20), ForeignKey('human_kinases.UniProt_ID'))
        BindingDB_ID = Column(String(30))
        chEMBL_ID = Column(String(30))
        Molecule_name = Column(String(100), ForeignKey('inhibitors.Molecule_name'))
        IN_KI = Column(String(20), primary_key=True)

And I also populated the database and uploaded it in the usual spot, and added '_v7' at the end, just so we have the original database still available if anything goes wrong!

Sorry for the delay, script took a bit longer to run than I thought! Let me know if the database is working and linked up correctly!

I wasn't sure if Molecule_name was the best to use as the ForeignKey, so if something else would be better then I can change it, just let me know!
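On the ForeignKey question, one option would be to point inhib_kin at the inhibitors primary key (IN_ID), since a foreign key normally has to reference a primary-key or unique column and Molecule_name is neither in the schema above. Here is a minimal sketch of that idea. It uses plain sqlite3 rather than our SQLAlchemy models so it runs on its own, and the IN_ID column on inhib_kin plus all the example values are assumptions for illustration, not part of the current schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE inhibitors (
    IN_ID         TEXT PRIMARY KEY,
    Molecule_name TEXT
);
CREATE TABLE inhib_kin (
    IN_KI      TEXT PRIMARY KEY,
    UniProt_ID TEXT,
    -- hypothetical column: references the inhibitors primary key
    IN_ID      TEXT REFERENCES inhibitors(IN_ID)
);
""")

# Illustrative rows only.
conn.execute("INSERT INTO inhibitors VALUES (?, ?)", ("IN_0001", "Gefitinib"))
conn.execute("INSERT INTO inhib_kin VALUES (?, ?, ?)", ("IK_0001", "P00533", "IN_0001"))

# A relationship pointing at an inhibitor that does not exist should be rejected.
try:
    conn.execute("INSERT INTO inhib_kin VALUES (?, ?, ?)", ("IK_0002", "P00533", "IN_9999"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

relationship_count = conn.execute("SELECT COUNT(*) FROM inhib_kin").fetchone()[0]
print(orphan_rejected, relationship_count)
```

Keying on IN_ID would also mean two inhibitors that happen to share a name can't get mixed up, but it does need the populating script to carry IN_ID through to the relationship table.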

Once it's working we can update the diagrams etc!

AnnaDearman commented 4 years ago

Thanks, Katie. Amazing job!

AnnaDearman commented 4 years ago

How many inhibitors and inhibitor-kinase relationships are there now, for the homepage?

katieskinner98 commented 4 years ago

I ended up with 1,432 inhibitors and 54,762 inhibitor_kinase relationships, and I also made sure that we had all the inhibitors that were given to us on QM+ 😊

AnnaDearman commented 4 years ago

Great! Do you think the 30,000 inhibitors we had before was due to the same inhibitors being repeated under multiple aliases, or is it worth someone trying to identify and salvage inhibitors that only occurred in my original web scraping?

AnnaDearman commented 4 years ago

Or are they simply too ambiguous and "low quality data" to bother with, do you reckon?

katieskinner98 commented 4 years ago

Ummmm I'm not 100% sure really! I guess the issue is in BindingDB, they tend to use chEMBL IDs as the names of inhibitors, or other names that don't correspond to chEMBL, which is just so annoying! So I think it would take a lot of time for someone to go through and try and compare everything. I mean I am so sure I'm missing a lot of inhibitors because of the inconsistencies between all the sources, so if someone has time they could comb through and do some checking.

I found a website with just under 300 inhibitors on it, and I did some random checking to see if I was missing some inhibitors, and I am, so I'm going to go through and add those tomorrow, but I think that's all I'll have time to do because I need to finalise my part of the documentation and all the other bits!

So strange no one has created a consistent database for kinase inhibitors, it seems like something that is so simple and essential but I really struggled to find something that was easy to scrape 😅

AnnaDearman commented 4 years ago

You've done an amazing job! Did you use the MRC website for the <300 inhibitors? I downloaded their stuff weeks ago but foolishly assumed when I found kidfammap with its 35,000 that I'd be sorted! I should have checked the data more thoroughly. When you've added inhibitors from the other source you found, do you want me to also add any kidfammap ones whose names can't be found in any of your alias columns? It might be a slow script again though...
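A rough sketch of how that check could work without a slow nested loop: collect every name, ID and synonym from the inhibitors table into one set, then test each new name against it. The function names, the ";" synonym separator and the example rows are all assumptions here, so adjust to whatever the real CSVs look like:

```python
def normalise(name):
    """Lower-case and drop spaces/punctuation so 'STI-571' matches 'sti 571'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def known_aliases(inhibitor_rows):
    """Build a set of every normalised name, ID and synonym already in the table."""
    aliases = set()
    for row in inhibitor_rows:
        candidates = [row["Molecule_name"], row["chEMBL_ID"], row["BindingDB_ID"]]
        candidates += row["Synonyms"].split(";")  # assumed separator
        aliases.update(normalise(c) for c in candidates if c.strip())
    return aliases


def missing_names(new_names, inhibitor_rows):
    """Return the new names that match none of the existing alias columns."""
    aliases = known_aliases(inhibitor_rows)
    return [n for n in new_names if normalise(n) not in aliases]


# Made-up example data using the column names from the Inhibitors table.
rows = [
    {
        "Molecule_name": "Imatinib",
        "chEMBL_ID": "CHEMBL941",
        "BindingDB_ID": "",
        "Synonyms": "Gleevec; STI-571",
    }
]
missing = missing_names(["STI571", "gleevec", "Compound-X"], rows)
print(missing)
```

Set membership is a constant-time lookup, so this should scale to tens of thousands of names. The catch is that stripping punctuation could occasionally merge two genuinely different names, so anything it flags as "already present" might still deserve a spot check.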

katieskinner98 commented 4 years ago

Haha thanks! Ummm no I used this website: http://www.icoa.fr/pkidb/, I haven't come across the MRC website actually. Yeah sure if you want to do it and have the time! Having extra kinases definitely isn't a bad thing!

AnnaDearman commented 4 years ago

This is the one I was talking about, not that we're going to bother with it now http://www.kinase-screen.mrc.ac.uk/kinase-inhibitors

AnnaDearman commented 4 years ago

Woo-hoo! Check out our inhibitor searching now! http://jacky-03.ehym3crjpy.eu-west-2.elasticbeanstalk.com/home

katieskinner98 commented 4 years ago

Omg it looks amazing, well done!!! I just finished adding the extra inhibitors, the new database is on my fork, it's the one with _v8 at the end! Also, for the homepage, we have 1,465 inhibitors, and 60,057 inhibitor-kinase relationships 😊

AnnaDearman commented 4 years ago

Well done! Taking a look now.

AnnaDearman commented 4 years ago

I think I will stick my inhibitor data to the end of yours, if you don't mind creating the DB one last time later today! Just running a slow loop now, probably a few more to write and run before it's ready.

katieskinner98 commented 4 years ago

Yeah sure go for it! Let me know when it's ready!

Just quickly, the files I used to populate the database are in the populating db folder called final_inhibitors_dataframe.csv and final_inhib_kin_dataframe.csv just in case you weren't using those! These are the ones I created after adding the other inhibitors from that website. I haven't uploaded them to the other folder yet but I will once my scripts are commented and in a state to upload!

AnnaDearman commented 4 years ago

Oh, darn, I wasn't using those! Thanks!

katieskinner98 commented 4 years ago

Haha it's okay, I just remembered that I didn't specify which one was the final file so it's my bad!

AnnaDearman commented 4 years ago

I think I officially give up on this now, the scripts take way too long ☹️

AnnaDearman commented 4 years ago

I think you've got 60,057 inh-kin relationships and 1,465 inhibitors in the final dataframes. You'd said 54,762 and 1,432 before, was that based on the old tables @katieskinner98 ?
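For cross-checking the homepage figures, a small sketch that counts unique primary keys straight from the CSVs. It assumes the CSV headers match the schema column names (IN_ID and IN_KI), and the in-memory data below is made up so the snippet runs on its own:

```python
import csv
import io


def count_unique(csv_file, key_column):
    """Count distinct non-empty values of key_column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return len({row[key_column] for row in reader if row[key_column]})


# Stand-ins for final_inhibitors_dataframe.csv and final_inhib_kin_dataframe.csv.
inhibitors_csv = io.StringIO(
    "IN_ID,Molecule_name\n"
    "IN_1,A\n"
    "IN_2,B\n"
    "IN_2,B\n"  # duplicate row, counted once
)
inhib_kin_csv = io.StringIO(
    "IN_KI,UniProt_ID,Molecule_name\n"
    "IK_1,P1,A\n"
    "IK_2,P2,A\n"
    "IK_3,P1,B\n"
)

n_inhibitors = count_unique(inhibitors_csv, "IN_ID")
n_relationships = count_unique(inhib_kin_csv, "IN_KI")
print(n_inhibitors, n_relationships)
```

With the real files, the StringIO objects would be replaced by something like `open("final_inhibitors_dataframe.csv", newline="")`.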

katieskinner98 commented 4 years ago

> I think I officially give up on this now, the scripts take way too long ☹️

Haha I don't blame you! One of my scripts took 12 hours to run, it was awful!

And yeah I added 33 extra inhibitors this morning, and yep those are the figures that I gave you in my comment from earlier!

> Omg it looks amazing, well done!!! I just finished adding the extra inhibitors, the new database is on my fork, it's the one with _v8 at the end! Also, for the homepage, we have 1,465 inhibitors, and 60,057 inhibitor-kinase relationships 😊

AnnaDearman commented 4 years ago

Haha, thanks! General comment to everyone: please forgive me when I question you on something you've already told me or considered carefully, I'm old and weary 👵

katieskinner98 commented 4 years ago

Hahaha no don't worry about it! I haven't slept in like 2 days so I get it 😂

AnnaDearman commented 4 years ago

☚ī¸ Oh no, hope you get some sleep soon!