celiaccb / Software-Development-Group-Project-2020

JACKY, a friendly tool for human kinase, phosphosites and kinase inhibitors information
http://jacky-03.ehym3crjpy.eu-west-2.elasticbeanstalk.com

Tests to perform #59

Closed AnnaDearman closed 4 years ago

AnnaDearman commented 4 years ago

Hi all,

I thought we should agree on a systematic way to test JACKY, and perhaps assign testing duties to specific people. Am happy to take this on. Please comment here if you can think of additional tests, and if you want to conduct these tests. Please bear in mind we might need to repeat tests as new versions are deployed.

Where possible, we should refer to online resources, or raw files downloaded from them (warning: ignore non-human data!), to verify the information in JACKY, rather than the csvs we've produced ourselves.

Test 1: Disease-based: (I have performed this test a few times)

A minority of phosphosites will be from phospho.ELM and won't have the same amount of information as the rest. I'll have a think about checking for those...

East-YutangChen commented 4 years ago

Where is the latest URL for JACKY?

AnnaDearman commented 4 years ago

http://jacky-env-02.ehym3crjpy.eu-west-2.elasticbeanstalk.com/

AnnaDearman commented 4 years ago

I will think of other tests to perform, and if anyone else wants to write some here, feel free.

AnnaDearman commented 4 years ago

Test 2: Statistical analysis:

AnnaDearman commented 4 years ago

Findings.docx

Hi all,

Here are my findings from testing the statistical analysis using a sample dataset from https://casecpb.shinyapps.io/ksea/, which also hosts a KSEA app. I took out the rows that contained multiple substrates, as our app can't handle those, and I made a version with rearranged columns so that our app can process it. I ran the data through their app and our app, and wrote up my findings. Let me know your thoughts. I've also included my Excel sheet, which is a bit chaotic, if you want all the numbers.

Anna

Comparison of Jacky and KSEA app with their data.xlsx
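The preprocessing described above (dropping rows with multiple substrates and rearranging columns) could be sketched roughly as follows. This is only an illustration: the column names, the target column order, the semicolon separator and the sample rows are all assumptions made up for the example, not taken from the KSEA sample file or from JACKY.

```python
import csv
import io

# Hypothetical column names and order; the real files may differ.
SITE_COLUMN = "Residue.Both"
TARGET_ORDER = ["Gene", "Residue.Both", "FC", "p"]


def prepare_for_jacky(rows, site_column=SITE_COLUMN, order=TARGET_ORDER):
    """Drop rows that list more than one site and reorder the columns."""
    kept = []
    for row in rows:
        # Assumed convention: several sites in one row are separated by ';'.
        if ";" in row[site_column]:
            continue
        kept.append({name: row[name] for name in order})
    return kept


# Invented sample data, for illustration only.
sample = io.StringIO(
    "Protein,Gene,Peptide,Residue.Both,p,FC\n"
    "P1,AAA,PEPT,S10,0.01,2.0\n"
    "P2,BBB,PEPS,S5;T7,0.20,0.5\n"
)
cleaned = prepare_for_jacky(csv.DictReader(sample))
print(cleaned)
```

Only the single-site row survives, with its columns in the assumed target order.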

JuanLM1978 commented 4 years ago

Good work, I will check your data when I have time. If the results are positive, this could be used to validate the analysis on our website.

AnnaDearman commented 4 years ago

Hi all,

During my testing, I've found a problem with my kinase translation for the phospho.elm data. Gene MAPKAPK3 is called "MAPK3_HUMAN" (Q16644), but gene MAPK3 is called "MK03_HUMAN" (P27361). I had both listed as gene: MAPK3. 😢

I will fix it...

Anna
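A minimal sketch of how this kind of clash can arise and how it might be avoided, assuming (this is a guess, not the actual script) that the gene name was being derived from the UniProt entry name. Only the two accession/entry-name/gene triples quoted above are used; the function names are hypothetical.

```python
# Accession -> (entry name, gene symbol); only the two pairs quoted in this thread.
UNIPROT = {
    "Q16644": ("MAPK3_HUMAN", "MAPKAPK3"),
    "P27361": ("MK03_HUMAN", "MAPK3"),
}

# Reverse lookup from entry name to accession.
ENTRY_TO_ACCESSION = {entry: acc for acc, (entry, _gene) in UNIPROT.items()}


def naive_gene(entry_name):
    """Shortcut that causes the clash: strip the species suffix."""
    return entry_name.rsplit("_", 1)[0]


def gene_from_entry_name(entry_name):
    """Resolve the gene via the accession instead of trimming the name."""
    return UNIPROT[ENTRY_TO_ACCESSION[entry_name]][1]


for entry in ("MAPK3_HUMAN", "MK03_HUMAN"):
    print(entry, naive_gene(entry), gene_from_entry_name(entry))
```

Keying everything on the accession keeps MAPKAPK3 and MAPK3 apart even though one entry name looks like the other gene's symbol.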

AnnaDearman commented 4 years ago

@celiaccb Remind me to give you the latest phosphosite etc counts for the homepage after I've re-run my scripts!

AnnaDearman commented 4 years ago

@celiaccb Please could you update the count of kinase-phosphosite relationships to 11,456?

AnnaDearman commented 4 years ago

@katieskinner98 I've made a pull request with new phosphosite files in my "Generating...files" folder. I didn't overwrite the ones in your "populating_db" folder as it felt like treading on your toes! 😄 Would you mind doing so, and re-running the database script, please? We had two different UniProt IDs for MAPK3 depending on whether the data was from phosphosite.org or phospho.elm so I changed my script a bit to fix it (and possibly other similar examples)!

katieskinner98 commented 4 years ago

> @katieskinner98 I've made a pull request with new phosphosite files in my "Generating...files" folder. I didn't overwrite the ones in your "populating_db" folder as it felt like treading on your toes! 😄 Would you mind doing so, and re-running the database script, please? We had two different UniProt IDs for MAPK3 depending on whether the data was from phosphosite.org or phospho.elm so I changed my script a bit to fix it (and possibly other similar examples)!

Sorry for the delay with this! For some reason, when I tried to merge my new scripts and db into the master, it said it couldn't merge automatically, so I'm in the process of fixing this! The new db will be up very soon, and hopefully I used the right .csv files to populate it. I have so many files I'm getting confused 🤣

AnnaDearman commented 4 years ago

@katieskinner98 Hi Katie, please could you create another database using the latest phosphosite files on my fork (again, in my Generating... folder; I haven't over-written the files in your populating... folder). I realised I was pulling the wrong gene ID from UniProt and ending up with phosphosite IDs like "YTDC2_HUMAN_HUMAN(S510)", plus I now have the full phospho.elm URL
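As a defensive guard against IDs like "YTDC2_HUMAN_HUMAN(S510)", a small helper along these lines could be used when building phosphosite IDs. It is a sketch only: the function name is hypothetical, and it does not reproduce the actual fix (pulling the correct gene ID from UniProt); it just avoids appending the species suffix twice.

```python
def make_phos_id(name, residue, species="HUMAN"):
    """Build an ID such as 'YTDC2_HUMAN(S510)' without doubling the suffix."""
    suffix = "_" + species
    # Append the species suffix only when the name does not already carry it.
    base = name if name.endswith(suffix) else name + suffix
    return f"{base}({residue})"


print(make_phos_id("YTDC2", "S510"))
print(make_phos_id("YTDC2_HUMAN", "S510"))
```

Both calls produce the same ID, whichever form of the name is passed in.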

katieskinner98 commented 4 years ago

@AnnaDearman yeah of course! doing it now will be about 10 minutes!

AnnaDearman commented 4 years ago

@celiaccb Hi Celia,

In application.py, we have the following line:

searchphosdis = session.query(PhosphositesDiseases).join(Phosphosites).filter(Phosphosites.PHOS_ID==phosphosite_name).all()

I'm confused because the PHOS_ID column in each table uses a different format. Shouldn't it be PHOS_ID5 in the phosphosites table? And if so, is there a way to point to a different column for each table?

I'm going to test out changing the two preceding lines (searchphos and searchphoskin) to use PHOS_ID5 because I was getting some strange phosphosite lists in some kinase pages (not enough results). UPDATE: This didn't work 😁

Anna
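On the question of pointing a join at a different column in each table: a join can name both columns explicitly in its ON clause. The sketch below shows the idea with the standard-library sqlite3 module rather than the project's SQLAlchemy models. The table layouts and every value in them are invented for illustration, and it is only an assumption that the diseases table's PHOS_ID holds the same format as PHOS_ID5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented schema and rows; the real JACKY tables have more columns.
conn.executescript(
    """
    CREATE TABLE phosphosites (PHOS_ID TEXT, PHOS_ID5 TEXT);
    CREATE TABLE phosphosites_diseases (PHOS_ID TEXT, DISEASE TEXT);
    INSERT INTO phosphosites VALUES ('GENEA_HUMAN(S10)', 'X00001(S10)');
    INSERT INTO phosphosites_diseases VALUES ('X00001(S10)', 'example disease');
    """
)


def diseases_for(phos_id5):
    """Join on differently named columns by spelling out the ON clause."""
    sql = """
        SELECT d.DISEASE
        FROM phosphosites_diseases AS d
        JOIN phosphosites AS p ON d.PHOS_ID = p.PHOS_ID5
        WHERE p.PHOS_ID5 = ?
    """
    return [row[0] for row in conn.execute(sql, (phos_id5,))]


print(diseases_for("X00001(S10)"))
```

In SQLAlchemy, `Query.join()` likewise accepts an explicit ON clause as its second argument, for example `.join(Phosphosites, PhosphositesDiseases.PHOS_ID == Phosphosites.PHOS_ID5)`, although that has not been tested against this project's models.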

celiaccb commented 4 years ago

Hey! Can you be more specific about which strange phosphosite lists you got?

celiaccb commented 4 years ago

Also, the queries you are changing are for the phosphosite page, so they should not change anything on the kinase page. Is the problem on the kinase page or the phosphosite page?

AnnaDearman commented 4 years ago

I know that there were problems listing the kinases on the phosphosites page. I've got myself very confused today ☹️

AnnaDearman commented 4 years ago

> Hey! Can you be more specific about which strange phosphosite lists did you get?

I'll try to replicate the problem and then let you know

celiaccb commented 4 years ago

Ok thanks

celiaccb commented 4 years ago

If you change the query to PHOS_ID5, you would need to go to phoshits.html and change

[Screenshot: Screen Shot 2020-02-08 at 15 38 44]

(without the spaces)

Did you do that? That might be why it did not work

AnnaDearman commented 4 years ago

So when you search on http://jacky-03.ehym3crjpy.eu-west-2.elasticbeanstalk.com/ for kinase "RPS6KA3" you get 52 phosphosites but when I check the kinases_phosphosites table in SQLite Studio and search for RPS6KA3 there are only 49.

It returns these extras:

PH0141680 | p27Kip1 | CDKN1B | T198
PH0073874 | Fos | FOS | S362
PH0239529 | MEF2C | MEF2C_HUMAN | S192
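A quick way to pin down this kind of mismatch is to compare the IDs shown on the kinase page with the IDs in the kinases_phosphosites table as two sets. The helper below is a hypothetical sketch. The three "extra" IDs are the ones listed above, while "PH0000001" is an invented placeholder standing in for the shared rows.

```python
def compare_targets(page_ids, table_ids):
    """Report IDs present on only one side of the comparison."""
    page, table = set(page_ids), set(table_ids)
    return {
        "only_on_page": sorted(page - table),
        "only_in_table": sorted(table - page),
    }


# 'PH0000001' is a made-up stand-in for the phosphosites both sources agree on.
page_ids = ["PH0000001", "PH0141680", "PH0073874", "PH0239529"]
table_ids = ["PH0000001"]
print(compare_targets(page_ids, table_ids))
```

Anything under "only_on_page" is a row the web query returns that the table does not support for that kinase.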

AnnaDearman commented 4 years ago

CDKN1B(T198) should only be phosphorylated by PRKAA1, CAMK1, SGK1, PIM1, AKT1

celiaccb commented 4 years ago

So the problem is on the kinase page then?

AnnaDearman commented 4 years ago

FOS(S362) should only be phosphorylated by RPS6KA1 and RSK-2

celiaccb commented 4 years ago

Oh it's when you search a phosphosite by 'Phosphorylated by' ?

celiaccb commented 4 years ago

Where does it return 52 phosphosites?

AnnaDearman commented 4 years ago

No, sorry. One problem is, when you search for a kinase and look at the table of targets, the list it returns can be wrong (in the above example, three extra phosphosites). Another problem is, for some phosphosites (like the one in the other Issue thread) on the phosphosite page, it doesn't list the kinases even though they have kinase information.

AnnaDearman commented 4 years ago

> Where does it return 52 phosphosites?

In the kinase page under "Targets".

celiaccb commented 4 years ago

Is the phosphosite page problem for phosphosites with no protein, kinase information etc. only or for other phosphosites too?

AnnaDearman commented 4 years ago

> Is the phosphosite page problem for phosphosites with no protein, kinase information etc. only or for other phosphosites too?

It happened for a phospho.ELM phosphosite with minimal information. I haven't checked how often it happens yet. I realised I was making stupid phosphosite IDs with "HUMAN_HUMAN" in them at that point and went away and fixed that! However, the phosphosite that had the problem did have kinase information, despite not having much other information! So it should have been displaying kinases in the table under the "phosphorylation" header

AnnaDearman commented 4 years ago

To replicate it, search by kinase for "MK01_HUMAN", then, in the "Targets" table, search for "PH0242070" and click on it. The "Phosphorylation" table is empty, even though we know it's phosphorylated by "MK01_HUMAN" (and I checked and it's also phosphorylated by MK03_HUMAN)

celiaccb commented 4 years ago

OK, I'm confused. In the kinases_phosphosites table of the database, KIN_ACC_ID_2 is supposed to be the kinase UniProt ID. The UniProt ID of RPS6KA3 is P51812, but when I searched for it in that column I did not find any matches.

AnnaDearman commented 4 years ago

[image]

When you say you searched, are you referring to a manual search in SQLite Studio, or the script in application.py?

celiaccb commented 4 years ago

I meant in SQLite Studio. I am using the database that is in the populating_db folder on the master. Is that not the correct one?

AnnaDearman commented 4 years ago

Is that the very recent one from Katie today? I'm using the online Jacky with yesterday's database

celiaccb commented 4 years ago

So where do I find the database you're using? The one I am using was uploaded 19 hours ago

AnnaDearman commented 4 years ago

I don't understand why they're not the same

celiaccb commented 4 years ago

Should we not be using the latest database though?

AnnaDearman commented 4 years ago

I thought I was?

AnnaDearman commented 4 years ago

[image]

Just re-downloaded it from GitHub, don't understand why your searches aren't working

celiaccb commented 4 years ago

[Screenshot: Screen Shot 2020-02-08 at 16 15 40]

This is what I get. Did you download it from the master or from Katie's fork? I'm going to download it again and see if something changes.

celiaccb commented 4 years ago

OK, they do show, but lower down in the search results (I don't know why). Also, the problem for the phosphosite is that its PHOS_ID, NAN_HUMAN(S96), is not found in the KinasesPhosphosites table.

AnnaDearman commented 4 years ago

I think that must be why I wanted to use PHOS_ID5, because that's based on the UniProt ID.

celiaccb commented 4 years ago

Ok, I will change to PHOS_ID5 where needed, and then you can deploy ok?

AnnaDearman commented 4 years ago

OK, thanks! I'll deploy when I've made new csv files and Katie's made a database from them (I did run my scripts earlier but I'd made an error that caused the columns to come out in a weird order!).

celiaccb commented 4 years ago

Ok, just changed it (application.py, phoshits.html, kinasepage.html, zphoshitsgen.html, zphoshitsgenalt.html)