AnnaDearman closed 4 years ago
Where is the latest URL for JACKY?
I will think of other tests to perform, and if anyone else wants to write some here, feel free.
Test 2: Statistical analysis:
Findings.docx
Hi all,
Here are my findings from testing the statistical analysis using a sample dataset available at https://casecpb.shinyapps.io/ksea/, the site of another KSEA app. I removed the rows that contained multiple substrates, as our app can't handle those, and made a version with rearranged columns so that our app can process it. I ran the data through their app and ours, and wrote up my findings. Let me know your thoughts. I've also included my Excel sheet, which is a bit chaotic, if you want all the numbers.
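The preprocessing described above (dropping multi-substrate rows and rearranging columns) could be scripted so it is repeatable whenever the test is re-run. This is a minimal sketch, assuming the sample file is a CSV whose site column is `Residue.Both`, that a multi-substrate row can be recognised by a `;` in that column, and that our app wants the hypothetical column order in `TARGET_ORDER`. All of these should be checked against the real files.

```python
import csv
import io

# Assumed column order for our app (hypothetical; adjust to the real format).
TARGET_ORDER = ["Gene", "Residue.Both", "FC", "p"]


def prepare_for_our_app(rows, site_col="Residue.Both", sep=";"):
    """Drop rows that list more than one site and reorder the remaining columns."""
    kept = []
    for row in rows:
        if sep in row[site_col]:
            continue  # multi-site row: our app can't process these
        kept.append({col: row[col] for col in TARGET_ORDER})
    return kept


def write_csv(rows, out):
    """Write the prepared rows with a header in TARGET_ORDER."""
    writer = csv.DictWriter(out, fieldnames=TARGET_ORDER)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical two-row sample: the second row has two sites and is dropped.
sample = (
    "Protein,Gene,Peptide,Residue.Both,p,FC\n"
    "DEMO1,AKT1,PEPTIDEA,S473,0.01,2.0\n"
    "DEMO2,MAPK3,PEPTIDEB,T202;Y204,0.02,1.5\n"
)
prepared = prepare_for_our_app(csv.DictReader(io.StringIO(sample)))
buf = io.StringIO()
write_csv(prepared, buf)
print(buf.getvalue())
```

Keeping the filtering and the column reordering in separate steps makes it easy to report how many rows were dropped, which is worth noting in the findings.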
Good work, I will check your data when I have time. If the results are positive, this could be used to validate the analysis on our website.
Hi all,
During my testing, I've found a problem with my kinase translation for the phospho.elm data. Gene MAPKAPK3 is called "MAPK3_HUMAN" (Q16644), but gene MAPK3 is called "MK03_HUMAN" (P27361). I had both listed as gene: MAPK3. 😢
I will fix it...
Anna
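The MAPK3 mix-up comes from treating the UniProt entry name as if it were a gene symbol. A minimal sketch of the difference, using only the two entries quoted above; in practice the accession-to-gene table would be loaded from a UniProt download, and the helper names `gene_from_entry_name` and `gene_from_accession` are hypothetical.

```python
# UniProt accession -> (entry name, gene symbol), taken from the two entries above.
UNIPROT = {
    "Q16644": ("MAPK3_HUMAN", "MAPKAPK3"),
    "P27361": ("MK03_HUMAN", "MAPK3"),
}


def gene_from_entry_name(entry_name):
    """Naive translation: strip the species suffix from the entry name.

    This reproduces the error: MAPK3_HUMAN is MAPKAPK3, not MAPK3.
    """
    return entry_name.rsplit("_", 1)[0]


def gene_from_accession(accession, table=UNIPROT):
    """Safer translation: key on the accession and read the gene symbol."""
    return table[accession][1]


for acc, (entry, gene) in UNIPROT.items():
    print(acc, entry, "naive:", gene_from_entry_name(entry),
          "correct:", gene_from_accession(acc))
```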
@celiaccb Remind me to give you the latest phosphosite counts etc. for the homepage after I've re-run my scripts!
@celiaccb Please could you update the count of kinase-phosphosite relationships to 11,456?
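To avoid copying counts like this by hand after every re-run, the homepage figure could be read straight from the database. A sketch, assuming the `kinases_phosphosites` table has `KIN_ACC_ID_2` and `PHOS_ID` columns; counting DISTINCT pairs means a relationship reported by both phosphosite.org and phospho.elm is counted once, which may or may not match how 11,456 was derived. The demo rows are hypothetical.

```python
import sqlite3


def relationship_count(conn):
    """Number of distinct kinase-phosphosite pairs in kinases_phosphosites."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT DISTINCT KIN_ACC_ID_2, PHOS_ID FROM kinases_phosphosites)"
    ).fetchone()
    return count


# In-memory stand-in for the real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kinases_phosphosites (KIN_ACC_ID_2 TEXT, PHOS_ID TEXT)")
conn.executemany(
    "INSERT INTO kinases_phosphosites VALUES (?, ?)",
    [("K1", "S1"), ("K1", "S1"), ("K1", "S2"), ("K2", "S1")],  # first pair duplicated
)
print(relationship_count(conn))
```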
@katieskinner98 I've made a pull request with new phosphosite files in my "Generating...files" folder. I didn't overwrite the ones in your "populating_db" folder as it felt like treading on your toes! 😄 Would you mind doing so, and re-running the database script, please? We had two different UniProt IDs for MAPK3 depending on whether the data was from phosphosite.org or phospho.elm so I changed my script a bit to fix it (and possibly other similar examples)!
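A quick consistency check for this kind of problem is to look for any gene symbol that ends up with more than one UniProt accession across the two sources. A sketch, assuming each source has been reduced to (source, gene, accession) records; `find_conflicts` and the sample records are hypothetical.

```python
from collections import defaultdict


def find_conflicts(records):
    """Return {gene: {accession: [sources]}} for genes with more than one accession."""
    seen = defaultdict(lambda: defaultdict(set))
    for source, gene, accession in records:
        seen[gene][accession].add(source)
    return {
        gene: {acc: sorted(srcs) for acc, srcs in accs.items()}
        for gene, accs in seen.items()
        if len(accs) > 1
    }


# Hypothetical records mimicking the MAPK3 problem.
records = [
    ("phosphosite.org", "MAPK3", "P27361"),
    ("phospho.elm", "MAPK3", "Q16644"),  # mis-translated entry
    ("phosphosite.org", "AKT1", "P31749"),
    ("phospho.elm", "AKT1", "P31749"),
]
print(find_conflicts(records))
```

Running this on the generated files before building the database would flag similar cases without having to spot them by eye.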
Sorry for the delay with this! For some reason, when I tried to merge my new scripts and db into master, it said it couldn't merge automatically, so I'm in the process of fixing that! The new db will be up very soon, and hopefully I used the right .csv files to populate it; I have so many files I'm getting confused 🤣
@katieskinner98 Hi Katie, please could you create another database using the latest phosphosite files on my fork (again, in my Generating... folder; I haven't over-written the files in your populating... folder)? I realised I was pulling the wrong gene ID from UniProt and ending up with phosphosite IDs like "YTDC2_HUMAN_HUMAN(S510)". I also now have the full phospho.elm URL.
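The `_HUMAN_HUMAN` IDs are what you get when a species suffix is appended to a value that already carries it. The real fix was pulling the right field from UniProt; as an extra safeguard, an ID builder can refuse to double the suffix. A minimal sketch; `make_phos_id`, `has_doubled_suffix` and the ID format (inferred from IDs quoted in this thread, such as NAN_HUMAN(S96)) are assumptions, not the actual script.

```python
SUFFIX = "_HUMAN"


def make_phos_id(name, residue):
    """Build an ID like 'YTDC2_HUMAN(S510)' without doubling the species suffix."""
    if not name.endswith(SUFFIX):
        name += SUFFIX
    return f"{name}({residue})"


def has_doubled_suffix(phos_id):
    """Flag IDs that already contain the suffix twice, e.g. in older files."""
    return SUFFIX * 2 in phos_id


print(make_phos_id("YTDC2", "S510"))
print(make_phos_id("YTDC2_HUMAN", "S510"))
print(has_doubled_suffix("YTDC2_HUMAN_HUMAN(S510)"))
```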
@AnnaDearman Yeah, of course! Doing it now; it'll take about 10 minutes!
@celiaccb Hi Celia,
In application.py, we have the following line:
searchphosdis = session.query(PhosphositesDiseases).join(Phosphosites).filter(Phosphosites.PHOS_ID==phosphosite_name).all()
I'm confused because the PHOS_ID column in each table uses a different format. Shouldn't it be PHOS_ID5 in the phosphosites table? And if so, is there a way to point to a different column for each table?
I'm going to test out changing the two preceding lines (searchphos and searchphoskin) to use PHOS_ID5, because I was getting some strange phosphosite lists on some kinase pages (not enough results). UPDATE: This didn't work 😁
Anna
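On the question of pointing the join at a different column for each table: SQL lets each side of a join use its own column, so the phosphosites table can be matched on `PHOS_ID5` while the disease table keeps `PHOS_ID`. In SQLAlchemy the same idea is expressed by passing an explicit ON clause as the second argument to `join()`, e.g. `.join(Phosphosites, PhosphositesDiseases.PHOS_ID == Phosphosites.PHOS_ID5)`. The runnable sketch below uses the standard-library `sqlite3` module instead; the `phosphosites_diseases` table name, the `DISEASE` column and all sample values are hypothetical.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for the JACKY database (hypothetical schema and values)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE phosphosites (PHOS_ID TEXT, PHOS_ID5 TEXT);
        CREATE TABLE phosphosites_diseases (PHOS_ID TEXT, DISEASE TEXT);
        INSERT INTO phosphosites VALUES ('PH0000001', 'P27361(T202)');
        INSERT INTO phosphosites_diseases VALUES ('P27361(T202)', 'example disease');
        """
    )
    return conn


def diseases_for_site(conn, phosphosite_name):
    """Filter on phosphosites.PHOS_ID but join on phosphosites.PHOS_ID5."""
    sql = """
        SELECT d.PHOS_ID, d.DISEASE
        FROM phosphosites_diseases AS d
        JOIN phosphosites AS p ON d.PHOS_ID = p.PHOS_ID5
        WHERE p.PHOS_ID = ?
    """
    return conn.execute(sql, (phosphosite_name,)).fetchall()


conn = build_demo_db()
print(diseases_for_site(conn, "PH0000001"))
```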
Hey! Can you be more specific about which strange phosphosite lists you got?
Also, the queries you are changing are for the phosphosite page, so they shouldn't change anything on the kinase page. Is the problem on the kinase page or the phosphosite page?
I know that there were problems listing the kinases on the phosphosites page. I've got myself very confused today ☹️
> Hey! Can you be more specific about which strange phosphosite lists you got?
I'll try to replicate the problem and then let you know
Ok thanks
If you change the query to PHOS_ID5, you would also need to go to phoshits.html and change
(without the spaces)
Did you do that? That might be why it didn't work.
So when you search for kinase "RPS6KA3" on http://jacky-03.ehym3crjpy.eu-west-2.elasticbeanstalk.com/, you get 52 phosphosites, but when I check the kinases_phosphosites table in SQLite Studio and search for RPS6KA3, there are only 49.
It returns these extras:
PH0141680 | p27Kip1 | CDKN1B | T198
PH0073874 | Fos | FOS | S362
PH0239529 | MEF2C | MEF2C_HUMAN | S192
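To pin down exactly which targets differ, the kinase page's list can be compared with a direct query on `kinases_phosphosites` instead of a manual search. A sketch, assuming the table has `PHOS_ID` and `KIN_ACC_ID_2` columns (the latter holding the kinase UniProt accession, as discussed further down the thread) and that the website's PHOS_IDs have been copied into a list. The demo rows are hypothetical apart from RPS6KA3's accession and the three extra PHOS_IDs above.

```python
import sqlite3


def database_targets(conn, kinase_accession):
    """PHOS_IDs recorded for one kinase, by exact match on KIN_ACC_ID_2."""
    rows = conn.execute(
        "SELECT DISTINCT PHOS_ID FROM kinases_phosphosites WHERE KIN_ACC_ID_2 = ?",
        (kinase_accession,),
    )
    return {row[0] for row in rows}


def compare_targets(website_ids, database_ids):
    """Return (extras on the website, sites missing from the website), both sorted."""
    web, db = set(website_ids), set(database_ids)
    return sorted(web - db), sorted(db - web)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kinases_phosphosites (KIN_ACC_ID_2 TEXT, PHOS_ID TEXT)")
conn.executemany(
    "INSERT INTO kinases_phosphosites VALUES (?, ?)",
    [("P51812", "PH_DEMO_1"), ("P51812", "PH_DEMO_2")],  # hypothetical rows
)
website = ["PH_DEMO_1", "PH_DEMO_2", "PH0141680", "PH0073874", "PH0239529"]
extra, missing = compare_targets(website, database_targets(conn, "P51812"))
print("extra:", extra)
print("missing:", missing)
```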
CDKN1B(T198) should only be phosphorylated by PRKAA1, CAMK1, SGK1, PIM1, AKT1
So the problem is on the kinase page then?
FOS(S362) should only be phosphorylated by RPS6KA1 and RSK-2
Oh, so it's when you search for a phosphosite by 'Phosphorylated by'?
Where does it return 52 phosphosites?
No, sorry. One problem is, when you search for a kinase and look at the table of targets, the list it returns can be wrong (in the above example, three extra phosphosites). Another problem is, for some phosphosites (like the one in the other Issue thread) on the phosphosite page, it doesn't list the kinases even though they have kinase information.
> Where does it return 52 phosphosites?
In the kinase page under "Targets".
Is the phosphosite page problem for phosphosites with no protein, kinase information etc. only or for other phosphosites too?
> Is the phosphosite page problem for phosphosites with no protein, kinase information etc. only or for other phosphosites too?
It happened for a phospho.ELM phosphosite with minimal information. I haven't checked how often it happens yet. At that point I realised I was making stupid phosphosite IDs with "HUMAN_HUMAN" in them, and went away and fixed that! However, the phosphosite that had the problem did have kinase information, despite not having much other information! So it should have been displaying kinases in the table under the "Phosphorylation" header.
To replicate it, search by kinase for "MK01_HUMAN", then, in the "Targets" table, search for "PH0242070" and click on it. The "Phosphorylation" table is empty, even though we know it's phosphorylated by "MK01_HUMAN" (and I checked, and it's also phosphorylated by MK03_HUMAN).
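One way to check how often the empty "Phosphorylation" table happens is an anti-join: list every phosphosite whose ID has no row in `kinases_phosphosites`. Sites that genuinely lack kinase data will also appear, so the output is a list to inspect rather than a list of errors; a site in it that is known to have kinase data (like PH0242070) would suggest an ID mismatch between the two tables. A sketch with `sqlite3`; the `phosphosites` table name, its columns and the demo rows are assumptions.

```python
import sqlite3


def sites_without_kinase_rows(conn):
    """PHOS_IDs in phosphosites that have no row in kinases_phosphosites."""
    sql = """
        SELECT p.PHOS_ID
        FROM phosphosites AS p
        LEFT JOIN kinases_phosphosites AS k ON k.PHOS_ID = p.PHOS_ID
        WHERE k.PHOS_ID IS NULL
        ORDER BY p.PHOS_ID
    """
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE phosphosites (PHOS_ID TEXT);
    CREATE TABLE kinases_phosphosites (KIN_ACC_ID_2 TEXT, PHOS_ID TEXT);
    INSERT INTO phosphosites VALUES ('PH_DEMO_1'), ('PH_DEMO_2');
    INSERT INTO kinases_phosphosites VALUES ('KIN_DEMO', 'PH_DEMO_1');
    """
)
orphans = sites_without_kinase_rows(conn)
print(len(orphans), orphans)
```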
OK, I'm confused. In the kinases_phosphosites table of the database, KIN_ACC_ID_2 is supposed to be the kinase UniProt ID. The UniProt ID of RPS6KA3 is P51812, which I searched for in the kinases_phosphosites table, and I didn't find any matches in that column?
When you say you searched, are you referring to a manual search in SQLite Studio, or the script in application.py?
I meant in SQLite Studio. I'm using the database in the populating_db folder on master; is that not the correct one?
Is that the very recent one from Katie today? I'm using the online Jacky with yesterday's database.
So where do I find the database you're using? The one I'm using was uploaded 19 hours ago.
I don't understand why they're not the same
Should we not be using the latest database though?
I thought I was?
I just re-downloaded it from GitHub and don't understand why your searches aren't working.
This is what I get. Did you download it from master or from Katie's fork? I'm going to download it again and see if something changes.
OK, they do show, but lower down in the search results (I don't know why). Also, the problem with the phosphosite is that its PHOS_ID, NAN_HUMAN(S96), is not found in the KinasesPhosphosites table.
I think that must be why I wanted to use PHOS_ID5, because that's based on the UniProt ID.
OK, I'll change it to PHOS_ID5 where needed, and then you can deploy, OK?
OK, thanks! I'll deploy when I've made new csv files and Katie's made a database from them (I did run my scripts earlier but I'd made an error that caused the columns to come out in a weird order!).
OK, I've just changed it (application.py, phoshits.html, kinasepage.html, zphoshitsgen.html, zphoshitsgenalt.html).
Hi all,
I thought we should agree on a systematic way to test JACKY, and perhaps assign testing duties to specific people. I'm happy to take this on. Please comment here if you can think of additional tests, or if you want to conduct any of these tests. Please bear in mind we might need to repeat tests as new versions are deployed.
Where possible, we should refer to online resources, or raw files downloaded from them (warning: ignore non-human data!), to verify the information in JACKY, rather than the csvs we've produced ourselves.
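For checks against raw downloads, the non-human rows can be filtered out before comparing with JACKY. A sketch, assuming a tab-separated kinase-substrate file with `KIN_ACC_ID`, `KIN_ORGANISM`, `SUB_ACC_ID`, `SUB_ORGANISM` and `SUB_MOD_RSD` columns, intended to match the PhosphoSitePlus kinase-substrate download (check the column names against the actual file, which may also have licence text above the header that needs skipping). The sample rows and the `jacky` set are hypothetical.

```python
import csv
import io


def human_pairs(tsv_text):
    """(kinase accession, substrate accession, residue) for human-human rows only."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        (row["KIN_ACC_ID"], row["SUB_ACC_ID"], row["SUB_MOD_RSD"])
        for row in reader
        if row["KIN_ORGANISM"] == "human" and row["SUB_ORGANISM"] == "human"
    }


sample = (
    "KIN_ACC_ID\tKIN_ORGANISM\tSUB_ACC_ID\tSUB_ORGANISM\tSUB_MOD_RSD\n"
    "K_DEMO\thuman\tS_DEMO_1\thuman\tS362\n"
    "K_DEMO\tmouse\tS_DEMO_2\tmouse\tS362\n"
)
expected = human_pairs(sample)
jacky = {("K_DEMO", "S_DEMO_1", "S362")}  # hypothetical export from JACKY
print("in source, not in JACKY:", sorted(expected - jacky))
print("in JACKY, not in source:", sorted(jacky - expected))
```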
Test 1: Disease-based: (I have performed this test a few times)
A minority of phosphosites will be from phospho.ELM and won't have the same amount of information as the rest. I'll have a think about checking for those...