Closed colin-combe closed 4 months ago
Could we first try increasing the allowed execution time to see if it works when given more time?
That would be something to try on your side? (@sureshhewabi, @ypriverol)
Need to implement caching.
I have nothing against caching, but I don't think it will help with this problem, because the problem is with the first load?
From Arthur's original error report:
the initial request (either for sequences or residue pairs) times out after 30 s with HTML code 500 (internal server error). I assume it continues in the background and goes to the cache because, after some time (~5 min was enough for me), the same request completes in 1 or 2 seconds.
As I said, I assume the reason it works the second time is the caching that's enabled in PostgreSQL by default.
I think the solutions would be: speed up the database queries (might be possible); increase the allowed execution time (not great if it needs about 5 min); or some kind of paging. Maybe all these things would be good.
Could we first try increasing the allowed execution time to see if it works, and how much more time it requires?
Also, I'll try speeding up the queries.
@sureshhewabi - from https://docs.pylonsproject.org/projects/waitress/en/stable/arguments.html
`channel_timeout`: Maximum seconds to leave an inactive connection open (integer). "Inactive" is defined as "has received no data from a client and has sent no data to a client".
This might be the argument to fiddle with; if we could increase it to 300, maybe things would work.
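For reference, a sketch of what raising that limit could look like if the API is started from code with waitress (the module and app names here are placeholders, not the actual deployment):

```python
# Sketch only: raising waitress's channel_timeout from its default of 120 s.
# "myapi" / "app" are placeholder names, not the real deployment.
from waitress import serve  # third-party: pip install waitress
from myapi import app       # the WSGI application (placeholder import)

serve(app, listen="0.0.0.0:8080", channel_timeout=300)
```

The same adjustment should also be available as a `--channel-timeout` flag if the server is launched via the `waitress-serve` command line.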
I'm not sure what the problem is here now, because for me it executes and returns quite quickly. (I tried to fill up my local database to a realistic degree before testing.)
On a copy of PRIDE's current production database, the residue pair query for PXD019437 only takes about a second and a half for me (first execution, so no caching). So I guess it's not the DB query that's slowing things down.
@sureshhewabi - if you want to test it on the actual database the query is:
```sql
SELECT si.id, u.identification_file_name AS file, si.pass_threshold AS pass,
       pe1.dbsequence_ref AS prot1, dbs1.accession AS prot1_acc, (pe1.pep_start + mp1.link_site1 - 1) AS pos1,
       pe2.dbsequence_ref AS prot2, dbs2.accession AS prot2_acc, (pe2.pep_start + mp2.link_site1 - 1) AS pos2
FROM spectrumidentification si
INNER JOIN modifiedpeptide mp1 ON si.pep1_id = mp1.id AND si.upload_id = mp1.upload_id
INNER JOIN peptideevidence pe1 ON mp1.id = pe1.peptide_ref AND mp1.upload_id = pe1.upload_id
INNER JOIN dbsequence dbs1 ON pe1.dbsequence_ref = dbs1.id AND pe1.upload_id = dbs1.upload_id
INNER JOIN modifiedpeptide mp2 ON si.pep2_id = mp2.id AND si.upload_id = mp2.upload_id
INNER JOIN peptideevidence pe2 ON mp2.id = pe2.peptide_ref AND mp2.upload_id = pe2.upload_id
INNER JOIN dbsequence dbs2 ON pe2.dbsequence_ref = dbs2.id AND pe2.upload_id = dbs2.upload_id
INNER JOIN upload u ON u.id = si.upload_id
WHERE u.id = ANY (ARRAY[12, 13, 14, 15])
  AND mp1.link_site1 > 0 AND mp2.link_site1 > 0
  AND pe1.is_decoy = false AND pe2.is_decoy = false
  AND si.pass_threshold = true;
```
Maybe it's actually working; will test again when the API is back up.
No, it doesn't work; it's just like Arthur originally said. ~~I suspect waitress channel_timeout.~~
its a problem with FastAPI https://github.com/PRIDE-Archive/xi-mzidentml-converter/issues/45
Not necessarily, but it's also nothing to do with the DB or caching of DB results.
@colin-combe can you please confirm that the API is working?
Seems none of the initial endpoints:
https://www.ebi.ac.uk/pride/archive/xiview/ws/projects
https://www.ebi.ac.uk/pride/archive/xiview/ws/pdbdev/projects/{pid}/sequences
https://www.ebi.ac.uk/pride/archive/xiview/ws/pdbdev/projects/{pid}/residue-pairs/based-on-reported-psm-level/passing
etc. are available right now.
I've looked through the source of the website and found the
https://www.ebi.ac.uk/pride/ws/archive/crosslinking/projects
endpoint, but none of the lower-level endpoints (like sequences) are accessible through it.
Also,
/pride/ws/archive/crosslinking/pdbdev/projects/PXD019437/residue-pairs/based-on-reported-psm-level/all
from #45 isn't available either.
@aozalevsky
Please note that all the URLs were changed from /pride/archive/xiview/ws
to /pride/ws/archive/crosslinking
All the URLs can be found (and tested) on the swagger page: https://www.ebi.ac.uk/pride/ws/archive/crosslinking/docs
@sureshhewabi @colin-combe thanks for the clarification. I ran the tests (in that exact order):
It looks like the effect is still there, but:
1) Even the slowest cold request completed in ~15 s, which is far less than the timeout. Huge improvement.
2) "Hot" requests from the cache are very consistent
3) I assume the `residue-pairs` calls are internally almost identical, with filtering applied only at the final stage; thus calling `all` first effectively triggers caching for `passing`, so there is no hot/cold effect on the latter.
I think that's ok. But we still have to be aware, that if larger entries come up they can potentially time out.
The `pdbdev/{pid}/sequences` endpoint doesn't work and always returns empty output.
Example:
https://www.ebi.ac.uk/pride/ws/archive/crosslinking/pdbdev/projects/PXD019437/sequences
```json
{
  "data": []
}
```
I tried `crosslinking/projects/{pid}/proteins` instead, but it doesn't return sequences. Sequences are crucial for us. Would it be possible to fix this?
> I think that's ok. But we still have to be aware, that if larger entries come up they can potentially time out.
Yeah, it's a potential problem. Suresh says that pagination of the data returned by REST APIs is highly recommended; I guess that would avoid any potential timeouts.
> The pdbdev/{pid}/sequences endpoint doesn't work and always returns empty output.
Thanks for the bug report; yes, it was broken. @sureshhewabi - this PR should fix it:
https://github.com/PRIDE-Archive/xiview-api/pull/4/files
So retrieving sequences should be fixed once @sureshhewabi has updated things.
@sureshhewabi - I notice there's still string concatenation in the creation of the SQL queries for the PDB-Dev endpoint (my bad). I'll fix that.
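As general background on why the string concatenation is worth removing: bound parameters keep values out of the SQL text entirely. A minimal sketch using the stdlib `sqlite3` driver (the real API talks to PostgreSQL, where the placeholder style differs — `%s` for psycopg2, `:name` for SQLAlchemy's `text()`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upload (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO upload VALUES (?, ?)",
                 [(12, "a"), (13, "b"), (99, "c")])

# Unsafe: values concatenated into the SQL string.
# ids = "12, 13"
# conn.execute(f"SELECT name FROM upload WHERE id IN ({ids})")

# Safe: only placeholders go into the SQL text; the values
# travel separately as bound parameters.
ids = [12, 13]
placeholders = ", ".join("?" for _ in ids)
rows = sorted(conn.execute(
    f"SELECT name FROM upload WHERE id IN ({placeholders})", ids
).fetchall())
print(rows)  # [('a',), ('b',)]
```

The only string-building left is for the placeholder list itself, which never contains user data.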
I will implement pagination; otherwise we are serving too much data.
@sureshhewabi i see that https://github.com/PRIDE-Archive/xiview-api/pull/4/files was merged, are there any estimates of when it will be rolled out? Thanks in advance!
@aozalevsky Deployed. Please check now
I added the pagination for the /residue-pairs/based-on-reported-psm-level/passing endpoint as well. (eg: https://www.ebi.ac.uk/pride/ws/archive/crosslinking/pdbdev/projects/PXD019437/residue-pairs/based-on-reported-psm-level/passing)
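For programmatic clients, a paginated endpoint is usually drained by looping until a short or empty page comes back. A sketch with a stand-in for the HTTP call (the `page`/`page_size` parameter names are assumptions — check the swagger docs for the actual ones):

```python
def fetch_all_pages(fetch_page, page_size=100):
    """Collect results from a paginated endpoint until a short/empty page."""
    results = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        results.extend(batch)
        if len(batch) < page_size:  # last page reached
            return results
        page += 1

# Stand-in for the real HTTP call, e.g.
# requests.get(url, params={"page": page, "page_size": size}).json()["data"]
data = list(range(250))
def fake_fetch(page, size):
    start = (page - 1) * size
    return data[start:start + size]

print(len(fetch_all_pages(fake_fetch)))  # 250
```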
@sureshhewabi everything seems to be ok, thanks!
@aozalevsky Thank you for your support and comments! Could you please provide more details on this endpoint please:
1) Where are you going to use this endpoint? Will this get called by a webpage, or is it for programmatic access from a pipeline?
2) Could you reproduce the test that you did previously?
3) What is the tool you were using to do the API performance test, and what are the parameters? (For example, if you used JMeter: number of threads, ramp-up period, and loops (4, I guess).)
BTW, there was a bug in that query, which is why it was serving so much data; it has been fixed now.
@sureshhewabi
- Where are you going to use this endpoint? Will this get called by a webpage, or is it for programmatic access from a pipeline?
We're going to use it programmatically for the generation of validation reports for structures deposited at https://pdb-dev.wwpdb.org/ . The goal is to establish a correspondence between raw data deposited in mzIdentML files (sequences + residue pairs) and molecules and restraints used for modeling.
- Could you reproduce the test that you did previously?
I repeated the same tests (without pagination):
- `all`: multiple entries timed out at the first call (~30 s).
- `passing`: worked as expected (already cached after calling `all`).
- Sequences: here everything worked fine and was surprisingly consistent. I didn't check the source code, but I assume this data was also triggered/cached during the `residue-pairs` call.
- What is the tool you were using to do the API performance test, and what are the parameters? (For example, if you used JMeter: number of threads, ramp-up period, and loops (4, I guess).)
Oh, it's a very naive test, just a simple Python loop:

```python
import requests
import time
import tqdm

# `response` holds the JSON fetched earlier from the .../crosslinking/projects endpoint
responses = []
for r_ in tqdm.tqdm_notebook(response['projects']):
    t0 = time.time()
    pid = r_['project_id']
    url = f"https://www.ebi.ac.uk/pride/ws/archive/crosslinking/pdbdev/projects/{pid}/residue-pairs/based-on-reported-psm-level/all"
    r = requests.get(url)
    t1 = time.time()
    print(pid, t1 - t0)
    try:
        responses.append(r.json())
    except Exception as e:
        print(pid, e)
        continue
```
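Given the cold-start 500s described above, a loop like that usually wants an explicit retry so a single timeout doesn't lose the entry. A small generic helper (names are illustrative, not part of the API):

```python
import time

def get_with_retry(fetch, retries=2, delay=5.0):
    """Call fetch(); on failure wait `delay` seconds and retry.

    Returns the first successful result; re-raises the last
    exception once `retries` extra attempts are exhausted.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay)

# Demo with a function that fails once, then succeeds
# (stand-in for e.g. a requests.get(...).json() call):
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("500")
    return {"data": []}

print(get_with_retry(flaky, delay=0.01))  # {'data': []}
```

The first failed call doubles as the cache warm-up, so the retry tends to hit the fast path.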
@sureshhewabi - I think I just saw the same thing with https://www.ebi.ac.uk/pride/archive/crosslinking The data didn't load the first time and the table was left blank, but subsequent attempts did work.
@sureshhewabi looks like crosslinking API is down again. All endpoints return 500
and https://www.ebi.ac.uk/pride/archive/crosslinking shows a blank page.
@aozalevsky Sorry, that was my mistake today. I was doing a deployment and it got broken along the way. I am working on this, but it looks like it will take more time to fix.
I realised it is because of some instability in the network in OpenStack. I will contact systems and get back to you.
@sureshhewabi thanks! I see that the API is back and seems to be working properly, except for one entry: PXD036833.
It shows both empty sequences and empty residue pairs.
https://www.ebi.ac.uk/pride/ws/archive/crosslinking/pdbdev/projects/PXD036833/sequences
At the same time, https://www.ebi.ac.uk/pride/archive/crosslinking/PXD036833 seems to be ok.
@sureshhewabi @colin-combe can you please check PXD036833? Since it's the only entry that is linked to PDB-Dev structures right now, it's crucial for further development and demos.
I am working on it. I did a fix in the API and am currently deploying it (I tested it locally and it was fine). I will delete the entry and re-parse it in production, then let you know.
@aozalevsky Please check now
@sureshhewabi everything works as expected! thank you so much, I'll proceed with integration and testing on our side.
@colin-combe @aozalevsky can we close this?
@sureshhewabi yes
Feedback from PDB-Dev:
Yes, I presume it's timing out the first time but succeeding later because the database caches results by default.
I guess there are two possible solutions: speed up the database queries, or increase the allowed execution time. Maybe both of these things are a good idea.
(Using subqueries before joining tables may speed things up, we're not sure about this.)
Could we first try increasing allowed execution time to see if it works when given more time?