Closed mhkc closed 12 months ago
Dunno if related to this PR, but I found it in this branch so I'm posting here. The user menu dropdown needs a solid background:
Uploading samples seems to work very well in terms of speed and concurrent uploads. I used the parallel approach to uploading samples. There were no timeouts, and I did not notice any slowdowns when accessing the frontend during the upload.
The dendrogram is not loading for any samples (e.g. http://mtlucmds1.lund.skane.se/staging/bonsai/samples/14MT0500494) while the minhash service chews through the genome index queue.
Edit, more info (another sample):
The queue and polling work as they should. But maybe add a timeout and a related error message here, to let the user know that the tree cannot be retrieved at the moment?
[2023-11-13 10:36:21,824] INFO in samples: ref: 12MT0500043, body: limit=10 similarity=0.9 cluster=True typing_method=<TypingMethod.MINHASH: 'minhash'> cluster_method=<ClusterMethod.SINGLE: 'single'>, cluster: True
[2023-11-13 10:36:21,825] DEBUG in queue: Pushed job d11e5c75-8a70-43ba-93c6-6e4017e3fb5f into minhash
[2023-11-13 10:36:21,826] DEBUG in minhash: Submitting job, minhash_service.tasks.find_similar_and_cluster to None
INFO: 172.18.0.1:48150 - "POST /samples/12MT0500043/similar HTTP/1.1" 200 OK
INFO: 172.18.0.1:48235 - "GET /job/status/d11e5c75-8a70-43ba-93c6-6e4017e3fb5f HTTP/1.1" 200 OK
INFO: 172.18.0.1:48235 - "GET /job/status/d11e5c75-8a70-43ba-93c6-6e4017e3fb5f HTTP/1.1" 200 OK
INFO: 172.18.0.1:48235 - "GET /job/status/d11e5c75-8a70-43ba-93c6-6e4017e3fb5f HTTP/1.1" 200 OK
[... the same status poll repeats indefinitely ...]
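A timeout on the status polling, as suggested above, could be sketched like this. This is an illustrative Python sketch only: the real polling lives in the frontend JS, and `fetch_status`, the state names, and the thresholds are all assumptions, not Bonsai's actual API.

```python
import time

def poll_job_status(fetch_status, job_id, timeout=60.0, interval=2.0):
    """Poll a job until it reaches a terminal state or `timeout` seconds pass.

    `fetch_status` is any callable mapping a job id to its current state,
    e.g. a thin wrapper around GET /job/status/<job_id>.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_status(job_id)
        if state in ("finished", "failed"):
            return state
        time.sleep(interval)
    # Surface an actionable message instead of polling forever.
    raise TimeoutError(
        f"Job {job_id} did not finish within {timeout:.0f}s; "
        "the tree cannot be retrieved at the moment, try again later."
    )
```

The frontend could then catch the timeout and show the "try again later" message instead of spinning.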
Find similar samples does not seem to work. The request is stopped by the API and never reaches the minhash service.
All samples were uploaded through the staging API without being assigned to a group, if it matters.
INFO: 172.18.0.1:46100 - "POST /samples/12MT0500043/similar HTTP/1.1" 422 Unprocessable Entity
[2023-11-13 10:28:30 +0000] [8] [DEBUG] POST /staging/bonsai/samples/12MT0500043/similar
[2023-11-13 10:28:30 +0000] [8] [DEBUG] Query API for samples similar to "12MT0500043", similarity: 0.50, limit:
[2023-11-13 10:29:13 +0000] [8] [DEBUG] POST /staging/bonsai/samples/14MT0500581/similar
[2023-11-13 10:29:13 +0000] [8] [DEBUG] Query API for samples similar to "14MT0500581", similarity: 0.50, limit:
[2023-11-13 10:29:24 +0000] [7] [DEBUG] POST /staging/bonsai/samples/14MT0500581/similar
[2023-11-13 10:29:24 +0000] [7] [DEBUG] Query API for samples similar to "14MT0500581", similarity: 0.90, limit:
The new clustering controls in the basket are really slick.
I tried to cluster ~260 samples with cgMLST and it worked (although it took a minute or so to load). MLST works fine too. The UI is not blocked while the samples are clustered, and the clustering method is displayed in GrapeTree and in the cluster service.
It's hard to tell from the logs, but judging by the trees themselves the cluster service seems to respect the user's choice of clustering algorithm.
Minhash clustering does not work, but that is probably related to the issues in my comments above.
For fun I sent all 2395 samples into cgMLST clustering to see what would happen. The cluster service crashed with the following error:
2023-11-13 10:48:36,965 [DEBUG] rq.queue: Starting BLMOVE operation for rq:queue:allele_cluster with timeout of 405
2023-11-13 10:49:46,246 [INFO] rq.worker: allele_cluster: allele_cluster_service.tasks.cluster(method='MSTreeV2', profile='\tSACOL0001\tSACOL0002\tSACOL0003\tSACOL0004\tSACOL0005\tSACOL0006\tSACOL0...) (8857459a-77dc-419b-b176-9f27243e8740)
2023-11-13 10:49:47,642 [ERROR] rq.worker: [Job 8857459a-77dc-419b-b176-9f27243e8740]: exception raised while executing (allele_cluster_service.tasks.cluster)
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/rq/worker.py", line 1428, in perform_job
rv = job.perform()
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/rq/job.py", line 1278, in perform
self._result = self._execute()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/rq/job.py", line 1315, in _execute
result = self.func(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/worker/app/allele_cluster_service/tasks.py", line 27, in cluster
newick = backend(profile=profile, method=method.value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/worker/app/allele_cluster_service/ms_trees.py", line 1119, in backend
profiles.append(np.array(part)[allele_cols])
~~~~~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 1859 is out of bounds for axis 0 with size 1859
2023-11-13 10:49:47,653 [DEBUG] rq.queue: Starting BLMOVE operation for rq:queue:allele_cluster with timeout of 405
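The IndexError above looks like at least one profile row has fewer columns than the header, so indexing the allele columns runs past the end of the row. A pre-flight check along these lines (a hypothetical helper, not part of ms_trees) would fail fast with a clearer message:

```python
def validate_profile(profile_tsv: str) -> list[list[str]]:
    """Reject cgMLST profiles whose rows do not all have the same number
    of tab-separated columns as the header, before the profile reaches
    the tree-building backend."""
    rows = [line.split("\t") for line in profile_tsv.strip().splitlines()]
    header = rows[0]
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(
                f"profile line {lineno} has {len(row)} columns, "
                f"expected {len(header)}"
            )
    return rows
```

A check like this would turn the opaque IndexError deep in the backend into a message naming the offending sample row.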
The samples clustered in GrapeTree have no metadata associated with them, which was not the case prior to this update.
Thanks for the comments.
@Jakob37, some answers to your questions.
Could JS-modules be used?
Sure, why not.
Commenting. Many in-function comments here seem unnecessary. On the other hand, function top-level comments are useful. Have you considered using JSDoc?
I was not aware of JSDoc. It could be useful to transition to, especially with the Sphinx integration.
Are you familiar with Web components? I am wondering whether it would make sense to abstract away some things here (such as the spinner), without doing a full transition to a component-based framework such as React.
Yes. I initially wanted to rely mostly on Jinja templates and keep the JS DOM manipulation to a minimum. But since the project has grown, perhaps it would be better to transition to more JS.
Many functions access DOM elements from within this code. This tends to become a nightmare to maintain if not done with care in just a few locations, propagating values through function parameters instead. But maybe this is done in a controlled way here. Just raising the concern.
Thanks for raising this. Yes, it's a little bit dangerous; it was mostly done either as a temporary fix to push a first release or to rely more on Jinja templates.
If you like typing similarly to in Python, JSDoc could be used together with the typescript engine, without needing to transition to Typescript. Could this be useful here?
Thanks for the tip. Perhaps we should transition to TS anyway, since the number of views and the reliance on JS have grown?
I think it makes sense collecting style-constants (for both CSS and charts) in a single location, to make it easier to overview and maintain.
What do you mean by style constants? Are you thinking of SASS variables and such?
> If you like typing similarly to in Python, JSDoc could be used together with the typescript engine, without needing to transition to Typescript. Could this be useful here?
> Thanks for the tip. Perhaps we should transition to TS anyway, since the number of views and the reliance on JS have grown?
I like working with TS. It might be overkill for just a small amount of JS, but beyond that I think it helps a lot.
> I think it makes sense collecting style-constants (for both CSS and charts) in a single location, to make it easier to overview and maintain.
> What do you mean by style constants? Are you thinking of SASS variables and such?
Sorry, I was unclear and a bit confused. I am used to working with "JS components" where the style is controlled in a context where you can use regular variables. I haven't worked with SASS variables. No opinion there at the moment.
Workin' through the checklist from the start!
The minhash-dependent functions like the similarity tree and minhash clustering are still blocked when the minhash service is reprocessing genome signature files (I reuploaded everything via the api) — see my older comments above for more details.
Do we care about that in this update?
I still get an error when searching for similar samples in /groups, even when the minhash service has finished processing the signature files. It seems to be the API complaining about the frontend request:
INFO: 172.18.0.1:53530 - "POST /samples/14MT0500688/similar HTTP/1.1" 422 Unprocessable Entity
[2023-11-16 07:57:01 +0000] [7] [DEBUG] POST /staging/bonsai/samples/14MT0500688/similar
[2023-11-16 07:57:01 +0000] [7] [DEBUG] Query API for samples similar to "14MT0500688", similarity: 0.95, limit:
The minhash service is idle.
The issue occurs when I forget to set a limit for returned samples in the minhash search options >_>
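Consistent with that (the 422 appears exactly when the limit field is left empty), giving `limit` a default on the API's request model would make it optional. Sketched below with a plain dataclass; the real API presumably validates with pydantic, and the field names and defaults mirror the `limit=10 similarity=0.9` log line earlier in the thread:

```python
from dataclasses import dataclass

@dataclass
class SimilarSamplesQuery:
    """Shape of the POST /samples/<id>/similar body (sketch, not the
    actual Bonsai model).

    With a default, an omitted limit no longer fails validation (422).
    """
    similarity: float = 0.9
    limit: int = 10      # defaulted instead of required
    cluster: bool = False
```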
Clustering with minhash works! But I caught an error for this or a future update: the minhash service will crash if a signature file is missing for whatever reason:
2023-11-16 08:06:45,013 [ERROR] rq.worker: [Job 8b722a89-ad3a-4957-9c1c-43e633179fe0]: exception raised while executing (minhash_service.tasks.cluster)
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/rq/worker.py", line 1428, in perform_job
rv = job.perform()
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/rq/job.py", line 1278, in perform
self._result = self._execute()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/rq/job.py", line 1315, in _execute
result = self.func(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/worker/app/minhash_service/tasks.py", line 104, in cluster
newick: str = cluster_signatures(sample_ids, method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/worker/app/minhash_service/minhash/cluster.py", line 45, in cluster_signatures
signature = read_signature(sample_id)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/worker/app/minhash_service/minhash/io.py", line 53, in read_signature
signature_path = get_signature_path(sample_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/worker/app/minhash_service/minhash/io.py", line 45, in get_signature_path
raise FileNotFoundError(f"Signature file not found, {signature_path}")
FileNotFoundError: Signature file not found, /data/signature_db/20MT0500227.sig
I know, and I think that is the expected behavior for now, as we can then log the stack trace.
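If graceful degradation is ever wanted instead, missing signatures could be logged and skipped before clustering starts. A sketch (hypothetical helper, not the current service code; the path layout follows the /data/signature_db/<sample_id>.sig pattern in the traceback):

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)

def collect_signature_paths(sample_ids, signature_dir="/data/signature_db"):
    """Return (found, missing): paths for samples whose .sig file exists,
    and the ids of samples whose signature is absent, instead of raising
    on the first missing file."""
    found, missing = [], []
    for sample_id in sample_ids:
        path = Path(signature_dir) / f"{sample_id}.sig"
        if path.is_file():
            found.append(path)
        else:
            log.warning("Signature file not found, skipping: %s", path)
            missing.append(sample_id)
    return found, missing
```

The job could then cluster the samples that were found and report the skipped ids back to the user.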
> The minhash-dependent functions like the similarity tree and minhash clustering are still blocked when the minhash service is reprocessing genome signature files (I reuploaded everything via the api) — see my older comments above for more details.
> Do we care about that in this update?
That is the expected outcome with the current setup. We only have one worker, and if there are many jobs they will just be added to the queue. This can be solved later on by having multiple workers pulling jobs from the queue, but I don't think we are at the point where that is necessary.
I've run the similar-samples test and confirmed that the GUI is not locked while searching, and that the correct similarity threshold and limit reach the minhash service.
I was able to do a search for similar samples, add the matched samples to the basket, and send them off to clustering. That's a smooth workflow right there.
This PR aims to improve the performance of finding similar samples and clustering multiple samples.
Closes #91 #90
How to set up and perform the tests
Documentation
Upload samples
docker-compose
Finding similar samples
Clustering
/groups/<group_name>
Expected outcome
(see the allele_cluster_service and minhash_service logs)
A short description of the expected outcome of the test above.
Additional context