RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

ARAX heavy load #1416

Closed edeutsch closed 2 years ago

edeutsch commented 3 years ago

ARAX is currently under pretty heavy load, evaluating at least three simultaneous long-running queries:

top - 04:16:50 up 250 days,  6:45,  0 users,  load average: 3.82, 3.91, 3.82
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
%Cpu(s): 23.0 us,  2.4 sy,  0.0 ni, 74.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 64706056 total,  5718452 free, 18163236 used, 40824368 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 45572300 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                      
 2801 rt        20   0 5609136 3.987g   7084 S 100.7  6.5  47:17.16 python3                                                                                                      
 2929 rt        20   0 5777160 4.145g   7248 S 100.7  6.7  56:00.84 python3                                                                                                      
 2728 rt        20   0 4095040 2.601g   7712 S 100.3  4.2  40:47.82 python3                        

The logs show that it is computing tons and tons of NGDs.

Is anyone still eagerly awaiting the results of these 45 min+ queries, or shall I just flush them?

edeutsch commented 3 years ago

So over at /test, the logs are crawling along with:

2021-04-27T04:28:29.975087 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C0741453 in FET probably because expander ignores node type. For more details, please see issue897.
2021-04-27T04:28:38.270358 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C0857678 in FET probably because expander ignores node type. For more details, please see issue897.
2021-04-27T04:28:41.050202 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C0860165 in FET probably because expander ignores node type. For more details, please see issue897.
2021-04-27T04:28:41.964337 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C0860691 in FET probably because expander ignores node type. For more details, please see issue897.
2021-04-27T04:28:44.743879 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C1167662 in FET probably because expander ignores node type. For more details, please see issue897.
2021-04-27T04:28:49.371643 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C1275499 in FET probably because expander ignores node type. For more details, please see issue897.
2021-04-27T04:28:51.211773 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C1290925 in FET probably because expander ignores node type. For more details, please see issue897.
2021-04-27T04:28:54.863445 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C1318607 in FET probably because expander ignores node type. For more details, please see issue897.
2021-04-27T04:28:58.538883 WARNING: [] Fail to query adjacent nodes from ARAX/KG2 for UMLS:C1334965 in FET probably because expander ignores node type. For more details, please see issue897.
edeutsch commented 3 years ago

/beta and production have been working on nothing but NGD computations for over an hour.

2021-04-27T04:31:39.197674 DEBUG: [] 172 publications found for edge (MONDO:0004992)-[]-(MONDO:0004976) limiting to 30...
2021-04-27T04:31:41.718849 DEBUG: [] 165 publications found for edge (MESH:D017209)-[]-(MONDO:0006047) limiting to 30...
2021-04-27T04:31:42.304866 DEBUG: [] 1094 publications found for edge (MESH:D009154)-[]-(MONDO:0005300) limiting to 30...
2021-04-27T04:31:42.723927 DEBUG: [] 14870 publications found for edge (MESH:D010361)-[]-(MONDO:0005083) limiting to 30...
2021-04-27T04:31:43.305979 DEBUG: [] 147 publications found for edge (MESH:D013812)-[]-(MONDO:0004910) limiting to 30...
2021-04-27T04:31:44.514798 DEBUG: [] 79 publications found for edge (CHEMBL.TARGET:CHEMBL372)-[]-(MONDO:0018903) limiting to 30...
2021-04-27T04:31:46.178530 DEBUG: [] 113 publications found for edge (MESH:D010361)-[]-(MONDO:0006258) limiting to 30...
2021-04-27T04:31:46.675599 DEBUG: [] 87 publications found for edge (UMLS_STY:T098)-[]-(MONDO:0002806) limiting to 30...
2021-04-27T04:31:47.747975 DEBUG: [] 299 publications found for edge (MESH:D013812)-[]-(UMLS:C0522569) limiting to 30...
2021-04-27T04:31:50.383019 DEBUG: [] 128 publications found for edge (MESH:D002477)-[]-(MONDO:0045042) limiting to 30...
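For context on why this stage is so expensive: each DEBUG line above is one edge, and each edge needs publication counts to feed a normalized Google distance computation. Below is a minimal sketch of the standard NGD formula (Cilibrasi & Vitányi) that this kind of overlay is based on; the function, the second concept's count, and the corpus size are illustrative assumptions, not ARAX's actual code.

import math

def normalized_google_distance(n_x, n_y, n_xy, corpus_size):
    # n_x, n_y: publications mentioning each concept alone
    # n_xy: publications mentioning both concepts together
    # corpus_size: total number of indexed publications (e.g. all of PubMed)
    if n_xy == 0:
        return float("inf")  # never co-mentioned: maximally distant
    log_x, log_y, log_xy = math.log(n_x), math.log(n_y), math.log(n_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(corpus_size) - min(log_x, log_y))

# Illustrative numbers echoing the 14870-publication edge in the log above;
# the 3000, 150, and ~35M corpus size are made up for the example.
print(normalized_google_distance(14870, 3000, 150, 35_000_000))

The per-edge arithmetic is cheap; the cost is the publication lookups, multiplied across every edge of a combinatorially exploded result set.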
saramsey commented 3 years ago

Thanks for cleaning house in advance of the stand-up meeting!

dkoslicki commented 3 years ago

Sorry about that: I was trying a variation of the stand-up query and it exploded on me, which is why I switched to /test and /beta (I should have done that in the first place) and was trying to quash the combinatorial explosion with FET (sketched below). Hopefully once #1402 is addressed, those kinds of queries won't take so long.

Anyone know if it’s technically feasible to have a cancel button on the UI? I modified a query just slightly and saw that the second hop returned 26K nodes. I definitely would have terminated it before it went to all the NGD overlays if I had the ability.
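For readers unfamiliar with the FET step mentioned above: the idea is to keep only candidate nodes whose connectivity to the query's node set is statistically surprising under Fisher's exact test, discarding the rest before expensive overlays run. A minimal sketch assuming scipy is available; the contingency-table construction, the helper names, and the 0.05 cutoff are illustrative, not ARAX's actual overlay code.

from scipy.stats import fisher_exact

def fet_filter(candidates, query_nodes, adjacency, total_nodes, alpha=0.05):
    # candidates: node IDs produced by an expansion step
    # query_nodes: set of node IDs already in the query's result
    # adjacency: dict mapping node ID -> set of neighbor node IDs in the KG
    # total_nodes: total node count in the knowledge graph
    kept = {}
    for node in candidates:
        neighbors = adjacency.get(node, set())
        in_query = len(neighbors & query_nodes)        # edges into the query set
        out_query = len(neighbors) - in_query          # edges elsewhere
        rest_in = len(query_nodes) - in_query          # query nodes not linked
        rest_out = total_nodes - len(query_nodes) - out_query
        _, p_value = fisher_exact([[in_query, out_query],
                                   [rest_in, rest_out]])
        if p_value < alpha:                            # keep only enriched nodes
            kept[node] = p_value
    return kept

Pruning a 26K-node second hop down to the significantly connected candidates before the NGD overlay runs is exactly the kind of cut that would avoid the hour-long runs above.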

edeutsch commented 3 years ago

It is probably technically feasible, but it will probably be tricky: we hold the connection open while we stream progress. A few possibilities come to mind:

- Send some sort of command over the open stream that the process on the other end can hear.
- Actively kill the open connection, and have the server somehow detect the dropped connection and stop working.
- Failing that, have the stream provide a process handle of some sort, plus a button that opens a second connection with a command to terminate that handle, triggering internal code that stops the work.

We don't want to kill the whole Flask process, just end one of its threads. All that to say, something could probably be done, but my understanding of the internals of HTTP, proxies, Flask, and all the moving parts is severely limited, and I don't know the solution right now. It would require fiddling, testing, and probably some judicious copy-paste from Stack Overflow, too.
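One way the process-handle idea could look in practice: a minimal Flask sketch, not ARAX's actual code; the /query and /cancel routes, the in-memory flag table, and the fake work loop are all assumptions for illustration. The streaming endpoint registers a cancellation flag keyed by a generated handle, emits the handle as its first progress line, and checks the flag between chunks of work; a second request flips the flag.

import threading
import uuid

from flask import Flask, Response, abort

app = Flask(__name__)
cancel_flags = {}  # handle -> threading.Event, shared across request threads

@app.route("/query")
def run_query():
    handle = uuid.uuid4().hex
    cancel_flags[handle] = threading.Event()

    def stream():
        yield f"HANDLE: {handle}\n"  # client saves this for its cancel button
        try:
            for step in range(10_000):  # stand-in for expand/overlay steps
                if cancel_flags[handle].is_set():
                    yield "CANCELLED\n"
                    return
                yield f"progress: step {step}\n"  # real work would happen here
        finally:
            cancel_flags.pop(handle, None)  # always clean up the flag

    return Response(stream(), mimetype="text/plain")

@app.route("/cancel/<handle>", methods=["POST"])
def cancel(handle):
    event = cancel_flags.get(handle)
    if event is None:
        abort(404)  # unknown or already-finished handle
    event.set()  # the streaming thread notices at its next check
    return "ok\n"

The catch is that cancellation is cooperative: the worker only stops at its next flag check, so a single long step still runs to completion; and a single-threaded WSGI deployment could never serve /cancel while /query is streaming, so this assumes a threaded or multi-worker server.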

finnagin commented 2 years ago

@edeutsch are we okay to close this one out?

edeutsch commented 2 years ago

yes, this is ancient history, closing.