MolSSI / QCFractal

A distributed compute and database platform for quantum chemistry.
https://molssi.github.io/QCFractal/
BSD 3-Clause "New" or "Revised" License

Timeout submitting large dataset #786

peastman closed this issue 10 months ago

peastman commented 11 months ago

When I tried to submit a large dataset (194k records), it failed with a timeout:

  File "<stdin>", line 1, in <module>
  File "/Users/peastman/miniconda3/envs/qcportal/lib/python3.9/site-packages/qcportal/dataset_models.py", line 230, in submit
    return self._client.make_request(
  File "/Users/peastman/miniconda3/envs/qcportal/lib/python3.9/site-packages/qcportal/client_base.py", line 358, in make_request
    r = self._request(method, endpoint, body=serialized_body, url_params=parsed_url_params)
  File "/Users/peastman/miniconda3/envs/qcportal/lib/python3.9/site-packages/qcportal/client_base.py", line 297, in _request
    r = self._req_session.send(prep_req, verify=self._verify, timeout=self._timeout)
  File "/Users/peastman/miniconda3/envs/qcportal/lib/python3.9/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/Users/peastman/miniconda3/envs/qcportal/lib/python3.9/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='ml.qcarchive.molssi.org', port=443): Read timed out. (read timeout=60)

This is different from the similar problem I had when adding entries. I was able to work around that one by adding entries in blocks. This is happening in the single call to submit().
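
For reference, the block-wise workaround can be sketched roughly as follows. This is a minimal illustration, not QCPortal code: `iter_blocks`, `send_in_blocks`, and the default block size of 1000 are hypothetical names and values chosen for the example, and `send` stands for whatever call uploads one block.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_blocks(items: Sequence[T], block_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `block_size` long."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(items), block_size):
        yield items[start:start + block_size]


def send_in_blocks(
    send: Callable[[Sequence[T]], None],
    items: Sequence[T],
    block_size: int = 1000,  # hypothetical default, tune to the server
) -> int:
    """Call `send` once per block and return the number of calls made."""
    calls = 0
    for block in iter_blocks(items, block_size):
        send(block)  # each call is a separate, shorter request
        calls += 1
    return calls
```

With a dataset object `ds` (hypothetical name), this would be used along the lines of `send_in_blocks(ds.add_entries, entries)`, so that no single request has to carry every entry.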

I managed to work around this one by hacking the code and increasing the timeout from 60 to 600. Even that ended up being close. It took just under ten minutes to complete. Timeouts seem to be a recurring issue. When the server is doing something that takes a long time, could it be made to periodically send back "I'm still working" status messages?
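
As a rough back-of-the-envelope check on those numbers, the sketch below estimates how many records could go into one request without hitting the default timeout. It assumes server time scales linearly with the number of records, which may not hold in practice; `max_block_size` and the safety factor of 0.5 are hypothetical choices for illustration.

```python
import math


def max_block_size(
    total_records: int,
    total_seconds: float,
    timeout_seconds: float,
    safety: float = 0.5,  # hypothetical margin: use only half of the timeout
) -> int:
    """Estimate the records per request, assuming time scales linearly."""
    if total_records <= 0 or total_seconds <= 0 or timeout_seconds <= 0:
        raise ValueError("all inputs must be positive")
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")
    # records/second observed, scaled to the usable part of the timeout
    estimate = total_records * timeout_seconds * safety / total_seconds
    return max(1, math.floor(estimate))


# 194k records took just under 600 s; the default read timeout is 60 s.
block = max_block_size(194_000, 600, 60)
requests_needed = math.ceil(194_000 / block)
```

Under these assumptions the estimate comes out to blocks of 9700 records, or 20 requests for the whole dataset. Since the real run finished in slightly less than 600 seconds, the estimate errs on the cautious side.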

bennybp commented 11 months ago

Is this with v0.50 or v0.51 of QCPortal? Judging by the line numbers, this is v0.50.

v0.51 should do automatic batching so that the timeout is not reached. Eventually I plan to add an asynchronous/background way of doing this, but that will take some time.
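
To illustrate the general idea of a background operation that reports "still working" while it runs, here is a small in-process sketch using only the standard library. It is not how QCFractal does or will implement this: `slow_submit` is a hypothetical stand-in for a long server-side operation, and a real client/server design would more likely return a job ID and expose a status endpoint to poll.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_submit(n_records: int) -> int:
    """Hypothetical stand-in for a long-running submission."""
    time.sleep(0.2)
    return n_records


def run_with_progress(work, *args, poll_interval: float = 0.05):
    """Run `work` in a background thread and record a status message
    on every poll until it finishes."""
    status_messages = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(work, *args)
        while not future.done():
            status_messages.append("still working")
            time.sleep(poll_interval)
        # result() re-raises any exception raised inside `work`
        return future.result(), status_messages


result, messages = run_with_progress(slow_submit, 194_000)
```

The caller never blocks on one long request here; it checks in at short intervals, so no single wait comes close to a timeout.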

peastman commented 11 months ago

I thought the automatic batching was for add_entries(), not submit()?

bennybp commented 11 months ago

I added it for both since they will both have similar problems :)

https://github.com/MolSSI/QCFractal/blob/e53ffd9c332bfff8ebebebfb0e207e1d1db5465b/qcportal/qcportal/dataset_models.py#L265

bennybp commented 10 months ago

This should have been fixed in v0.51, but please reopen if there is still an issue.