dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
22 stars 26 forks source link

upload: large file upload can error out with ('Connection aborted.', OSError("(110, 'ETIMEDOUT')")) #136

Closed yarikoptic closed 3 years ago

yarikoptic commented 4 years ago

From the slack

2020-07-28 14:28:50,777 [    INFO] Found 3 files to consider
PATH                                            SIZE      ERRORS UPLOAD     STATUS                MESSAGE                                                 
dandiset.yaml                                   630 Bytes   0               skipped               exists (same)                                           
sub-001/sub-001_ses-20200727T094620_ecephys.nwb 29.5 GB     0    1%         ERROR                 ('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))  
sub-001/sub-001_ses-20200727T094534.nwb         8.4 MB      0    100%       done                  exists (older) - reuploading                            
Summary:                                        29.5 GB          282.3 kB/s 1 skipped             1 exists (same)                                         
                                                                            1 ERROR               1 ('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))
                                                                            1 done                1 exists (older) - reuploading   

I have not analyzed yet anyhow: is it something we could mitigate on client side or it is the girder server component... @mgrauer - do we have access to some girder logs which could shine some more light on this particular case? (happened around 2020-07-28 14:28:50,777 timezone yet to clarify)

mhhennig commented 4 years ago

I tried again just now, here the output (time is BST):

2020-07-28 22:29:16,190 [    INFO] Found 3 files to consider
PATH                                            SIZE      ERRORS UPLOAD STATUS                MESSAGE
PATH                                            SIZE      ERRORS UPLOAD STATUS                MESSAGE
dandiset.yaml                                   630 Bytes   0           skipped               exists (same)
sub-001/sub-001_ses-20200727T094620_ecephys.nwb 29.5 GB     0           ERROR                 ('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))
sub-001/sub-001_ses-20200727T094534.nwb         8.4 MB      0           skipped               exists (same)
Summary:                                        29.5 GB                 2 skipped             2 exists (same)
                                                                        1 ERROR               1 ('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))

It finished at Tue 28 Jul 22:49:12 BST 2020, so ran for about 20 min. dandi-cli version is 0.5.0, installed via pip. System is linux, an internal distribution based on Scientific Linux:

LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: Scientific
Description:    Scientific Linux release 7.6 (Nitrogen)
Release:        7.6
Codename:       Nitrogen

Python is 3.7.2, in a miniconda env. Just let me know if there are any further tests I can do. Note that the smaller NWB file transferred without problems.

yarikoptic commented 4 years ago

which dandiset on https://gui.dandiarchive.org/ it was uploaded to? (I thought you did directly in girder web ui) or it was not uploaded yet?

mgrauer commented 4 years ago

It was https://dandiarchive.org/dandiset/000028/draft

nginx logs had a number of errors like this

[28/Jul/2020:22:04:47 +0000] "POST /api/v1/file/chunk?offset=67108864&uploadId=5f20a02ef63d62e1dbd06959 HTTP/1.1" 408 0 "-" "python-requests/2.21.0"

The return code 408 is because the client wasn't sending data before a timeout.

yarikoptic commented 4 years ago

From slack conversation: upload attempts directly from the browser resulted in a similar issue but seems already directly relating to connection with S3. As far as I see it;

As for discussion of a more general "resolution", I have started https://github.com/dandi/dandi-publish/issues/110 .

yarikoptic commented 4 years ago

@mhhennig just checking -- do you have admin access on any of those boxes from which upload fails? may be we could craft a netstat (or alike) monitoring invocation so we would get precise timed protocol of interactions with our server .

mhhennig commented 4 years ago

No, I have no root access unfortunately. There's no good way atm to do monitoring, our building is still closed and here I only have slow broadband and a laptop...the machines I use are in some server room. Could try to ask someone else in a country with fewer infections to do testing though if it would help.

yarikoptic commented 4 years ago

just to make it clear: I didn't need any "physical" access. But since no root -- it answers my question, thanks. I will quickly add some logging at dandi-cli level for as much as it can do by itself ;-) actually since it is all python, I could probably do LOTS! ;) I will buzz you back shortly...

yarikoptic commented 4 years ago

Ok, now there is a https://github.com/dandi/dandi-cli/pull/137 . Please let me know if instructions below are too cumbersome etc, then I will just issue a new dandi-cli release on pypi (I would prefer to avoid it if possible -- needs more testing etc)

  1. establish virtualenv with dandi-cli from that branch, and activate it:
    py=3; d=venvs/dev$py; virtualenv --python=python$py --system-site-packages $d && source $d/bin/activate && pip3 install git+https://github.com/dandi/dandi-cli.git@enh-logging 
  2. Run your upload for a troublesome file, disabling any parallelization and "fancy" output:
    DANDI_LOG_GIRDER=1 DANDI_DEVEL=1 dandi -l DEBUG upload --devel-debug --existing force sub-001/sub-001_ses-20200727T094620_ecephys.nwb 2>&1 | tee /tmp/mydandi.log

    share that /tmp/mydandi.log entirely or just a tail of it around the point of failure.

You should see records like (in below I am uploading to a local instance, so do not copy all the options from it, but from above):

lena:/tmp/000027
$> DANDI_LOG_GIRDER=1 DANDI_DEVEL=1 dandi -l DEBUG upload --devel-debug --existing force --validation ignore -i local-docker sub-759711149 2>&1 | tee /tmp/mydandi.log
...
2020-07-29 11:20:28,824 [   DEBUG] PROGRESS[sub-759711149/sub-759711149.nwb]: 1409286144
2020-07-29 11:20:28,889 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=1409286144&uploadId=5f2193b582d2cb7363bd3131', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7fce0a12d7c0>, 'json': None, 'headers': None, 'jsonResp': True}
2020-07-29 11:20:29,188 [   DEBUG] REST<: {'_id': '5f2193b582d2cb7363bd3131', 'assetstoreId': '5e6ab9e6ed07da5d182247e2', 'created': '2020-07-29T15:20:21.605000+00:00', 'mimeType': 'application/octet-stream', 'name': 'sub-759711149.nwb', 'parentId': '5f2193b582d2cb7363bd3130', 'parentType': 'item', 'received': 1476395008, 'sha512state': "b'39326e76f590807e3811aa6b9ab45a46d6e03b28586c7a87ddd02403251a83ec57ce4a15176a823fc606f9ea58e1f9df1d964564f292ac3c328aeaf1a33558fb000000c002000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000'", 'size': 2251686372, 'tempFile': '/home/assetstore/temp/tmp147o7mil', 'updated': '2020-07-29T15:20:29.186537+00:00', 'userId': '5e6ab9e5ed07da5d182247de'}
2020-07-29 11:20:29,188 [   DEBUG] PROGRESS[sub-759711149/sub-759711149.nwb]: 1476395008
2020-07-29 11:20:29,239 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=1476395008&uploadId=5f2193b582d2cb7363bd3131', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7fce02b20040>, 'json': None, 'headers': None, 'jsonResp': True}
2020-07-29 11:20:29,527 [   DEBUG] REST<: {'_id': '5f2193b582d2cb7363bd3131', 'assetstoreId': '5e6ab9e6ed07da5d182247e2', 'created': '2020-07-29T15:20:21.605000+00:00', 'mimeType': 'application/octet-stream', 'name': 'sub-759711149.nwb', 'parentId': '5f2193b582d2cb7363bd3130', 'parentType': 'item', 'received': 1543503872, 'sha512state': "b'da48bee581e4ac6706578eac6a5d640cc58dff51f1d86b047bf7f9ac3d6a4e16a45e0dcbcc43bc1d58d07c3310a1c727dc2c5d4281425c54d4f3e15b9f3e436d000000e002000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000'", 'size': 2251686372, 'tempFile': '/home/assetstore/temp/tmp147o7mil', 'updated': '2020-07-29T15:20:29.525607+00:00', 'userId': '5e6ab9e5ed07da5d182247de'}
2020-07-29 11:20:29,527 [   DEBUG] PROGRESS[sub-759711149/sub-759711149.nwb]: 1543503872

so I expect us to get a better clue on when we are submitting a request and receiving response from the previous one etc.

mhhennig commented 4 years ago

Just ran this, here's the end of the debug log:

2020-07-29 16:36:57,785 [   DEBUG] REST<: {'_id': '5f2196b2f63d62e1dbd0695f', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-29T15:33:06.099000+00:00', 'mimeType': 'application/octet-stream', 'name': 'sub-001_ses-20200727T094620_ecephys.nwb', 'parentId': '5f219599f63d62e1dbd0695e', 'parentType': 'item', 'received': 201326592, 's3': {'chunkLength': 33554432, 'chunked': True, 'key': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'keyName': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'partNumber': 3, 'relpath': '/dandiarchive/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'request': {'headers': {'Content-Disposition': 'attachment; filename="sub-001_ses-20200727T094620_ecephys.nwb"', 'Content-Type': 'application/octet-stream', 'x-amz-acl': 'private', 'x-amz-meta-uploader-id': '5f1de355f63d62e1dbd06948', 'x-amz-meta-uploader-ip': '129.215.24.71'}, 'method': 'POST', 'url': 'https://dandiarchive.s3.amazonaws.com/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97?uploads=&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA3GIMZPVVECMGAIM2%2F20200729%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20200729T153306Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-disposition%3Bcontent-type%3Bhost%3Bx-amz-acl%3Bx-amz-meta-uploader-id%3Bx-amz-meta-uploader-ip&X-Amz-Security-Token=FwoGZXIvYXdzEGEaDGsgV7CrjJVSz59JKyK%2FAU%2FUNxUv3xA48rHUvzjYSpq5auW3ZbnGlfDyn2A6g5zIWV%2Brok0nX0lSEVsyNZ4TDQsr%2BGswi7vOzxd%2B6pHCZ8CTAjs%2B67oQZHGbMvZcMMD4pmR%2F0w%2FHp4BB4%2BHdqLCHeGdw0GtEWT%2FXytlEhPhZBB4zdCj6iPgLfJ2X6bjl0t8HOMnrIZGGiCHzq2e55za2INruG%2FGigJcG8JRTN58oYDlyn5K67YWPH8WQ99o0Fs%2FJP4MG7b2G0ulUOiG93BITKLKthvkFMi2GGQVGOULBpIE7muawEba02mirETYvdYu21p%2Flz0%2BstFokKUCKRUWh7UfdEh0%3D&X-Amz-Signature=0c2f7c886ad500041f83ca70e12d7d02fac8b415eb5584f6a48a9d41d2e9478a'}, 'uploadId': 'LWiP5rLWrTifay_RXeV6rJ19k1r4_gP1ONrIGv4sboDVoxxMlSiPTAI43No7EoglKlbqpVrzsQec8iqnM.pJa_sqjDs2G3480OOephWaXdE3sSkyOzc9uZPAy_me4Hq0nRgB_3EyEVDkkv7Issjagod.itpU.5HuQNs2EIcbTJs-'}, 'size': 29521959808, 'updated': '2020-07-29T15:36:57.711454+00:00', 'userId': '5f1de355f63d62e1dbd06948'}
2020-07-29 16:36:57,785 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: 201326592
2020-07-29 16:36:57,854 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=201326592&uploadId=5f2196b2f63d62e1dbd0695f', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7f752b8ffe08>, 'json': None, 'headers': None, 'jsonResp': True}
^[^]^]{'size': 29521959808}
{'status': 'checking girder'}
{'status': 'validating'}
{'errors': 0}
{'status': 'extracting metadata'}
{'status': 'digesting'}
{'status': 'uploading'}
{'upload': 0.22731845865400346}
{'upload': 0.4546369173080069}
{'upload': 0.6819553759620104}
Traceback (most recent call last):
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 340, in _send_until_done
    return self.connection.send(data)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1757, in send
    self._raise_ssl_error(self._ssl, result)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1663, in _raise_ssl_error
    raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (110, 'ETIMEDOUT')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 1055, in _send_output
    self.send(chunk)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 977, in send
    self.sock.sendall(data)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 352, in sendall
    data[total_sent : total_sent + SSL_WRITE_BLOCKSIZE]
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 346, in _send_until_done
    raise SocketError(str(e))
OSError: (110, 'ETIMEDOUT')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 1055, in _send_output
    self.send(chunk)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/http/client.py", line 977, in send
    self.sock.sendall(data)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 352, in sendall
    data[total_sent : total_sent + SSL_WRITE_BLOCKSIZE]
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 346, in _send_until_done
    raise SocketError(str(e))
urllib3.exceptions.ProtocolError: ('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/bin/dandi", line 8, in <module>
    sys.exit(main())
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/cli/command.py", line 118, in wrapper
    return f(*args, **kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/cli/cmd_upload.py", line 602, in upload
    for v in process_path(path, relpath):
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/cli/cmd_upload.py", line 472, in process_path
    lambda c: client.uploadFileToItem(
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/support/generatorify.py", line 83, in __next__
    return self.send(None)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/support/generatorify.py", line 77, in send
    return self._put("send", value)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/support/generatorify.py", line 69, in _put
    raise val
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/support/generatorify.py", line 53, in thread_func
    ret = func(callback)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/cli/cmd_upload.py", line 473, in <lambda>
    item_rec["_id"], path, progressCallback=c
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/girder_client/__init__.py", line 894, in uploadFileToItem
    return self._uploadContents(obj, f, filesize, progressCallback=progressCallback)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/girder_client/__init__.py", line 1010, in _uploadContents
    data=_ProgressBytesIO(chunk, reporter=reporter))
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/girder_client/__init__.py", line 495, in post
    data=data, json=json, headers=headers, jsonResp=jsonResp)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/dandi/girder.py", line 112, in sendRestRequest
    res = super().sendRestRequest(*args, **kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/girder_client/__init__.py", line 470, in sendRestRequest
    **kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/requests/api.py", line 119, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/disk/scratch/mhennig/miniconda3/envs/spikesorting/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))
yarikoptic commented 4 years ago

Great, please edit and pasted a few more lines on top so we could see when previous REST> was sent (we see only when reply to it was received).

It seems that I might need to flush stdout to get better depiction of order of things (log goes into stderr, and regular lives to stdout) and also report upload ratio in progress log record so we could match... Will do

mhhennig commented 4 years ago

Here's the whole thing: https://gist.github.com/mhhennig/f7ffec71fd6140f111fa2ee0cc59299d

yarikoptic commented 4 years ago

oh, it is blowing up really quickly! Here is those log lines prepended with duration (in ms, to the next line), so we see that it doesn't take abnormally long to submit that next request which never gets a reply

  541 2020-07-29 16:33:05,815 [   DEBUG] REST>: args=('POST', 'file', {'parentType': 'item', 'parentId': '5f219599f63d62e1dbd0695e', 'name': 'sub-001_ses-20200727T094620_ecephys.nwb', 'size': 29521959808, 'mimeType': None}) kwargs={'files': None, 'data': None, 'json': None, 'headers': None, 'jsonResp': True}
   62 2020-07-29 16:33:06,356 [   DEBUG] REST<: {'_id': '5f2196b2f63d62e1dbd0695f', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-29T15:33:06.099026+00:00', 'mimeType': 'application/octet-stream', 'name': 'sub-001_ses-20200727T094620_ecephys.nwb', 'parentId': '5f219599f63d62e1dbd0695e', 'parentType': 'item', 'received': 0, 's3': {'chunkLength': 33554432, 'chunked': True, 'key': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'relpath': '/dandiarchive/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'request': {'headers': {'Content-Disposition': 'attachment; filename="sub-001_ses-20200727T094620_ecephys.nwb"', 'Content-Type': 'application/octet-stream', 'x-amz-acl': 'private', 'x-amz-meta-uploader-id': '5f1de355f63d62e1dbd06948', 'x-amz-meta-uploader-ip': '129.215.24.71'}, 'method': 'POST', 'url': 'https://dandiarchive.s3.amazonaws.com/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97?uploads=&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA3GIMZPVVECMGAIM2%2F20200729%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20200729T153306Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-disposition%3Bcontent-type%3Bhost%3Bx-amz-acl%3Bx-amz-meta-uploader-id%3Bx-amz-meta-uploader-ip&X-Amz-Security-Token=FwoGZXIvYXdzEGEaDGsgV7CrjJVSz59JKyK%2FAU%2FUNxUv3xA48rHUvzjYSpq5auW3ZbnGlfDyn2A6g5zIWV%2Brok0nX0lSEVsyNZ4TDQsr%2BGswi7vOzxd%2B6pHCZ8CTAjs%2B67oQZHGbMvZcMMD4pmR%2F0w%2FHp4BB4%2BHdqLCHeGdw0GtEWT%2FXytlEhPhZBB4zdCj6iPgLfJ2X6bjl0t8HOMnrIZGGiCHzq2e55za2INruG%2FGigJcG8JRTN58oYDlyn5K67YWPH8WQ99o0Fs%2FJP4MG7b2G0ulUOiG93BITKLKthvkFMi2GGQVGOULBpIE7muawEba02mirETYvdYu21p%2Flz0%2BstFokKUCKRUWh7UfdEh0%3D&X-Amz-Signature=0c2f7c886ad500041f83ca70e12d7d02fac8b415eb5584f6a48a9d41d2e9478a'}}, 'size': 29521959808, 'updated': '2020-07-29T15:33:06.294875+00:00', 'userId': '5f1de355f63d62e1dbd06948'}
77271 2020-07-29 16:33:06,418 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=0&uploadId=5f2196b2f63d62e1dbd0695f', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7f752b8ffe60>, 'json': None, 'headers': None, 'jsonResp': True}
    0 2020-07-29 16:34:23,689 [   DEBUG] REST<: {'_id': '5f2196b2f63d62e1dbd0695f', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-29T15:33:06.099000+00:00', 'mimeType': 'application/octet-stream', 'name': 'sub-001_ses-20200727T094620_ecephys.nwb', 'parentId': '5f219599f63d62e1dbd0695e', 'parentType': 'item', 'received': 67108864, 's3': {'chunkLength': 33554432, 'chunked': True, 'key': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'keyName': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'partNumber': 1, 'relpath': '/dandiarchive/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'request': {'headers': {'Content-Disposition': 'attachment; filename="sub-001_ses-20200727T094620_ecephys.nwb"', 'Content-Type': 'application/octet-stream', 'x-amz-acl': 'private', 'x-amz-meta-uploader-id': '5f1de355f63d62e1dbd06948', 'x-amz-meta-uploader-ip': '129.215.24.71'}, 'method': 'POST', 'url': 'https://dandiarchive.s3.amazonaws.com/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97?uploads=&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA3GIMZPVVECMGAIM2%2F20200729%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20200729T153306Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-disposition%3Bcontent-type%3Bhost%3Bx-amz-acl%3Bx-amz-meta-uploader-id%3Bx-amz-meta-uploader-ip&X-Amz-Security-Token=FwoGZXIvYXdzEGEaDGsgV7CrjJVSz59JKyK%2FAU%2FUNxUv3xA48rHUvzjYSpq5auW3ZbnGlfDyn2A6g5zIWV%2Brok0nX0lSEVsyNZ4TDQsr%2BGswi7vOzxd%2B6pHCZ8CTAjs%2B67oQZHGbMvZcMMD4pmR%2F0w%2FHp4BB4%2BHdqLCHeGdw0GtEWT%2FXytlEhPhZBB4zdCj6iPgLfJ2X6bjl0t8HOMnrIZGGiCHzq2e55za2INruG%2FGigJcG8JRTN58oYDlyn5K67YWPH8WQ99o0Fs%2FJP4MG7b2G0ulUOiG93BITKLKthvkFMi2GGQVGOULBpIE7muawEba02mirETYvdYu21p%2Flz0%2BstFokKUCKRUWh7UfdEh0%3D&X-Amz-Signature=0c2f7c886ad500041f83ca70e12d7d02fac8b415eb5584f6a48a9d41d2e9478a'}, 'uploadId': 'LWiP5rLWrTifay_RXeV6rJ19k1r4_gP1ONrIGv4sboDVoxxMlSiPTAI43No7EoglKlbqpVrzsQec8iqnM.pJa_sqjDs2G3480OOephWaXdE3sSkyOzc9uZPAy_me4Hq0nRgB_3EyEVDkkv7Issjagod.itpU.5HuQNs2EIcbTJs-'}, 'size': 29521959808, 'updated': '2020-07-29T15:34:23.617214+00:00', 'userId': '5f1de355f63d62e1dbd06948'}
   69 2020-07-29 16:34:23,689 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: 67108864
77242 2020-07-29 16:34:23,758 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=67108864&uploadId=5f2196b2f63d62e1dbd0695f', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7f752b8ffe60>, 'json': None, 'headers': None, 'jsonResp': True}
    0 2020-07-29 16:35:41,000 [   DEBUG] REST<: {'_id': '5f2196b2f63d62e1dbd0695f', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-29T15:33:06.099000+00:00', 'mimeType': 'application/octet-stream', 'name': 'sub-001_ses-20200727T094620_ecephys.nwb', 'parentId': '5f219599f63d62e1dbd0695e', 'parentType': 'item', 'received': 134217728, 's3': {'chunkLength': 33554432, 'chunked': True, 'key': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'keyName': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'partNumber': 2, 'relpath': '/dandiarchive/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'request': {'headers': {'Content-Disposition': 'attachment; filename="sub-001_ses-20200727T094620_ecephys.nwb"', 'Content-Type': 'application/octet-stream', 'x-amz-acl': 'private', 'x-amz-meta-uploader-id': '5f1de355f63d62e1dbd06948', 'x-amz-meta-uploader-ip': '129.215.24.71'}, 'method': 'POST', 'url': 'https://dandiarchive.s3.amazonaws.com/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97?uploads=&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA3GIMZPVVECMGAIM2%2F20200729%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20200729T153306Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-disposition%3Bcontent-type%3Bhost%3Bx-amz-acl%3Bx-amz-meta-uploader-id%3Bx-amz-meta-uploader-ip&X-Amz-Security-Token=FwoGZXIvYXdzEGEaDGsgV7CrjJVSz59JKyK%2FAU%2FUNxUv3xA48rHUvzjYSpq5auW3ZbnGlfDyn2A6g5zIWV%2Brok0nX0lSEVsyNZ4TDQsr%2BGswi7vOzxd%2B6pHCZ8CTAjs%2B67oQZHGbMvZcMMD4pmR%2F0w%2FHp4BB4%2BHdqLCHeGdw0GtEWT%2FXytlEhPhZBB4zdCj6iPgLfJ2X6bjl0t8HOMnrIZGGiCHzq2e55za2INruG%2FGigJcG8JRTN58oYDlyn5K67YWPH8WQ99o0Fs%2FJP4MG7b2G0ulUOiG93BITKLKthvkFMi2GGQVGOULBpIE7muawEba02mirETYvdYu21p%2Flz0%2BstFokKUCKRUWh7UfdEh0%3D&X-Amz-Signature=0c2f7c886ad500041f83ca70e12d7d02fac8b415eb5584f6a48a9d41d2e9478a'}, 'uploadId': 'LWiP5rLWrTifay_RXeV6rJ19k1r4_gP1ONrIGv4sboDVoxxMlSiPTAI43No7EoglKlbqpVrzsQec8iqnM.pJa_sqjDs2G3480OOephWaXdE3sSkyOzc9uZPAy_me4Hq0nRgB_3EyEVDkkv7Issjagod.itpU.5HuQNs2EIcbTJs-'}, 'size': 29521959808, 'updated': '2020-07-29T15:35:40.929474+00:00', 'userId': '5f1de355f63d62e1dbd06948'}
   70 2020-07-29 16:35:41,000 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: 134217728
76715 2020-07-29 16:35:41,070 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=134217728&uploadId=5f2196b2f63d62e1dbd0695f', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7f752b8ffc50>, 'json': None, 'headers': None, 'jsonResp': True}
    0 2020-07-29 16:36:57,785 [   DEBUG] REST<: {'_id': '5f2196b2f63d62e1dbd0695f', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-29T15:33:06.099000+00:00', 'mimeType': 'application/octet-stream', 'name': 'sub-001_ses-20200727T094620_ecephys.nwb', 'parentId': '5f219599f63d62e1dbd0695e', 'parentType': 'item', 'received': 201326592, 's3': {'chunkLength': 33554432, 'chunked': True, 'key': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'keyName': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'partNumber': 3, 'relpath': '/dandiarchive/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'request': {'headers': {'Content-Disposition': 'attachment; filename="sub-001_ses-20200727T094620_ecephys.nwb"', 'Content-Type': 'application/octet-stream', 'x-amz-acl': 'private', 'x-amz-meta-uploader-id': '5f1de355f63d62e1dbd06948', 'x-amz-meta-uploader-ip': '129.215.24.71'}, 'method': 'POST', 'url': 'https://dandiarchive.s3.amazonaws.com/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97?uploads=&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA3GIMZPVVECMGAIM2%2F20200729%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20200729T153306Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-disposition%3Bcontent-type%3Bhost%3Bx-amz-acl%3Bx-amz-meta-uploader-id%3Bx-amz-meta-uploader-ip&X-Amz-Security-Token=FwoGZXIvYXdzEGEaDGsgV7CrjJVSz59JKyK%2FAU%2FUNxUv3xA48rHUvzjYSpq5auW3ZbnGlfDyn2A6g5zIWV%2Brok0nX0lSEVsyNZ4TDQsr%2BGswi7vOzxd%2B6pHCZ8CTAjs%2B67oQZHGbMvZcMMD4pmR%2F0w%2FHp4BB4%2BHdqLCHeGdw0GtEWT%2FXytlEhPhZBB4zdCj6iPgLfJ2X6bjl0t8HOMnrIZGGiCHzq2e55za2INruG%2FGigJcG8JRTN58oYDlyn5K67YWPH8WQ99o0Fs%2FJP4MG7b2G0ulUOiG93BITKLKthvkFMi2GGQVGOULBpIE7muawEba02mirETYvdYu21p%2Flz0%2BstFokKUCKRUWh7UfdEh0%3D&X-Amz-Signature=0c2f7c886ad500041f83ca70e12d7d02fac8b415eb5584f6a48a9d41d2e9478a'}, 'uploadId': 'LWiP5rLWrTifay_RXeV6rJ19k1r4_gP1ONrIGv4sboDVoxxMlSiPTAI43No7EoglKlbqpVrzsQec8iqnM.pJa_sqjDs2G3480OOephWaXdE3sSkyOzc9uZPAy_me4Hq0nRgB_3EyEVDkkv7Issjagod.itpU.5HuQNs2EIcbTJs-'}, 'size': 29521959808, 'updated': '2020-07-29T15:36:57.711454+00:00', 'userId': '5f1de355f63d62e1dbd06948'}
   69 2020-07-29 16:36:57,785 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: 201326592
----- 2020-07-29 16:36:57,854 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=201326592&uploadId=5f2196b2f63d62e1dbd0695f', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7f752b8ffe08>, 'json': None, 'headers': None, 'jsonResp': True}%  

@mgrauer could you please check how it looked on the server at that point in time?

mgrauer commented 4 years ago

Same behavior as before with this particular upload from the Python client, this is an error on the client side (not sending data, or not sending enough data in a short enough time), so nginx times out and responds with a 408. Girder never even sees the last request, only the next-to-last request makes it to Girder (these are nginx logs, the last request here does not have a matching request in Girder logs).

access.log:X.X.X.X - - [29/Jul/2020:15:35:40 +0000] "POST /api/v1/file/chunk?offset=67108864&uploadId=5f2196b2f63d62e1dbd0695f HTTP/1.1" 200 2100 "-" "python-requests/2.23.0" access.log:X.X.X.X - - [29/Jul/2020:15:36:57 +0000] "POST /api/v1/file/chunk?offset=134217728&uploadId=5f2196b2f63d62e1dbd0695f HTTP/1.1" 200 2100 "-" "python-requests/2.23.0" access.log:X.X.X.X - - [29/Jul/2020:15:38:57 +0000] "POST /api/v1/file/chunk?offset=201326592&uploadId=5f2196b2f63d62e1dbd0695f HTTP/1.1" 408 0 "-" "python-requests/2.23.0"

@mhhennig also got an error when using the direct to browser upload, which doesn't proxy through Girder (the Python client proxies chunked uploads through Girder) but rather talks directly to S3. That error was

Browser error is: Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://dandiarchive.s3.amazonaws.com/girder-assetstore/22/6\u20268d402c82d7deadce5184e7155aefce878b54b6ec8a08caf53b0510371462. (Reason: CORS request did not succeed).

@AlmightyYakob says he's seen CORS errors before on request timeouts, because the server never finishes receiving the request, so it cannot set the correct header on the response.

This would be consistent with what we are seeing through the Python upload path, that is a client side timeout (or falling asleep as you put it @yarikoptic đŸ˜´ ) on the request. The browser allows for a retry when an error happens in uploading, though that would be annoying for someone to retry a large number of errors.

@yarikoptic you could put a retry in the Python client, perhaps similarly to the way the Girder CLI does retries by overriding sendRestRequest, maybe retrying 5 times? Or you could even expose this retry count as a command line flag if you wanted users to be able to set it--maybe it would be worth doing a hacky version first before exposing it as a CLI argument.

yarikoptic commented 4 years ago

that is a client side timeout (or falling asleep as you put it @yarikoptic sleeping ) on the request.

I would not yet be 100% certain about that. Here is the joint sequence of events from both logs as I think it has happened (there is no subsec times in web server), with some comments in # lines but I have no clue how/what girder does

76715 2020-07-29 16:35:41,070 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=134217728&uploadId=5f2196b2f63d62e1dbd0695f', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7f752b8ffc50>, 'json': None, 'headers': None, 'jsonResp': True}
# (for ~76 secs) data transmission is happening and then girder digests/forwards data to S3,
# receives some ack/nack and replies/logs in the log upon completion of the operation
access.log: [29/Jul/2020:15:36:57 +0000] "POST /api/v1/file/chunk?offset=134217728&uploadId=5f2196b2f63d62e1dbd0695f HTTP/1.1" 200 2100 "-" "python-requests/2.23.0"
    0 2020-07-29 16:36:57,785 [   DEBUG] REST<: {'_id': '5f2196b2f63d62e1dbd0695f', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-29T15:33:06.099000+00:00', 'mimeType': 'application/octet-stream', 'name': 'sub-001_ses-20200727T094620_ecephys.nwb', 'parentId': '5f219599f63d62e1dbd0695e', 'parentType': 'item', 'received': 201326592, 's3': {'chunkLength': 33554432, 'chunked': True, 'key': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'keyName': 'girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'partNumber': 3, 'relpath': '/dandiarchive/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97', 'request': {'headers': {'Content-Disposition': 'attachment; filename="sub-001_ses-20200727T094620_ecephys.nwb"', 'Content-Type': 'application/octet-stream', 'x-amz-acl': 'private', 'x-amz-meta-uploader-id': '5f1de355f63d62e1dbd06948', 'x-amz-meta-uploader-ip': '129.215.24.71'}, 'method': 'POST', 'url': 'https://dandiarchive.s3.amazonaws.com/girder-assetstore/d8/27/d8278b9868af40419c4133d308eaea97?uploads=&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA3GIMZPVVECMGAIM2%2F20200729%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20200729T153306Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-disposition%3Bcontent-type%3Bhost%3Bx-amz-acl%3Bx-amz-meta-uploader-id%3Bx-amz-meta-uploader-ip&X-Amz-Security-Token=FwoGZXIvYXdzEGEaDGsgV7CrjJVSz59JKyK%2FAU%2FUNxUv3xA48rHUvzjYSpq5auW3ZbnGlfDyn2A6g5zIWV%2Brok0nX0lSEVsyNZ4TDQsr%2BGswi7vOzxd%2B6pHCZ8CTAjs%2B67oQZHGbMvZcMMD4pmR%2F0w%2FHp4BB4%2BHdqLCHeGdw0GtEWT%2FXytlEhPhZBB4zdCj6iPgLfJ2X6bjl0t8HOMnrIZGGiCHzq2e55za2INruG%2FGigJcG8JRTN58oYDlyn5K67YWPH8WQ99o0Fs%2FJP4MG7b2G0ulUOiG93BITKLKthvkFMi2GGQVGOULBpIE7muawEba02mirETYvdYu21p%2Flz0%2BstFokKUCKRUWh7UfdEh0%3D&X-Amz-Signature=0c2f7c886ad500041f83ca70e12d7d02fac8b415eb5584f6a48a9d41d2e9478a'}, 'uploadId': 'LWiP5rLWrTifay_RXeV6rJ19k1r4_gP1ONrIGv4sboDVoxxMlSiPTAI43No7EoglKlbqpVrzsQec8iqnM.pJa_sqjDs2G3480OOephWaXdE3sSkyOzc9uZPAy_me4Hq0nRgB_3EyEVDkkv7Issjagod.itpU.5HuQNs2EIcbTJs-'}, 'size': 29521959808, 'updated': '2020-07-29T15:36:57.711454+00:00', 'userId': '5f1de355f63d62e1dbd06948'}
   69 2020-07-29 16:36:57,785 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: 201326592
# we are submitting a new request pretty much right away
----- 2020-07-29 16:36:57,854 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=201326592&uploadId=5f2196b2f63d62e1dbd0695f', None) kwargs={'files': None, 'data': <girder_client._ProgressBytesIO object at 0x7f752b8ffe08>, 'json': None, 'headers': None, 'jsonResp': True}%  
# but then at 2 min sharp cut off we are getting 408 response
access.log: [29/Jul/2020:15:38:57 +0000] "POST /api/v1/file/chunk?offset=201326592&uploadId=5f2196b2f63d62e1dbd0695f HTTP/1.1" 408 0 "-" "python-requests/2.23.0"

what I am thinking is:

not sure what is our girder configuration upload_minimum_chunk_size but it seems that the local one does not mind 64k chunk ;) so, @mhhennig would you be so kind to try this "version":

pip3 install git+https://github.com/dandi/dandi-cli.git@temp-small-chunksize

where I made chunk 8 times smaller. If this one works -- that means that we need to somehow mitigate for slow(er) connections. diff from master if interested.

mgrauer commented 4 years ago

Yes, I agree. Sorry, I miswrote:

client side timeout

I meant a client side error (slowness, could be from bandwidth, latency, or other causes such as drowsiness) that results in a server side timeout, but you figured it out correctly regardless of what I said, so you were not sleeping :) . I like your solution of smaller chunk size, because we will have less data to send through in that time, so we are more likely to succeed.

You might also want some retries, but it makes sense to experiment with just the smaller chunk size first. 8 times smaller is a lot, it means there will be 8 times as many API requests which may be inefficient. Overall I like this approach and it is a good start to see if it fixes the problem, then we can optimize chunk sizes/retries/adaptive approaches if needed.

Details on timeouts:

It isn't a 2 minute timeout, it just happens to look like that in this case. The below are nginx logs (above logs are also nginx, not girder) from the same (redacted) IP address, this time only 1:01 until the failure from the previous request.

X.X.X.X - - [28/Jul/2020:12:38:37 +0000] "POST /api/v1/file/chunk?offset=671088640&uploadId=5f201901f63d62e1dbd06953 HTTP/1.1" 200 2085 "-" "python-requests/2.23.0" X.X.X.X - - [28/Jul/2020:12:39:54 +0000] "POST /api/v1/file/chunk?offset=738197504&uploadId=5f201901f63d62e1dbd06953 HTTP/1.1" 200 2085 "-" "python-requests/2.23.0" X.X.X.X - - [28/Jul/2020:12:40:55 +0000] "POST /api/v1/file/chunk?offset=805306368&uploadId=5f201901f63d62e1dbd06953 HTTP/1.1" 408 0 "-" "python-requests/2.23.0"

Note that these are nginx timeouts. Girder never even sees the very last chunk request.

These nginx timeouts are likely culprits:

http://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_timeout 60s http://nginx.org/en/docs/http/ngx_http_core_module.html#client_header_timeout 60s http://nginx.org/en/docs/http/ngx_http_core_module.html#lingering_timeout 5s

I am not suggesting that we change these timeouts, I'm just posting details for future reference. Changing these timeouts wouldn't help with the browser path anyway, because we can't control the timeouts on S3. It is a 4XX error, meaning client side, so fixing it on the client is the right approach.

mhhennig commented 4 years ago

not sure what is our girder configuration upload_minimum_chunk_size but it seems that the local one does not mind 64k chunk ;) so, @mhhennig would you be so kind to try this "version":

pip3 install git+https://github.com/dandi/dandi-cli.git@temp-small-chunksize

I tried it, again it did not complete. The log is here: https://gist.github.com/mhhennig/f7ffec71fd6140f111fa2ee0cc59299d

yarikoptic commented 4 years ago

I was about to scream "rate limiting", but then realized that I don't think that you got the @temp-small-chunksize installed... I didn't know that silly pip will not upgrade if already installed this way:

Requirement already satisfied (use --upgrade to upgrade): dandi==0.0.0 from git+https://github.com/dandi/dandi-cli.git@temp-small-chunksize in ./venvs/dev3/lib/python3.8/site-packages

Please do

pip3 install --upgrade git+https://github.com/dandi/dandi-cli.git@temp-small-chunksize

in that "newer" version you should see PROGRESS indicators in a new format (with done= and %done):

2020-07-30 09:28:18,701 [   DEBUG] PROGRESS[sub-759711149/sub-759711149_probe-792602654_ecephys.nwb]: done=109051904 %done=6.54531524362948
yarikoptic commented 4 years ago

thank you @mgrauer !

http://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_timeout 60s http://nginx.org/en/docs/http/ngx_http_core_module.html#client_header_timeout 60s http://nginx.org/en/docs/http/ngx_http_core_module.html#lingering_timeout 5s

I am not suggesting that we change these timeouts, I'm just posting details for future reference. Changing these timeouts wouldn't help with the browser path anyway, because we can't control the timeouts on S3. It is a 4XX error, meaning client side, so fixing it on the client is the right approach.

I think the solution might end up with adjustment of both ends. 60s timeout for a body is VERY reasonable for typical deployments (there is no PUT of large files), but as we are file archive, and do expect large (megabytes) body I think it would only be sensible to boost it up. I would say 5 times (unless we see that server at times starts to consume too much RAM, do we have monitoring setup?). It would not be a complete solution BUT it would allow clients for larger chunk sizes transmission (or slower clients to not time out), thus minimizing # of API calls etc. I am not sure if I would approach adjusting girder client upload logic to try dynamic adjustment of the chunk size, but we definitely should keep it in mind while implementing upload through our API. Sure thing changes in nginx nohow would help with direct upload to S3, but let's mitigate one path at a time (unless we then see that we would get similar errors on girder -> s3 path).

It is a 4XX error, meaning client side,

Well - even though they are "Client errors" - mitigation of them is not as straightforward I would say. 403 could be due to incorrect file level permissions on the server side (can be fixed only on the server side). 404 - file was moved on the server, and nothing client can really do (besides being provided a corrected address). 405 (Method Not Allowed) - I think I saw those appearing until I configured web server correctly to fulfill the requests client was supposed to be able to do, and so on.

yarikoptic commented 4 years ago

I am not sure if I would approach adjusting girder client upload logic

having said that I think that was a good idea to retry sendRestRequest regardless, but probably only on a limited set of error codes (e.g. 408 Request Timeout, 409 Conflict), any other comes to mind? I will quickly add retry/logging there to see what we would be getting.

mhhennig commented 4 years ago

Sorry, I actually had wondered if pip did upgrade, but did not check. Now with the "latest" version, still a problem: https://gist.github.com/mhhennig/f7ffec71fd6140f111fa2ee0cc59299d I'm getting the impression the problem may be on my side, could this be the case? What causes these timeouts, it's the client, right? Could this have to do with the network config we have here in Informatics?

yarikoptic commented 4 years ago

ok, I have pushed 29bd59ac66cd58b6e3a4de292ee44316f70ee8e8 into the same branch, so whenever @mhhennig has a chance to try that branch (it will be in combination with smaller chunk size) we will see the effect!

Also, now (as of 1d00ad0ee246cff0bcf83699839a5ad98b091405) we will be back to 64M chunk size by default but you could set desired one with env var DANDI_MAX_CHUNK_SIZE=8388608 (for 8M chunk).

I think those should be good in general to have so issued https://github.com/dandi/dandi-cli/pull/139 from that branch.

Re the most recent attempt by @mhhennig , here is with dtimes:

 10732 2020-07-30 14:43:37,505 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=8388608&uploadId=5f22ce7e385430deb68c9bb3', None) kwargs={'files': None, 'data': <girder_client._Progress
     0 2020-07-30 14:43:48,237 [   DEBUG] REST<: {'_id': '5f22ce7e385430deb68c9bb3', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-30T13:43:26.526000+0
    - {'upload': 0.056829614663500866}
    11 2020-07-30 14:43:48,237 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: done=16777216 %done=0.056829614663500866
 10733 2020-07-30 14:43:48,248 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=16777216&uploadId=5f22ce7e385430deb68c9bb3', None) kwargs={'files': None, 'data': <girder_client._Progres
     1 2020-07-30 14:43:58,981 [   DEBUG] REST<: {'_id': '5f22ce7e385430deb68c9bb3', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-30T13:43:26.526000+0
    - {'upload': 0.0852444219952513}
     9 2020-07-30 14:43:58,982 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: done=25165824 %done=0.0852444219952513
 11443 2020-07-30 14:43:58,991 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=25165824&uploadId=5f22ce7e385430deb68c9bb3', None) kwargs={'files': None, 'data': <girder_client._Progres
     1 2020-07-30 14:44:10,434 [   DEBUG] REST<: {'_id': '5f22ce7e385430deb68c9bb3', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-30T13:43:26.526000+0
    - {'upload': 0.11365922932700173}
     4 2020-07-30 14:44:10,435 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: done=33554432 %done=0.11365922932700173
953781 2020-07-30 14:44:10,439 [   DEBUG] REST>: args=('POST', 'file/chunk?offset=33554432&uploadId=5f22ce7e385430deb68c9bb3', None) kwargs={'files': None, 'data': <girder_client._Progres
----- 2020-07-30 15:00:04,220 [   DEBUG] Caught exception ('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))%  

which

and yes -- most likely it is some rate limiting policy on your network. I would have contacted admins and pointed them to this issue for more details. Meanwhile, if you would be so kind -- could you try to pip install --upgrade again, set also DANDI_MAX_CHUNK_SIZE=8388608 (just in case or would be fun to see also without it) and see if up to 3 retries magically help us to go forward ;-)

mhhennig commented 4 years ago

Ok, here's a run with the latest changes, and the following command: DANDI_LOG_GIRDER=1 DANDI_DEVEL=1 DANDI_MAX_CHUNK_SIZE=8388608 dandi -l DEBUG upload --devel-debug --existing force sub-001/sub-001_ses-20200727T094620_ecephys.nwb 2>&1 | tee /tmp/mydandi.log

Still no success: https://gist.github.com/mhhennig/f7ffec71fd6140f111fa2ee0cc59299d

I'll also check with our comp admin people...

yarikoptic commented 4 years ago

I have pushed 2abe7fac29a2556e226bb1626c99bdd791c6dc47 with which we would retry on more exceptions. Would you be @mhhennig do install/rerun again? I still hope that we would get to a workable solution (while admin people react)

mhhennig commented 4 years ago

Just re-ran it, still breaks, but seems to have gotten a bit further this time (log in same gist).

yarikoptic commented 4 years ago

Ha, this is a "new" one:

 9230 2020-07-30 19:13:48,083 [   DEBUG] REST#1>: args=('POST', 'file/chunk?offset=285212672&uploadId=5f230859385430deb68c9bb8', None) kwargs={'files': None, 'data': <girder_client._Prog
    1 2020-07-30 19:13:57,313 [   DEBUG] REST#1<: {'_id': '5f230859385430deb68c9bb8', 'assetstoreId': '5d9618588a600b0dcb25b4a6', 'behavior': 's3', 'created': '2020-07-30T17:50:17.064000
    - {'upload': 0.9945182566112651}
    4 2020-07-30 19:13:57,314 [   DEBUG] PROGRESS[sub-001/sub-001_ses-20200727T094620_ecephys.nwb]: done=293601280 %done=0.9945182566112651
10087 2020-07-30 19:13:57,318 [   DEBUG] REST#0>: args=('POST', 'file/chunk?offset=293601280&uploadId=5f230859385430deb68c9bb8', None) kwargs={'files': None, 'data': <girder_client._Prog
    - Response text: {"message": "Server has received 292659200 bytes, but client sent offset 293601280.", "type": "rest"}
    0 2020-07-30 19:14:07,405 [   DEBUG] REST#0: failed with HTTP error 400: POST https://girder.dandiarchive.org/api/v1/file/chunk?offset=293601280&uploadId=5f230859385430deb68c9bb8
----- 2020-07-30 19:14:07,405 [   DEBUG] Caught exception HTTP error 400: POST https://girder.dandiarchive.org/api/v1/file/chunk?offset=293601280&uploadId=5f230859385430deb68c9bb8
    - Response text: {"message": "Server has received 292659200 bytes, but client sent offset 293601280.", "type": "rest"}%

which says that server did not receive a "part" (not the same as our chunk size) of the previous (retransmitted?) "body": 293601280 - 292659200 = 942080. I think I have provided a possible "fix" in 63e1350c67b6fb461a49fca3588a4b106616a39b which I just pushed. If that one doesn't work -- something might need to be done on girder side to accommodate retransmissions. @mhhennig - would it be too much to ask to try again? (any word from admins BTW?)

mhhennig commented 4 years ago

I'll re-try as many times as it takes :) Looking promising now, it's at 1.7% and running, I think this far I never got it previously.

mhhennig commented 4 years ago

At 6% now, I'll keep it running overnight...

yarikoptic commented 4 years ago

woooohooo! That is promising!

yarikoptic commented 4 years ago

BTW, if you are keeping that log, please share it afterwards (can just email to debian@onerussian.com) - I would like to see how many retries etc. Also, please do not upload other files right away -- we will need to check if integrity (checksum) is matching (after download). Then let's try one other big one without reducing chunk size and see how that goes... if not too bad -- then we could try without any special settings but I am afraid without nginx tune up for timeouts it would fail since will try to do multiple uploads in parallel

mhhennig commented 4 years ago

So, something changed, it completed although I'm not sure the upload actually worked. Sending the log via email.

PATH                                            SIZE    ERRORS UPLOAD     STATUS    MESSAGE                                    
sub-001/sub-001_ses-20200727T094620_ecephys.nwb                           skipped   Expecting value: line 1 column 1 (char 0)  
Summary:                                        0 Bytes        444.0 kB/s 1 skipped 1 Expecting value: line 1 column 1 (char 0)

Update: data rates from the machine were very low (below a MByte/s) during the transfer, so there may be a problem on our side. Computing support is investigating...

Question from support: Is it an IPv4 or IPv6 connection? If the latter they suggest to test IPv4.

yarikoptic commented 4 years ago

Question from support: Is it an IPv4 or IPv6 connection? If the latter they suggest to test IPv4.

AFAIK just good old v4, I do not even see a record for IPv6 for any of our servers:

$> nslookup girder.dandiarchive.org
Server:     192.168.1.1
Address:    192.168.1.1#53

Non-authoritative answer:
Name:   girder.dandiarchive.org
Address: 3.19.164.171

As for speed -- what speeds do you attain while uploading to any other server? e.g. by running speedtest google provides (just search for speedtest)? What admins said would be an "expected" speed you should see?

yarikoptic commented 4 years ago

re failed run... some bug somewhere. Would you be so kind to rerun while making sure you provide --devel-debug option? actually, ideally, let's run it in debug mode which would give us interactive debugger, so we could just interactively troubleshoot it when/if that issue occurs. Do smth like

DANDI_MAX_CHUNK_SIZE=8388608 DANDI_LOG_GIRDER=1 DANDI_DEVEL=1 dandi -l DEBUG --pdb upload -e force --validation skip --existing force --devel-debug PATHTOFILE

(no saving log into a file) so if exception happens, you would end up in pdb and we could see details (better to work on that in slack more "interactively") but first commands to run would be l to see surrounding code and p to print some variables etc involved.

edit: removed -i local-docker

yarikoptic commented 4 years ago

FTR: I have tried to emulate slow connection locally with

sudo tc qdisc add dev vetha3482b2 root tbf rate 1024kbit latency 50ms burst 1540

for both interfaces related to dandi_client and dandi_girder local development containers... upload speed crawled 0.5-1 Mbps (bits, not bytes! ;)) but did not trigger any retransmission...

an interesting observation in the log you shared is that it is only on the initial REST upload request it seems to fail and then proceeds normally without retransmissions... since backtrace above shows that it is coming from SSL layer, and unlike local using http main uses https, I suspect it might be somehow related to SSL... I will see if I could establish a dedicated "slow down" for connection going from my laptop to the main server so I could possibly trigger original issue myself...

yarikoptic commented 4 years ago

FTR: https://github.com/thombashi/tcconfig seems to be the ultimate tool to assist with "emulation". I did sudo `which tcset` wlp82s0 --delay 100ms --network 3.19.164.171 --rate 8000Kbps (tcset is python based, installed in a virtualenv thus the dance with which)

yarikoptic commented 4 years ago

Unfortunately, even after adding --loss 0.01% I do not see the "desired" behavior of REST request errorring out. Increased to --delay 200ms --network 3.19.164.171 --rate 8000Kbps --loss 1% and the same ...

So I am ready to blame (check) if it might be Python libraries etc. Would you be so kind to provide output of conda list --explicit in your conda environment? may be that brings us "luck" ;)

mhhennig commented 4 years ago

Ok. Some unrelated packages here:

conda ``` # platform: linux-64 @EXPLICIT https://repo.anaconda.com/pkgs/main/linux-64/_libgcc_mutex-0.1-main.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/blas-1.0-mkl.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/ca-certificates-2020.1.1-0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/intel-openmp-2020.1-217.conda https://repo.anaconda.com/pkgs/main/linux-64/libgfortran-ng-7.3.0-hdf63c60_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-9.1.0-hdf63c60_0.conda https://repo.anaconda.com/pkgs/main/linux-64/mpi-1.0-mpich.conda https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-9.1.0-hdf63c60_0.conda https://repo.anaconda.com/pkgs/main/linux-64/mkl-2020.1-217.conda https://repo.anaconda.com/pkgs/main/linux-64/expat-2.2.6-he6710b0_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/gmp-6.1.2-h6c8ec71_1.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/icu-58.2-he6710b0_3.conda https://repo.anaconda.com/pkgs/main/linux-64/jpeg-9b-h024ee3a_2.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/libffi-3.3-he6710b0_1.conda https://repo.anaconda.com/pkgs/main/linux-64/libsodium-1.0.16-h1bed415_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/libuuid-1.0.3-h1bed415_2.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/libxcb-1.13-h1bed415_1.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/lz4-c-1.8.1.2-h14c3975_0.conda https://conda.anaconda.org/conda-forge/linux-64/mpich-3.3.2-hc856adb_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/ncurses-6.2-he6710b0_1.conda https://repo.anaconda.com/pkgs/main/linux-64/nodejs-10.13.0-he6710b0_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/openssl-1.1.1g-h7b6447c_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pcre-8.43-he6710b0_0.conda https://repo.anaconda.com/pkgs/main/linux-64/snappy-1.1.7-hbae5bb6_3.conda https://repo.anaconda.com/pkgs/main/linux-64/xz-5.2.5-h7b6447c_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/yaml-0.1.7-had09818_2.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/zlib-1.2.11-h7b6447c_3.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/glib-2.63.1-h3eb4bd4_1.conda https://repo.anaconda.com/pkgs/main/linux-64/hdf5-1.10.4-hb1b8bf9_0.conda https://repo.anaconda.com/pkgs/main/linux-64/libedit-3.1.20181209-hc058e9b_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/libpng-1.6.37-hbc83047_0.conda https://repo.anaconda.com/pkgs/main/linux-64/libxml2-2.9.9-hea5a465_1.conda https://conda.anaconda.org/flatiron/linux-64/mountainlab-0.15.0-0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/pandoc-2.2.3.2-0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/readline-7.0-h7b6447c_5.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/tk-8.6.8-hbc83047_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/zeromq-4.3.1-he6710b0_3.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/zstd-1.3.7-h0b5b093_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/blosc-1.16.3-hd408876_0.conda https://repo.anaconda.com/pkgs/main/linux-64/dbus-1.13.14-hb2f20db_0.conda https://repo.anaconda.com/pkgs/main/linux-64/freetype-2.9.1-h8a8886c_1.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/gstreamer-1.14.0-hb31296c_0.conda https://repo.anaconda.com/pkgs/main/linux-64/libtiff-4.1.0-h2733197_0.conda https://repo.anaconda.com/pkgs/main/linux-64/sqlite-3.31.1-h62c20be_1.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/fontconfig-2.13.0-h9420a91_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/gst-plugins-base-1.14.0-hbbd80ab_1.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/python-3.7.2-h0371630_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/alabaster-0.7.12-py37_0.conda https://repo.anaconda.com/pkgs/main/noarch/atomicwrites-1.4.0-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/attrs-19.3.0-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/backcall-0.1.0-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/certifi-2020.4.5.1-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/chardet-3.0.4-py37_1003.conda https://repo.anaconda.com/pkgs/main/noarch/colorama-0.4.3-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/decorator-4.4.2-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/defusedxml-0.6.0-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/docutils-0.16-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/entrypoints-0.3-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/future-0.18.2-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/idna-2.9-py_1.conda https://repo.anaconda.com/pkgs/main/noarch/imagesize-1.2.0-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/ipython_genutils-0.2.0-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/kiwisolver-1.2.0-py37hfd86e86_0.conda https://repo.anaconda.com/pkgs/main/linux-64/markupsafe-1.1.1-py37h7b6447c_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/mistune-0.8.4-py37h7b6447c_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/more-itertools-8.3.0-py_0.conda https://conda.anaconda.org/conda-forge/linux-64/mpi4py-3.0.3-py37hcf07815_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/olefile-0.46-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/pandocfilters-1.4.2-py37_1.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/parso-0.7.0-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pickleshare-0.7.5-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/prometheus_client-0.7.1-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/psutil-5.7.0-py37h7b6447c_0.conda https://repo.anaconda.com/pkgs/main/linux-64/ptyprocess-0.6.0-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/py-1.8.1-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/pycparser-2.20-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pymongo-3.9.0-py37he6710b0_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/pyparsing-2.4.7-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pysocks-1.7.1-py37_0.conda https://repo.anaconda.com/pkgs/main/noarch/pytz-2020.1-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pyyaml-3.13-py37h14c3975_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/pyzmq-18.1.1-py37he6710b0_0.conda https://repo.anaconda.com/pkgs/main/linux-64/qt-5.9.7-h5867ecd_1.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/qtpy-1.9.0-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/send2trash-1.5.0-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/sip-4.19.8-py37hf484d3e_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/six-1.14.0-py37_0.conda https://repo.anaconda.com/pkgs/main/noarch/snowballstemmer-2.0.0-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-applehelp-1.0.2-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-devhelp-1.0.2-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-htmlhelp-1.0.3-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-jsmath-1.0.1-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-qthelp-1.0.3-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-serializinghtml-1.1.4-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/testpath-0.4.4-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/tornado-6.0.4-py37h7b6447c_1.conda https://repo.anaconda.com/pkgs/main/noarch/tqdm-4.46.0-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/wcwidth-0.1.9-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/webencodings-0.5.1-py37_1.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/zipp-3.1.0-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/babel-2.8.0-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/cffi-1.14.0-py37he30daa8_1.conda https://repo.anaconda.com/pkgs/main/linux-64/cycler-0.10.0-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/importlib-metadata-1.6.0-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/jedi-0.17.0-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/mkl-service-2.3.0-py37he904b0f_0.conda https://repo.anaconda.com/pkgs/main/noarch/packaging-20.3-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pexpect-4.8.0-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pillow-7.1.2-py37hb39fc2d_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pyqt-5.9.2-py37h05f1152_2.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/pyrsistent-0.16.0-py37h7b6447c_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/python-dateutil-2.8.1-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/setuptools-46.4.0-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/terminado-0.8.3-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/traitlets-4.3.3-py37_0.conda https://repo.anaconda.com/pkgs/main/noarch/bleach-3.1.4-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/cryptography-2.9.2-py37h1ba5d50_0.conda https://repo.anaconda.com/pkgs/main/noarch/importlib_metadata-1.6.0-0.conda https://repo.anaconda.com/pkgs/main/noarch/jinja2-2.11.2-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/joblib-0.14.1-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/jupyter_core-4.6.3-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/networkx-2.4-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/nose-1.3.7-py37_2.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/numpy-base-1.18.1-py37hde5b4d6_1.conda https://repo.anaconda.com/pkgs/main/noarch/pygments-2.6.1-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/wheel-0.34.2-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/jsonschema-3.2.0-py37_0.conda https://repo.anaconda.com/pkgs/main/noarch/jupyter_client-6.1.3-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pip-20.0.2-py37_3.conda https://repo.anaconda.com/pkgs/main/linux-64/pluggy-0.13.1-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/noarch/prompt-toolkit-3.0.5-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pyopenssl-19.1.0-py37_0.conda https://repo.anaconda.com/pkgs/main/noarch/nbformat-5.0.6-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/prompt_toolkit-3.0.5-0.conda https://repo.anaconda.com/pkgs/main/linux-64/pytest-5.0.1-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/urllib3-1.25.8-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/ipython-7.13.0-py37h5ca1d4c_0.conda https://repo.anaconda.com/pkgs/main/linux-64/nbconvert-5.6.1-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/requests-2.23.0-py37_0.conda https://repo.anaconda.com/pkgs/main/linux-64/ipykernel-5.1.4-py37h39e3cac_0.conda https://repo.anaconda.com/pkgs/main/noarch/sphinx-3.0.3-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/jupyter_console-6.1.0-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/notebook-6.0.3-py37_0.conda https://repo.anaconda.com/pkgs/main/noarch/numpydoc-0.9.2-py_0.conda https://repo.anaconda.com/pkgs/main/noarch/qtconsole-4.7.4-py_0.conda https://repo.anaconda.com/pkgs/main/linux-64/widgetsnbextension-3.5.1-py37_0.conda https://repo.anaconda.com/pkgs/main/noarch/ipywidgets-7.5.1-py_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/jupyter-1.0.0-py37_7.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/h5py-2.10.0-py37h7918eee_0.conda https://repo.anaconda.com/pkgs/main/linux-64/imageio-2.6.0-py37_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/matplotlib-3.0.2-py37h5429711_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/mkl_fft-1.0.15-py37ha843d7b_0.conda https://repo.anaconda.com/pkgs/main/linux-64/mkl_random-1.1.1-py37h0573a6f_0.conda https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.18.1-py37h4f9e942_0.conda https://repo.anaconda.com/pkgs/main/linux-64/pandas-0.24.1-py37he6710b0_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/python-blosc-1.7.0-py37h7b6447c_0.conda https://repo.anaconda.com/pkgs/main/linux-64/scipy-1.4.1-py37h0b6359f_0.conda https://conda.anaconda.org/jaikumarm/linux-64/hyperopt-0.1.2-py37h39e3cac_0.tar.bz2 https://repo.anaconda.com/pkgs/main/linux-64/scikit-learn-0.22.1-py37hd81dba3_0.conda https://conda.anaconda.org/conda-forge/linux-64/hdbscan-0.8.24-py37hc1659b7_0.tar.bz2 https://conda.anaconda.org/conda-forge/noarch/scikit-optimize-0.5.2-py_0.tar.bz2 ```
And pip: ``` alabaster 0.7.12 appdirs 1.4.4 atomicwrites 1.4.0 attrs 19.3.0 Babel 2.8.0 backcall 0.1.0 bleach 3.1.4 blessings 1.7 blinker 1.4 blosc 1.8.1 cached-property 1.5.1 certifi 2020.4.5.1 cffi 1.14.0 chardet 3.0.4 ci-info 0.2.0 Click 7.0 click-didyoumean 0.0.3 cloudpickle 1.3.0 colorama 0.4.3 cryptography 2.9.2 cycler 0.10.0 Cython 0.29.6 dandi 0.0.0 decorator 4.4.2 defusedxml 0.6.0 Deprecated 1.2.7 diskcache 4.1.0 docutils 0.16 doit 0.32.0 elephant 0.6.1 entrypoints 0.3 et-xmlfile 1.0.1 etelemetry 0.2.1 fasteners 0.14.1 future 0.18.2 ghp-import2 1.0.1 girder-client 3.1.0 h5py 2.10.0 hdbscan 0.8.24 hdmf 1.6.4 herdingspikes 0.3.7 humanize 2.5.0 hyperopt 0.1.2 idna 2.9 imageio 2.6.0 imagesize 1.2.0 importlib-metadata 1.6.0 ipykernel 5.1.4 ipython 7.13.0 ipython-genutils 0.2.0 ipywidgets 7.5.1 isosplit5 0.1.3 jdcal 1.4 jedi 0.17.0 jeepney 0.4.3 Jinja2 2.11.2 joblib 0.14.1 jp-proxy-widget 1.0.0 jsonschema 3.2.0 jupyter 1.0.0 jupyter-client 6.1.3 jupyter-console 6.1.0 jupyter-core 4.6.3 kachery 0.4.7 kbucket 0.13.0 keyring 21.2.1 keyrings.alt 3.4.0 kiwisolver 1.2.0 klusta 3.0.16 klustakwik2 0.2.7 llvmlite 0.31.0 lxml 4.5.0 Mako 1.1.2 Markdown 3.2.1 MarkupSafe 1.1.1 matplotlib 3.0.2 matplotlib-venn 0.11.5 MEArec 1.4.1 MEAutility 1.4.6 mistune 0.8.4 mkl-fft 1.0.15 mkl-random 1.1.1 mkl-service 2.3.0 ml-ms4alg 0.3.2 monotonic 1.5 more-itertools 8.3.0 mountainlab-pytools 0.7.3 mountaintools 0.7.1 mpi4py 3.0.3 natsort 7.0.1 nbconvert 5.6.1 nbformat 5.0.6 neo 0.8.0 networkx 2.4 Nikola 8.0.4 nose 1.3.7 notebook 6.0.3 numba 0.47.0 numpy 1.18.1 numpydoc 0.9.2 olefile 0.46 openpyxl 2.6.1 packaging 20.3 pairio 0.4.0 pandas 0.24.1 pandocfilters 1.4.2 parso 0.7.0 patsy 0.5.1 pexpect 4.8.0 pickleshare 0.7.5 piexif 1.1.3 Pillow 7.1.2 pip 20.0.2 pluggy 0.13.1 prometheus-client 0.7.1 prompt-toolkit 3.0.5 psutil 5.7.0 ptyprocess 0.6.0 py 1.8.1 pybind11 2.2.4 pycparser 2.20 pycrypto 2.6.1 pygit2 1.2.1 PyGithub 1.46 pygithub3 0.5.1 Pygments 2.6.1 pyinotify 0.9.6 PyJWT 1.7.1 pymongo 3.10.1 pynwb 1.3.3 pyOpenSSL 19.1.0 pyout 0.5.1 pyparsing 2.4.7 PyQt5 5.14.1 PyQt5-sip 12.7.1 pyqtgraph 0.10.0 pyrsistent 0.16.0 PyRSS2Gen 1.1 PySocks 1.7.1 pytest 5.0.1 python-dateutil 2.8.1 python-dotenv 0.10.3 python-frontmatter 0.5.0 pytools 2020.1 pytz 2020.1 PyYAML 3.13 pyzmq 18.1.1 qtconsole 4.7.4 QtPy 1.9.0 quantities 0.12.3 requests 2.23.0 requests-toolbelt 0.9.1 ruamel.yaml 0.16.10 ruamel.yaml.clib 0.2.0 scikit-learn 0.23.0 scikit-optimize 0.5.2 scipy 1.4.1 seaborn 0.9.0 SecretStorage 3.1.2 semantic-version 2.8.5 Send2Trash 1.5.0 setuptools 46.4.0.post20200518 setuptools-git 1.2 simple-crypt 4.1.7 simplejson 3.17.0 six 1.14.0 sklearn 0.0 snowballstemmer 2.0.0 Sphinx 3.0.3 sphinxcontrib-applehelp 1.0.2 sphinxcontrib-devhelp 1.0.2 sphinxcontrib-htmlhelp 1.0.3 sphinxcontrib-jsmath 1.0.1 sphinxcontrib-qthelp 1.0.3 sphinxcontrib-serializinghtml 1.1.4 sphinxcontrib-websupport 1.1.0 spikecomparison 0.2.6 spikeextractors 0.8.4 spikefeatures 0.1.0 spikeforest2 0.1.0 /disk/scratch/mhennig/spikeinterface/spikeforest2 spikeinterface 0.9.1 spikemetrics 0.2.0 spikesorters 0.3.2 spiketoolkit 0.6.3 spikewidgets 0.4.3 spyking-circus 0.9.7 statsmodels 0.11.1 terminado 0.8.3 testpath 0.4.4 threadpoolctl 2.0.0 tornado 6.0.4 tqdm 4.46.0 traitlets 4.3.3 tridesclous 1.6.0 Unidecode 1.1.1 urllib3 1.25.8 vdom 0.6 venn 0.1.3 wcwidth 0.1.9 webencodings 0.5.1 wheel 0.34.2 widgetsnbextension 3.5.1 wrapt 1.12.0 Yapsy 1.12.2 zipp 3.1.0 ```
yarikoptic commented 4 years ago

thanks! edited your comment to make those collapsable. Hint for future - use html:

<details>
<summary></summary> 

```shell
HERE GOES LOTS OF TEXT
```
</details>
yarikoptic commented 4 years ago

FWIW, I am sorry to report @mhhennig that you are too special! ;) With network throttling, in a similar conda env (+ plain pip install of dandi from git, the same girder-client version verified) -- I see no initial retransmission. Having said that, I think that the last error we have observed (Expecting value: line 1 column 1 (char 0)) is most likely our client code or pyout issue. So if you run with --devel-debug and that --pdb -- if you crash similarly it means that our code , if not - that could have been pyout ;) Are you running that reupload ATM? how is it going?

yarikoptic commented 3 years ago

we are no longer using girder backend, and will stay hopeful that such an error no longer happens! ;)