chStaiger / iBridges-Gui

A graphical user interface to iRODS employing the iBridges python package.
https://chstaiger.github.io/iBridges-Gui/
GNU General Public License v3.0
15 stars 8 forks source link

Can't upload files above a certain size #261

Open jnimoth opened 1 week ago

jnimoth commented 1 week ago

Describe the bug Upload stops working for me for files above a certain size (32 MB?).

I saw today, first on Windows using ibridges-gui 1.1.0, that do not work when I select files with a size above a certain threshold. I did not test the exact limit, but it seems to be somewhere around 30-40 MB.

Here the content of the ibridges-gui log which shows some uploads that worked and some that did not

__________________________________________________
__________________________________________________
     Starting iBridges-GUI 
    2024-09-05T12:47:28.206671
__________________________________________________
__________________________________________________
2024-09-05 12:48:01,323 - ibridges-gui - DEBUG - Login with C:\Users\myuser\.irods\irods_environment.json and password from prompt.
2024-09-05 12:48:01,325 - ibridges-gui - INFO - Logged in as rdms-testers@rug.nl to store-test.data.rug.nl; working coll /testrug/home/rdms-testers@rug.nl
2024-09-05 12:49:09,413 - ibridges-gui - ERROR - No session created. {}
NoneType: None
2024-09-05 12:49:34,102 - ibridges-gui - DEBUG - Login with C:\Users\myuser\.irods\irods_environment.json and password from prompt.
2024-09-05 12:49:34,105 - ibridges-gui - INFO - Logged in as rdms-testers@rug.nl to store-test.data.rug.nl; working coll /testrug/home/rdms-testers@rug.nl
2024-09-05 12:50:00,237 - ibridges-gui - DEBUG - Transfer data thread: Created new session.
2024-09-05 12:50:00,806 - ibridges-gui - INFO - Transfer data thread: Transfer \\cl3-usr3\usr3\myuser\My Documents\UFC_EWP_Networksharing.docx -->  /testrug/home/rdms-testers@rug.nl/UFC_EWP_Networksharing.docx, overwrite False
2024-09-05 12:50:00,810 - ibridges-gui - DEBUG - Transfer data thread: Thread session successfully deleted.
2024-09-05 12:51:36,231 - ibridges-gui - DEBUG - Transfer data thread: Created new session.
2024-09-05 12:51:36,911 - ibridges-gui - ERROR - Transfer data thread: Could not transfer  Y:\staff\LowCost\fse\Jelte\9GB_Testfile --> /testrug/home/rdms-testers@rug.nl/9GB_Testfile; KeyError(-405000)
Traceback (most recent call last):
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\ibridgesgui\threads.py", line 103, in run
    _obj_put(
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\ibridges\executor.py", line 394, in _obj_put
    session.irods_session.data_objects.put(local_path, str(irods_path), **options)
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 264, in put
    if not self.parallel_put( local_path, (obj,o), total_bytes = sizelist[0], num_threads = num_threads,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 363, in parallel_put
    return parallel.io_main( self.sess, data_or_path_, parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0), file_,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\parallel.py", line 479, in io_main
    retval = _io_multipart_threaded (Operation, (Data, Io), replica_token, resc_hier, session, fname, total_bytes,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\parallel.py", line 367, in _io_multipart_threaded
    Io = session.data_objects.open( Data_object.path, Operation.data_object_mode(initial_open = False),
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 518, in open
    desc = conn.recv().int_info
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\connection.py", line 152, in recv
    raise get_exception_by_code(msg.int_info, err_msg)
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\exception.py", line 186, in get_exception_by_code
    exc_class = iRODSExceptionMeta.codes[ rounded_code( code ) ]
KeyError: -405000
2024-09-05 12:51:36,934 - ibridges-gui - DEBUG - Transfer data thread: Thread session successfully deleted.
2024-09-05 12:52:08,463 - ibridges-gui - DEBUG - Transfer data thread: Created new session.
2024-09-05 12:52:09,048 - ibridges-gui - ERROR - Transfer data thread: Could not transfer  Y:\staff\LowCost\fse\Jelte\chemdraw22.2.0.exe --> /testrug/home/rdms-testers@rug.nl/chemdraw22.2.0.exe; KeyError(-405000)
Traceback (most recent call last):
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\ibridgesgui\threads.py", line 103, in run
    _obj_put(
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\ibridges\executor.py", line 394, in _obj_put
    session.irods_session.data_objects.put(local_path, str(irods_path), **options)
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 264, in put
    if not self.parallel_put( local_path, (obj,o), total_bytes = sizelist[0], num_threads = num_threads,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 363, in parallel_put
    return parallel.io_main( self.sess, data_or_path_, parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0), file_,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\parallel.py", line 479, in io_main
    retval = _io_multipart_threaded (Operation, (Data, Io), replica_token, resc_hier, session, fname, total_bytes,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\parallel.py", line 367, in _io_multipart_threaded
    Io = session.data_objects.open( Data_object.path, Operation.data_object_mode(initial_open = False),
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 518, in open
    desc = conn.recv().int_info
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\connection.py", line 152, in recv
    raise get_exception_by_code(msg.int_info, err_msg)
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\exception.py", line 186, in get_exception_by_code
    exc_class = iRODSExceptionMeta.codes[ rounded_code( code ) ]
KeyError: -405000
2024-09-05 12:52:09,066 - ibridges-gui - DEBUG - Transfer data thread: Thread session successfully deleted.
2024-09-05 12:52:33,521 - ibridges-gui - DEBUG - Transfer data thread: Created new session.
2024-09-05 12:52:34,309 - ibridges-gui - INFO - Transfer data thread: Transfer X:\My Downloads\Another_Mnova.mnova -->  /testrug/home/rdms-testers@rug.nl/Another_Mnova.mnova, overwrite False
2024-09-05 12:52:34,312 - ibridges-gui - DEBUG - Transfer data thread: Thread session successfully deleted.
2024-09-05 12:53:46,735 - ibridges-gui - DEBUG - Transfer data thread: Created new session.
2024-09-05 12:53:47,366 - ibridges-gui - ERROR - Transfer data thread: Could not transfer  Y:\staff\LowCost\fse\Jelte\MX-21.3_x64.iso --> /testrug/home/rdms-testers@rug.nl/MX-21.3_x64.iso; KeyError(-405000)
Traceback (most recent call last):
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\ibridgesgui\threads.py", line 103, in run
    _obj_put(
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\ibridges\executor.py", line 394, in _obj_put
    session.irods_session.data_objects.put(local_path, str(irods_path), **options)
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 264, in put
    if not self.parallel_put( local_path, (obj,o), total_bytes = sizelist[0], num_threads = num_threads,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 363, in parallel_put
    return parallel.io_main( self.sess, data_or_path_, parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0), file_,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\parallel.py", line 479, in io_main
    retval = _io_multipart_threaded (Operation, (Data, Io), replica_token, resc_hier, session, fname, total_bytes,
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\parallel.py", line 367, in _io_multipart_threaded
    Io = session.data_objects.open( Data_object.path, Operation.data_object_mode(initial_open = False),
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\manager\data_object_manager.py", line 518, in open
    desc = conn.recv().int_info
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\connection.py", line 152, in recv
    raise get_exception_by_code(msg.int_info, err_msg)
  File "C:\Users\myuser\AppData\Roaming\Python\Python310\site-packages\irods\exception.py", line 186, in get_exception_by_code
    exc_class = iRODSExceptionMeta.codes[ rounded_code( code ) ]
KeyError: -405000
2024-09-05 12:53:47,379 - ibridges-gui - DEBUG - Transfer data thread: Thread session successfully deleted.
2024-09-05 12:53:59,159 - ibridges-gui - INFO - Disconnecting rdms-testers@rug.nl from store-test.data.rug.nl

I then also tried the same under Linux with the same version of ibridges-gui and saw the same (file that failed was ~34 MB):

__________________________________________________
__________________________________________________
     Starting iBridges-GUI 
    2024-09-05T13:13:27.734199
__________________________________________________
__________________________________________________
2024-09-05 13:15:27,674 - ibridges-gui - DEBUG - Transfer data thread: Created new session.
2024-09-05 13:15:28,319 - ibridges-gui - INFO - Transfer data thread: Transfer /home/jeltenimoth/Downloads/RDM_Basics_Workshop_2023.pptx -->  /testrug/home/rdms-testers@rug.nl/RDM_Basics_Workshop_2023.pptx, overwrite False
2024-09-05 13:15:28,322 - ibridges-gui - DEBUG - Transfer data thread: Thread session successfully deleted.
2024-09-05 13:16:12,923 - ibridges-gui - DEBUG - Transfer data thread: Created new session.
2024-09-05 13:16:13,584 - ibridges-gui - ERROR - Transfer data thread: Could not transfer  /home/jeltenimoth/Downloads/datapackage_JPN_PhD_Project1_archive1718282550.dpg.tar --> /testrug/home/rdms-testers@rug.nl/datapackage_JPN_PhD_Project1_archive1718282550.dpg.tar; KeyError(-405000)
Traceback (most recent call last):
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/ibridgesgui/threads.py", line 103, in run
    _obj_put(
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/ibridges/executor.py", line 394, in _obj_put
    session.irods_session.data_objects.put(local_path, str(irods_path), **options)
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 194, in put
    if not self.parallel_put( local_path, (obj,o), total_bytes = sizelist[0], num_threads = num_threads,
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 289, in parallel_put
    return parallel.io_main( self.sess, data_or_path_, parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0), file_,
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/irods/parallel.py", line 469, in io_main
    retval = _io_multipart_threaded (Operation, (Data, Io), replica_token, resc_hier, session, fname, total_bytes,
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/irods/parallel.py", line 360, in _io_multipart_threaded
    Io = session.data_objects.open( Data_object.path, Operation.data_object_mode(initial_open = False),
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 442, in open
    desc = conn.recv().int_info
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/irods/connection.py", line 152, in recv
    raise get_exception_by_code(msg.int_info, err_msg)
  File "/home/jeltenimoth/python_venvs/ibridges_new/lib/python3.10/site-packages/irods/exception.py", line 171, in get_exception_by_code
    exc_class = iRODSExceptionMeta.codes[ rounded_code( code ) ]
KeyError: -405000
2024-09-05 13:16:13,597 - ibridges-gui - DEBUG - Transfer data thread: Thread session successfully deleted.

Expected behavior Upload works works.

Provide some information:

chStaiger commented 1 week ago

Many thanks for reporting this.

I could reproduce the error like this:

The closing of the upload window, should not stop the thread, at least it did not do that in mac and linux. But indeed the thread and the session there are deleted and hence the data ends up in INTERMEDIATE state:

Screenshot 2024-09-05 at 19 46 15

However, when I do not close the Upload window, the data does get transferred correctly and is checksummed.

Screenshot 2024-09-05 at 19 53 54

I see the same behaviour, creating intermediate state data for the sync tab. So I assume you found a very nice and rare bug for us, which has to do with how to handle multi-threaded applications.

A workaround is to use the upload button, and not to close the window, which is quite inconvenient because then you cannot use the rest of the application.

chStaiger commented 1 week ago

Note to us, I could reproduce the same behaviour on Mac.

trel commented 1 week ago

data ends up in INTERMEDIATE state

I hope that once the server times out waiting for the client to come back... the state of the replica should change to 'STALE'.

chStaiger commented 1 week ago

I did some more testing:

I uploaded a 1GB file and indeed it first shows as INTERMEDIATE, but the thread does not stop. Once the transfer is finished, the replica is in the state GOOD. If the client is closed while the transfer is still in progress, the state of the data goes to STALE.

So far I could not reproduce that the iBridges GUI crashes on the keyerror ...

chStaiger commented 1 week ago

@jnimoth it is also funny that the logs show the error nearly directly after starting the transfer. There is only half a second between creating the new session to transfer the data and the error with the code -405000. Can you please check whether you receive the same behaviour for the CLI?

chStaiger commented 1 week ago

data ends up in INTERMEDIATE state

I hope that once the server times out waiting for the client to come back... the state of the replica should change to 'STALE'.

Indeed that is the case at least on my testing iRODS instance and our Yoda instances @jnimoth can you check in which state the data is after the unsuccessful transfer? Do you see the empty file with the corresponding state in the GUI Browser?

chStaiger commented 1 week ago

@trel Do you have an idea what could cause an

ierror -405000
irods error: -405000 INTERMEDIATE_REPLICA_ACCESS

How can I trigger this error through a client?

chStaiger commented 1 week ago

@jnimoth I could finally trigger the error that you see. I uploaded a 1GB file and while the upload is still ongoing I tried to access the file on iRODS through a download. So one thing that could happen at your side is, that e.g. a replication of the uploaded file kicks in before the file is written in its entirety to iRODS, or another user/process tries to get the data while you are uploading.

Could that be the case?

jnimoth commented 1 week ago

So for me it crashes directly after starting the upload yes. In the interface it then looks like that:

ibridges_upload_error_1

And when I close the window, I see the file as stale:

ibridges_upload_error_2

ibridges_upload_error_3

It also crashes for me when tried via CLI:

$ ibridges upload ~/SURFdrive/Unishare/Calculations.zip irods:/rug/home/rdms-testers@rug.nl/ibridges_tests/upload_again/ 
 21%|███████████████████████████▊                                                                                                       | 64.0M/301M [00:01<00:03, 66.5MB/s]ERROR:irods.connection:Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
Traceback (most recent call last):
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/__main__.py", line 498, in ibridges_upload
    ops = upload(
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/data_operations.py", line 124, in upload
    ops.execute(session, ignore_err=ignore_err)
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/executor.py", line 177, in execute
    self.execute_upload(session, pbar, ignore_err=ignore_err)
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/executor.py", line 226, in execute_upload
    _obj_put(
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/executor.py", line 394, in _obj_put
    session.irods_session.data_objects.put(local_path, str(irods_path), **options)
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 264, in put
    if not self.parallel_put( local_path, (obj,o), total_bytes = sizelist[0], num_threads = num_threads,
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 363, in parallel_put
    return parallel.io_main( self.sess, data_or_path_, parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0), file_,
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/parallel.py", line 479, in io_main
    retval = _io_multipart_threaded (Operation, (Data, Io), replica_token, resc_hier, session, fname, total_bytes,
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/parallel.py", line 367, in _io_multipart_threaded
CRITICAL:irods.connection:Read 0 bytes from socket instead of expected 4 bytes
ERROR:irods.connection:Could not receive server response
    Io = session.data_objects.open( Data_object.path, Operation.data_object_mode(initial_open = False),
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 518, in open
    desc = conn.recv().int_info
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/connection.py", line 152, in recv
    raise get_exception_by_code(msg.int_info, err_msg)
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/exception.py", line 186, in get_exception_by_code
    exc_class = iRODSExceptionMeta.codes[ rounded_code( code ) ]
KeyError: -405000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jelte/python_venvs/ibridges_venv_2/bin/ibridges", line 8, in <module>
    sys.exit(main())
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/__main__.py", line 104, in main
    ibridges_upload()
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/__main__.py", line 494, in ibridges_upload
    with interactive_auth(irods_env_path=_get_ienv_path()) as session:
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/session.py", line 110, in __exit__
    self.close()
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/ibridges/session.py", line 224, in close
    self.irods_session.cleanup()
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/session.py", line 201, in cleanup
    conn.disconnect()
  File "/home/jelte/python_venvs/ibridges_venv_2/lib/python3.10/site-packages/irods/connection.py", line 325, in disconnect
    self.socket = self.socket.unwrap()
  File "/usr/lib/python3.10/ssl.py", line 1347, in unwrap
    s = self._sslobj.shutdown()
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2702)
ERROR:irods.connection:Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
ERROR:irods.connection:Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
ERROR:irods.connection:Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
ERROR:irods.connection:Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
ERROR:irods.connection:Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
ERROR:irods.connection:Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
 23%|█████████████████████████████▌                                                                                                     | 68.0M/301M [00:01<00:05, 42.2MB/s]

Trying the same via icommands on the other hand works perfectly fine:

$ iput -P ~/SURFdrive/Unishare/Calculations.zip /rug/home/rdms-testers@rug.nl/icommands_tests
0/1 -  0.00% of files done   0.000/300.835 MB -  0.00% of file sizes done
Processing Calculations.zip - 300.835 MB   2024-09-06.10:58:59
Calculations.zip - 300.835/300.835 MB - 100.00% done   2024-09-06.10:59:03
$ ils -L /rug/home/rdms-testers@rug.nl/icommands_tests
/rug/home/rdms-testers@rug.nl/icommands_tests:
  rdms-testers      0 rootResc;randy;pt1;mnt_nfsirods1    315448162 2024-09-06.10:59 & Calculations.zip
    sha2:6VOyd9XVMiDs1rYe1zPc9OucFb8G2qMIgUGeTH1DyG8=    generic    /mnt/nfsirods1/home/rdms-testers@rug.nl/icommands_tests/Calculations.zip
trel commented 1 week ago

@trel Do you have an idea what could cause an

ierror -405000
irods error: -405000 INTERMEDIATE_REPLICA_ACCESS

How can I trigger this error through a client?

Anytime you attempt to access / get / read a replica that is already in the INTERMEDIATE state, you should get this error returned to the client.

chStaiger commented 1 week ago

@jnimoth Thanks for also trying the icommands. There is one difference between iBridges and the icommands: we also calculate and register checksums. Could you try again with iput -KP ~/SURFdrive/Unishare/Calculations.zip /rug/home/rdms-testers@rug.nl/icommands_tests

chStaiger commented 1 week ago

And another observation. You used the icommands without enforcing to calculate, verify and registering the checksum. Yet after the upload there is a sha2 checksum.

Can it be that there is a process on your iRODS server that always calculates checksums for all new data that comes in and which might interfere with PRC option to verify and register the checksum?

I see in your gitrepo that there is a checksum calculation after the upload in two places https://github.com/rijksuniversiteit-groningen/rdms/blob/5d0c3d58df187530fd18e800ea2ec6b63b573e9a/src/python/rules/rugrdms.re#L30C1-L39C2

https://github.com/rijksuniversiteit-groningen/rdms/blob/5d0c3d58df187530fd18e800ea2ec6b63b573e9a/src/python/rules/rugrdms.re#L65

@trel can it be that the PRC flag to do a checksum check and this action stand each other in the way or that the delayed rule to calculate a checksum starts too early?

trel commented 1 week ago

@trel can it be that the PRC flag to do a checksum check and this action stand each other in the way or that the delayed rule to calculate a checksum starts too early?

Hmm, I hope/think not. Any jobs picking up work to do should only start AFTER the incoming file is at rest (including any checksum calculations). I cannot see those two links you posted into RUG's repository.

I see the line in the logs above that seems like there is something out of order...

ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2702)

chStaiger commented 1 week ago

Is it something that we can fix on the client side? I am sorry for bothering you, but this goes a bit beyond my iRODS server knowledge and all the possible configurations of clients.

jnimoth commented 1 week ago

@chStaiger Using iput -PK shows the same behavior (edit: meaning that it works as without -K). For your other question: Yes in our system the checksums are always calculated as far as I know. For more details how that is in our system and if it might be related, I also involve @jpmcfarland .

trel commented 1 week ago

Is it something that we can fix on the client side?

Yes, as I understand that message... the application (client) is sending data after the SSL handshake is considered complete. Which of course, the server did not expect.

So, first check would be to see if there is an unexpected early session close() or unexpected later data sent.

chStaiger commented 1 week ago

@jnimoth We did some more investigtion but we cannot see in our code where we would accidentally close or open a new session or add more parameters to an existing session ...

Can you do one last experiment for us and try out to upload the large file through the python-irodsclient?

from irods.session import iRODSSession
session = iRODSSession(irods_env_file="/<YOUR HOME>/.irods/irods_environment.json")
session.server_version
import irods.keywords as kw
options = {kw.NUM_THREADS_KW: 4, kw.REG_CHKSUM_KW: "", kw.VERIFY_CHKSUM_KW: ""}
options[kw.FORCE_FLAG_KW] = ""
session.data_objects.put(local_path, str(irods_path), **options)
jnimoth commented 1 week ago

@chStaiger Just tried with PRC 2.1.0 directly for a 'big file' (~42 MB). Here what I saw.

First once more via ibridges-gui: Jelte_ibridges_error_1

Jelte_ibridges_error_2

And now the same trial with PRC:

Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from irods.session import iRODSSession
>>> session = iRODSSession(irods_env_file="/home/username/.irods/irods_environment_ibridges.json")
>>> session.server_version
(4, 2, 11)
>>> import irods.keywords as kw
>>> options = {kw.NUM_THREADS_KW: 4, kw.REG_CHKSUM_KW: "", kw.VERIFY_CHKSUM_KW: ""}
>>> options[kw.FORCE_FLAG_KW] = ""
>>> session.data_objects.put("/home/username/NMR.tar", "/rug/home/rdms-testers@rug.nl/ibridges_tests/upload_again/", **options)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/username/.venv/ibridges_110/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 264, in put
    if not self.parallel_put( local_path, (obj,o), total_bytes = sizelist[0], num_threads = num_threads,
  File "/home/username/.venv/ibridges_110/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 363, in parallel_put
    return parallel.io_main( self.sess, data_or_path_, parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0), file_,
  File "/home/username/.venv/ibridges_110/lib/python3.10/site-packages/irods/parallel.py", line 479, in io_main
    retval = _io_multipart_threaded (Operation, (Data, Io), replica_token, resc_hier, session, fname, total_bytes,
  File "/home/username/.venv/ibridges_110/lib/python3.10/site-packages/irods/parallel.py", line 367, in _io_multipart_threaded
    Io = session.data_objects.open( Data_object.path, Operation.data_object_mode(initial_open = False),
  File "/home/username/.venv/ibridges_110/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 518, in open
    desc = conn.recv().int_info
  File "/home/username/.venv/ibridges_110/lib/python3.10/site-packages/irods/connection.py", line 152, in recv
    raise get_exception_by_code(msg.int_info, err_msg)
  File "/home/username/.venv/ibridges_110/lib/python3.10/site-packages/irods/exception.py", line 186, in get_exception_by_code
    exc_class = iRODSExceptionMeta.codes[ rounded_code( code ) ]
KeyError: -405000

Jelte_ibridges_error_3

I also then tried PRC with a much smaller file (~1.7 MB). This worked without issues.

I also tried once more for comparison the file that did not work via icommands directly from the same device using iput -PK. This also worked without problems (not shown).

trel commented 1 week ago

Not convinced these are directly related, but they were fixed in 4.2.12...

simstoica commented 1 week ago

To me it seems something similar to this issues (not sure if it is related, but the 32 MB size and the parallel transfer errors seem familiar). I know it is not the rue engine, but still a mix of python and native clients

https://github.com/irods/irods_rule_engine_plugin_python/issues/54

trel commented 1 week ago

Also... @alanking, any chance we're hitting https://github.com/irods/irods/issues/5742?

Separately... the KeyError itself is probably due to 405000 not being in this file:

But that doesn't feel like the whole answer.

Please try editing your exception.py and seeing how the same transfer behaves again.

+ class INTERMEDIATE_REPLICA_ACCESS(iRODSException):
+    code = -405000
trel commented 1 week ago

To me it seems something similar to this issues (not sure if it is related, but the 32 MB size and the parallel transfer errors seem familiar). I know it is not the rue engine, but still a mix of python and native clients

irods/irods_rule_engine_plugin_python#54

Oh, interesting. Is the PREP involved in these uploads @jnimoth?

A vote against this theory is that iCommands are behaving as far as we can tell (the PREP rules would fire, regardless of the client making the calls)...

chStaiger commented 1 week ago

Just some summary from my side. I tested large data transfers of 1GB to 10GB with the PRC 2.0.1 to an iRODS 4.12 and 4.3.1 server, both using SSL encryption and the transfers worked without any problems.

trel commented 1 week ago

PRC 2.0.1

Just confirming, 2.0.1, rather than 2.1.0. Good if true.

chStaiger commented 1 week ago

We also noticed that the PRC requires different settings in the irods_environment.json to function properly with an iRODS server which uses SSL encryption. Here are the SSL parameters from my irods_environment.json:

"irods_encryption_algorithm": "AES-256-CBC",
"irods_encryption_key_size": 32,
"irods_encryption_num_hash_rounds": 16,
"irods_encryption_salt_size": 8,
"irods_client_server_policy": "CS_NEG_REQUIRE",
"irods_client_server_negotiation": "request_server_negotiation"

The icommands will also work without the "irods_client_server_policy". Are there other parameters which we need to set?

chStaiger commented 1 week ago

PRC 2.0.1

Just confirming, 2.0.1, rather than 2.1.0. Good if true.

In iBridges we ask for python-irodsclient>=2.0.0 and I tested indeed with 2.0.1

chStaiger commented 1 week ago

PRC 2.0.1

Just confirming, 2.0.1, rather than 2.1.0. Good if true.

In iBridges we ask for python-irodsclient>=2.0.0 and I tested indeed with 2.0.1

Just confirming, uploading a 1GB file with PRC 2.1.0 through iBridges works fine too.

alanking commented 1 week ago

Also... @alanking, any chance we're hitting irods/irods#5742?

That issue involves repeated, simultaneous iput-ing and irm-ing (read: no -f means it is a rename... https://github.com/irods/irods/issues/7935) the same file to the same logical path which does not seem related to this issue.

trel commented 1 week ago

The icommands will also work without the "irods_client_server_policy".

is that true?

Any discrepancy between iCommands and PRC is probably an oversight - they're using the same protocol to speak with the server.

chStaiger commented 1 week ago

The icommands will also work without the "irods_client_server_policy".

is that true?

Any discrepancy between iCommands and PRC is probably an oversight - they're using the same protocol to speak with the server.

I deleted "irods_client_server_policy": "CS_NEG_REQUIRE" from my irods_environment.json and logged in with the icommands and with the PRC through iBridges:

cstaiger@integration:~$ iinit
Enter your current PAM password:
cstaiger@integration:~$ rm ~/.irods/.irodsA
cstaiger@integration:~$ ibridges init
None None
Your iRODS password:
Traceback (most recent call last):
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/session.py", line 319, in server_version
    return tuple(ast.literal_eval(reported_vsn))
  File "/usr/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 0

    ^
SyntaxError: unexpected EOF while parsing

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/session.py", line 325, in __server_version
    conn = next(iter(self.pool.active))
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/pool.py", line 70, in get_connection
    conn = self.idle.pop()
KeyError: 'pop from an empty set'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cstaiger/.local/lib/python3.8/site-packages/ibridges/session.py", line 186, in authenticate_using_password
    _ = irods_session.server_version
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/session.py", line 321, in server_version
    return self.__server_version()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/session.py", line 328, in __server_version
    conn = self.pool.get_connection()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/pool.py", line 17, in method_
    ret = method(self,*s,**kw)
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/pool.py", line 87, in get_connection
    conn = Connection(self, self.account)
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/connection.py", line 62, in __init__
    self._server_version = self._connect()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/connection.py", line 302, in _connect
    raise NetworkException("Client-Server negotiation failure: {},{}".format(client_policy, server_policy))
irods.exception.NetworkException: Client-Server negotiation failure: CS_NEG_REFUSE,CS_NEG_REQUIRE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cstaiger/.local/bin/ibridges", line 8, in <module>
    sys.exit(main())
  File "/home/cstaiger/.local/lib/python3.8/site-packages/ibridges/__main__.py", line 78, in main
    ibridges_init()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/ibridges/__main__.py", line 132, in ibridges_init
    with interactive_auth(irods_env_path=ienv_path) as session:
  File "/home/cstaiger/.local/lib/python3.8/site-packages/ibridges/interactive.py", line 60, in interactive_auth
    session = Session(irods_env=irods_env_path, password=password)
  File "/home/cstaiger/.local/lib/python3.8/site-packages/ibridges/session.py", line 69, in __init__
    self.irods_session = self.connect()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/ibridges/session.py", line 167, in connect
    return self.authenticate_using_password()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/ibridges/session.py", line 188, in authenticate_using_password
    raise _translate_irods_error(e) from e
ibridges.session.LoginError: Host, port, irods_client_server_policy or irods_client_server_negotiation not set correctly in irods_environment.json
jnimoth commented 1 week ago

Also... @alanking, any chance we're hitting irods/irods#5742?

Separately... the KeyError itself is probably due to 405000 not being in this file:

But that doesn't feel like the whole answer.

Please try editing your exception.py and seeing how the same transfer behaves again.

+ class INTERMEDIATE_REPLICA_ACCESS(iRODSException):
+    code = -405000

@trel I changed the exception.py and then repeated all again, here is what I get now:

Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from irods.session import iRODSSession
>>> session = iRODSSession(irods_env_file="/home/jelte/.irods/irods_environment.json")
>>> session.server_version
(4, 2, 11)
>>> import irods.keywords as kw
>>> options = {kw.NUM_THREADS_KW: 4, kw.REG_CHKSUM_KW: "", kw.VERIFY_CHKSUM_KW: ""}
>>> options[kw.FORCE_FLAG_KW] = ""
>>> session.data_objects.put("/home/jelte/Downloads/NMR.tar", "/rug/home/rdms-testers@rug.nl/ibridges_tests/", **options)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jelte/python_venvs/prc_venv_211/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 264, in put
    if not self.parallel_put( local_path, (obj,o), total_bytes = sizelist[0], num_threads = num_threads,
  File "/home/jelte/python_venvs/prc_venv_211/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 363, in parallel_put
    return parallel.io_main( self.sess, data_or_path_, parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0), file_,
  File "/home/jelte/python_venvs/prc_venv_211/lib/python3.10/site-packages/irods/parallel.py", line 479, in io_main
    retval = _io_multipart_threaded (Operation, (Data, Io), replica_token, resc_hier, session, fname, total_bytes,
  File "/home/jelte/python_venvs/prc_venv_211/lib/python3.10/site-packages/irods/parallel.py", line 367, in _io_multipart_threaded
    Io = session.data_objects.open( Data_object.path, Operation.data_object_mode(initial_open = False),
  File "/home/jelte/python_venvs/prc_venv_211/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 518, in open
    desc = conn.recv().int_info
  File "/home/jelte/python_venvs/prc_venv_211/lib/python3.10/site-packages/irods/connection.py", line 152, in recv
    raise get_exception_by_code(msg.int_info, err_msg)
irods.exception.INTERMEDIATE_REPLICA_ACCESS: None
jnimoth commented 1 week ago

We also tried a little bit more today and @simstoica got PRC and iBridges running also for bigger files on our system after modifying them so that she enforced single threaded transfers.

Furthermore, I also tried to set irods_maximum_size_for_single_buffer_in_megabytes in the irods_environment json to a very high number, then try again with the PRC. Here I still see the same issue as reported above.

trel commented 1 week ago

So it sounds like...

trel commented 1 week ago

irods.exception.INTERMEDIATE_REPLICA_ACCESS: None

Yes, so, that's definitely what's happening - we'll get that added to the exceptions.py file...

Still not clear why you're hitting this - and the PRC-only example means it's not an ibridges bug... ?

@d-w-moore - anything you see that causes concern in the usage above (https://github.com/chStaiger/iBridges-Gui/issues/261#issuecomment-2340029484)?

chStaiger commented 1 week ago

So it sounds like...

* `iinit` defaults to `CS_NEG_DONT_CARE`, which is compatible with `CS_NEG_REQURE` on the server - and connects securely.

* `ibridges` defaults to `CS_NEG_REFUSE`, which is incompatible with `CS_NEG_REQURE` on the server - and fails with
irods.exception.NetworkException: Client-Server negotiation failure: CS_NEG_REFUSE,CS_NEG_REQUIRE

Just to verify: it is not specific to ibridges but to the PRC. I receive the same error when directly using the PRC without the ibridges wrapping:

>>> session = iRODSSession(irods_env_file="/home/cstaiger/.irods/irods_environment.json", password  = "********")
>>> session.server_version
Traceback (most recent call last):
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/session.py", line 319, in server_version
    return tuple(ast.literal_eval(reported_vsn))
  File "/usr/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 0

    ^
SyntaxError: unexpected EOF while parsing

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/session.py", line 325, in __server_version
    conn = next(iter(self.pool.active))
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/pool.py", line 70, in get_connection
    conn = self.idle.pop()
KeyError: 'pop from an empty set'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/session.py", line 321, in server_version
    return self.__server_version()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/session.py", line 328, in __server_version
    conn = self.pool.get_connection()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/pool.py", line 17, in method_
    ret = method(self,*s,**kw)
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/pool.py", line 87, in get_connection
    conn = Connection(self, self.account)
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/connection.py", line 62, in __init__
    self._server_version = self._connect()
  File "/home/cstaiger/.local/lib/python3.8/site-packages/irods/connection.py", line 302, in _connect
    raise NetworkException("Client-Server negotiation failure: {},{}".format(client_policy, server_policy))
irods.exception.NetworkException: Client-Server negotiation failure: CS_NEG_REFUSE,CS_NEG_REQUIRE
trel commented 1 week ago

Ah, well then, there we go. PRC apparently has a different default than iinit.

Maybe we should catch that more elegantly - but it is working as designed I think.

chStaiger commented 6 days ago

Ah, well then, there we go. PRC apparently has a different default than iinit.

Maybe we should catch that more elegantly - but it is working as designed I think.

In principle it is fine as it is. I was just mislead and thought there might be more parameters that we miss for the PRC to function as the icommands. For debugging it would be nice if they would default to the same vallues.

chStaiger commented 6 days ago

Going through the thread again, I think we can establish:

I am bit puzzled, since I do not see any problem on our iRODS instances.

@trel and @alanking Which iRODS server configurations do we need to check and test? @simstoica Can you oversee which of your server configuration might "block" the data ports or do extra checks on the bitstream when the data ports are used?

trel commented 6 days ago

You've documented reproducible cases (files triggering parallel transfer on high ports not completing due to INTERMEDIATE state replica access)...

I'm not convinced these are the same issue - but they are definitely similar.

We have not been able to reproduce this behavior, yet.

chStaiger commented 6 days ago

You've documented reproducible cases (files triggering parallel transfer on high ports not completing due to INTERMEDIATE state replica access)...

* by @jnimoth on `/rug` (iRODS 4.2.11, PRC 2.1.0, Python 3.10) in [Can't upload files above a certain size #261 (comment)](https://github.com/chStaiger/iBridges-Gui/issues/261#issuecomment-2340029484)

  * Are there additional/custom rules running at `/rug/`?

* by @chStaiger on `<unknown_zone_name>` (unknown versions) when closing an upload window too early (and maybe in the sync tab also) in iBridges-GUI in [Can't upload files above a certain size #261 (comment)](https://github.com/chStaiger/iBridges-Gui/issues/261#issuecomment-2332336752)

  * Can you test against iRODS 4.2.11 as well?

I'm not convinced these are the same issue - but they are definitely similar.

We have not been able to reproduce this behavior, yet.

Unfortunately I do not have access to an iRODS 4.2.11 nor do I have access to an infrastructure where I can setup such a server. Concerning the Can't upload files above a certain size #261 (comment) I am not so sure about it. I tested that with a 10GB file on a weak internet connection. So it might have been that I was just too impatient to wait until the transfer finished. I did not see any crash there, but also not that the file transfer finished. I will do more tests in the coming days. However, that does not solve the issue of immediately crashing. This behaviour I could not reproduce.

trel commented 6 days ago

However, that does not solve the issue of immediately crashing. This behaviour I could not reproduce.

Ah, okay - so we have only seen the immediate crash on the /rug/ zone, and only @jnimoth has seen it...

d-w-moore commented 4 days ago

I was unable to reproduce against iRODS 4.2.11 on a fresh install.