aimhubio / aim

Aim 💫 - An easy-to-use & supercharged open-source experiment tracker.
https://aimstack.io
Apache License 2.0

Error in run.add_tag (UNIQUE constraint violated) #3216

Open anigmetov opened 2 months ago

anigmetov commented 2 months ago

🐛 Bug

When I start multiple experiments in parallel and each of them calls run.add_tag with the same tag names, I get a UNIQUE constraint violation error on tag.name.

To reproduce

It's hard to reproduce with a short script. I tried this:

$ cat ./run.sh 
#!/bin/bash

VENV_PATH="venv"

generate_commands() {
  for iter in 1 2 3 4 5 6 7 8 9 10 11 12
  do
      echo "source $VENV_PATH/bin/activate && python tag_bug.py "
  done
}

export -f generate_commands
generate_commands | parallel -j 8
$ cat tag_bug.py
import aim

config = {
    "key1": "value1",
    "key2": 2,
}

run = aim.Run(capture_terminal_logs=True, log_system_params=True, experiment="check tag")
run["hparams"] = config
run.add_tag("1D")
run.add_tag("test")
$ ./run.sh

but this script ran without errors. Sorry, I cannot share the real code where I am getting this error. My only guess is a race condition: in the real code, one process succeeds in inserting the tag, while the other processes have already checked for the tag's existence, found nothing, and then try to insert it as well.
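The suspected check-then-insert race can be shown deterministically with plain sqlite3, without Aim. This is only a minimal sketch of the guess above, not Aim's actual code: the one-column `tag` table and the helpers `tag_exists` and `simulate_race` are hypothetical, and two connections to one database file stand in for two training processes.

```python
import os
import sqlite3
import tempfile


def tag_exists(conn, name):
    """Return True if a tag with this name is already stored."""
    row = conn.execute("SELECT 1 FROM tag WHERE name = ?", (name,)).fetchone()
    return row is not None


def simulate_race():
    """Replay the interleaving: both 'processes' check, then both insert."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "race.db")
        proc_a = sqlite3.connect(db_path)  # stands in for process A
        proc_b = sqlite3.connect(db_path)  # stands in for process B
        try:
            # Simplified, hypothetical schema: only the UNIQUE name column.
            proc_a.execute("CREATE TABLE tag (name TEXT UNIQUE)")
            proc_a.commit()

            # Step 1: both check before either has inserted.
            a_saw = tag_exists(proc_a, "1D")
            b_saw = tag_exists(proc_b, "1D")

            # Step 2: A inserts and commits first.
            proc_a.execute("INSERT INTO tag (name) VALUES (?)", ("1D",))
            proc_a.commit()

            # Step 3: B acts on its stale check and inserts too.
            try:
                proc_b.execute("INSERT INTO tag (name) VALUES (?)", ("1D",))
                proc_b.commit()
                outcome = "inserted"
            except sqlite3.IntegrityError as exc:
                proc_b.rollback()
                outcome = str(exc)
            return a_saw, b_saw, outcome
        finally:
            proc_a.close()
            proc_b.close()


if __name__ == "__main__":
    print(simulate_race())
```

Both checks report that the tag is missing, and the second insert fails with the same kind of IntegrityError as in the stack trace. In real runs the window between the check and the insert is short, which may explain why the short script rarely hits it.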

Expected behavior

Should run without an error.

Environment

Additional context

I've seen this multiple times on different machines by now. Stack trace:

Traceback (most recent call last):                                                                                                                                                                        
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context                                            
    self.dialect.do_execute(                                                                                                                                                                              
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute                                                    
    cursor.execute(statement, parameters)                                                                                                                                                                 
sqlite3.IntegrityError: UNIQUE constraint failed: tag.name                                                                                                                                                

The above exception was the direct cause of the following exception:                                                                                                                                      

Traceback (most recent call last):                                                                                                                                                                        
  File "/home/narn/code/foundation_models/neuraloperators-icl/1D/train_wave_eqn.py", line 676, in <module>                                                                                                
    train_wave_eqn(append_avg_solutions=args.precondition,                                                                                                                                                
  File "/home/narn/code/foundation_models/neuraloperators-icl/1D/train_wave_eqn.py", line 549, in train_wave_eqn                                                                                          
    run.add_tag("1D")                                                                                                                                                                                     
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/aim/sdk/run.py", line 246, in add_tag                                                                     
    return self.props.add_tag(value)                                                                                                                                                                      
           ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                      
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/aim/storage/structured/sql_engine/entities.py", line 226, in add_tag                                      
    session_commit_or_flush(session)                                                                                                                                                                      
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/aim/storage/structured/sql_engine/entities.py", line 31, in session_commit_or_flush                       
    session.commit()                                                                                                                                                                                      
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2028, in commit                                                          
    trans.commit(_to_root=True)                                                                                                                                                                           
  File "<string>", line 2, in commit                                                                                                                                                                      
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go                                                        
    ret_value = fn(self, *arg, **kw)                                                                                                                                                                      
                ^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                      
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1313, in commit                                                          
    self._prepare_impl()                                                                                                                                                                                  
  File "<string>", line 2, in _prepare_impl                                                                                                                                                               
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go                                                        
    ret_value = fn(self, *arg, **kw)                                                                                                                                                                      
                ^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                      
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1288, in _prepare_impl                                                   
    self.session.flush()                                                                                                                                                                                  
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4352, in flush                                                           
    self._flush(objects)                                                                                                                                                                                  
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4487, in _flush                                                          
    with util.safe_reraise():
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4448, in _flush
    flush_context.execute()
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 642, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
    _emit_insert_statements(
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 1233, in _emit_insert_statements
    result = connection.execute(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/home/narn/code/foundation_models/neuraloperators-icl/venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: tag.name
[SQL: INSERT INTO tag (uuid, name, color, description, is_archived, created_at, updated_at) VALUES (?, ?, ?, ?, ?, ?, ?)]
[parameters: ('a12cff8c-6e62-4910-992b-2d5967a196f3', '1D', None, None, 0, '2024-09-08 21:50:12.501084', '2024-09-08 21:50:12.501112')]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
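One way a get-or-create step can be made safe against concurrent writers is to let the database resolve the conflict instead of checking first. The sketch below uses SQLite's `INSERT OR IGNORE` followed by a lookup. It is an illustration under assumptions, not Aim's implementation and not a tested fix for this issue: the two-column `tag` table is a simplified stand-in for the real schema, and `get_or_create_tag` is a hypothetical helper.

```python
import sqlite3


def get_or_create_tag(conn, name):
    """Return the id of the tag named `name`, creating it if needed.

    INSERT OR IGNORE silently skips the insert when the UNIQUE constraint
    on tag.name would be violated, so a writer that loses the race does
    not raise IntegrityError; it simply reads the existing row afterwards.
    """
    conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
    conn.commit()
    row = conn.execute("SELECT id FROM tag WHERE name = ?", (name,)).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical schema (the real table has more columns).
    conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    first = get_or_create_tag(conn, "1D")
    second = get_or_create_tag(conn, "1D")  # second call reuses the row
    print(first == second)
    conn.close()
```

The design choice here is to treat a duplicate as a normal outcome rather than an error, which removes the gap between the existence check and the insert.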

mfouesneau commented 3 days ago

Same issue here. It started appearing randomly after many successful runs, in my case with the UNIQUE constraint failing on run_tag.run_id, run_tag.tag_id.