airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Importing `airr` takes far longer than it should. #682

Closed jday1 closed 1 year ago

jday1 commented 1 year ago

TLDR: airr takes a long time to import. Fix is here https://github.com/airr-community/airr-standards/pull/683

Importing airr takes 6.9 seconds, whereas it should take tenths of a second.

(airr_standards) [master][~/Documents/new/airr-standards.airr/lang/python]$ python
Python 3.10.4 (main, May 30 2022, 12:51:07) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> def time_import_airr():
...     start_time = time.time()
...     import airr
...     end_time = time.time()
...     print(f"Import took {end_time - start_time} seconds")
... 
>>> time_import_airr()
Import took 6.869977951049805 seconds

This slows down applications using it and discourages the airr library from being adopted.
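A caveat on the measurement itself: within one interpreter session, a second `import airr` returns almost instantly because the module is already cached in `sys.modules`. A minimal sketch of a repeatable measurement, assuming a fresh interpreter per run, is below. The helper name `time_import` is hypothetical, and `json` stands in for `airr` so the snippet runs anywhere; the figure includes interpreter startup time.

```python
import subprocess
import sys
import time


def time_import(module_name):
    """Return wall-clock seconds to start a fresh interpreter and import module_name.

    Hypothetical helper for illustration. The result includes interpreter
    startup, so compare it against a baseline such as a stdlib module.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


# "json" is a stand-in; replace it with "airr" in an environment that has it installed.
elapsed = time_import("json")
print(f"Import took {elapsed:.3f} seconds")
```

For a per-module breakdown, CPython 3.7+ also accepts `python -X importtime -c "import airr"`, which prints the time spent in each imported module to stderr.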

The slowdown comes from how the schema is instantiated in schema.py. The Schema class is instantiated many times to build the AIRRSchema dict, and each instantiation loads the same airr-schema.yaml file again:

else:
    with resource_stream(__name__, 'specs/airr-schema.yaml') as f:
        spec = yaml.load(f, Loader=yamlordereddictloader.Loader)

Instead, the file should be loaded once at module level, and the parsed spec should be reused during class instantiation.
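The load-once idea can be sketched as follows. This is not the code from the PR: it is a simplified illustration that assumes a cached loader function (`load_spec`, a hypothetical name) and uses an inline JSON string in place of specs/airr-schema.yaml so that it runs without PyYAML or the package data. The `Schema` class here is a stripped-down stand-in, not the real one.

```python
import functools
import json

# Hypothetical stand-in for specs/airr-schema.yaml; the real code parses
# the YAML file with yaml.load.
_SPEC_TEXT = (
    '{"Rearrangement": {"required": ["sequence_id"]},'
    ' "Repertoire": {"required": ["repertoire_id"]}}'
)

load_count = 0  # counts how many times the spec text is actually parsed


@functools.lru_cache(maxsize=None)
def load_spec():
    """Parse the full spec on the first call; later calls return the cached dict."""
    global load_count
    load_count += 1
    return json.loads(_SPEC_TEXT)


class Schema:
    """Simplified stand-in for the real Schema class."""

    def __init__(self, definition):
        spec = load_spec()  # cached, so no repeated parsing per instance
        self.definition = definition
        self.required = spec[definition]["required"]


# Many instances, one parse.
schemas = {name: Schema(name) for name in ("Rearrangement", "Repertoire")}
print(load_count)
```

Because every instance shares the same cached dict, callers in this sketch should treat the spec as read-only; mutating it in one `Schema` would affect all of them.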

With the schema file loaded only once, importing airr takes about 0.4 seconds.

(airr_standards) *[master][~/Documents/new/airr-standards.airr/lang/python]$ python                       
Python 3.10.4 (main, May 30 2022, 12:51:07) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> def time_import_airr():
...     start_time = time.time()
...     import airr
...     end_time = time.time()
...     print(f"Import took {end_time - start_time} seconds")
... 
>>> time_import_airr()
Import took 0.4411776065826416 seconds

I implemented this fix in PR SyntenyBio:jday1/682-fix: https://github.com/airr-community/airr-standards/pull/683

schristley commented 1 year ago

Excellent, thanks! I've been wanting to look into that myself, I'm glad you did.

jday1 commented 1 year ago

Resolved by https://github.com/airr-community/airr-standards/pull/683/files