Merck / BioPhi

BioPhi is an open-source antibody design platform. It features methods for automated antibody humanization (Sapiens), humanness evaluation (OASis) and an interface for computer-assisted antibody sequence design.
https://biophi.dichlab.org/
MIT License
131 stars, 44 forks

Compatibility with SQLAlchemy 1.4.48 #38

JPereira-FJB closed this issue 1 year ago

JPereira-FJB commented 1 year ago

I'm not sure if what I'm going to showcase here is 100% correct, but here it goes. While attempting to build and run my local version of the BioPhi backend, I ran into some weird errors, including sqlalchemy.exc.ArgumentError: List argument must consist only of tuples or dictionaries. In particular, the error pointed me to the get_oas_hits function (below).

def get_oas_hits(peptides: Union[str, List[str]], engine: Engine, filter_chain=None):
    if not isinstance(peptides, tuple):
        peptides = tuple(peptides)
    filter_chain_statement = ""
    if filter_chain:
        assert filter_chain in ['Heavy', 'Light']
        filter_chain_statement = f"AND Complete{filter_chain}Seqs >= 10000"

    statement = "SELECT peptides.* FROM peptides " \
                "LEFT JOIN subjects ON peptides.subject=subjects.id " \
                "WHERE peptide IN (" + ",".join("?" * len(peptides)) + ") AND subjects.StudyPath <> 'Corcoran_2016' " \
                + filter_chain_statement
    return pd.read_sql(statement, params=peptides, con=engine)

According to this Stack Overflow post, the problem seems to derive from changes in SQLAlchemy itself. The environment.yml pins sqlalchemy < 2, and I seem to be running SQLAlchemy 1.4.48 in my Docker container. Nonetheless, I was getting this error. I managed to fix it by switching from positional "?" placeholders with a tuple of parameters to named placeholders with a dict of parameters, changing the get_oas_hits function to:

def get_oas_hits(peptides: Union[str, List[str]], engine: Engine, filter_chain=None):
    if not isinstance(peptides, list):
        peptides = list(peptides)
    filter_chain_statement = ""
    if filter_chain:
        assert filter_chain in ['Heavy', 'Light']
        filter_chain_statement = f"AND Complete{filter_chain}Seqs >= 10000"

    # One named bind parameter per peptide: {'param_0': ..., 'param_1': ...}
    params = {f'param_{i}': peptide for i, peptide in enumerate(peptides)}
    # Named placeholders (:param_0,:param_1,...) replace the positional "?" list
    statement = "SELECT peptides.* FROM peptides " \
                "LEFT JOIN subjects ON peptides.subject=subjects.id " \
                f"WHERE peptide IN ({','.join(':' + key for key in params.keys())}) AND subjects.StudyPath <> 'Corcoran_2016' " \
                + filter_chain_statement
    # Pass the parameters as a dict instead of a tuple
    return pd.read_sql(statement, params=params, con=engine)
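To check the placeholder construction in isolation, here is a minimal sketch that uses Python's built-in sqlite3 module directly instead of SQLAlchemy and pandas. The build_in_clause and find_peptides helpers and the toy two-column peptides table are hypothetical, made up for illustration, and not part of BioPhi. It only shows that one named placeholder per value plus a matching dict of parameters works for an IN clause.

```python
import sqlite3
from typing import Dict, List, Tuple


def build_in_clause(values: List[str], prefix: str = "param") -> Tuple[str, Dict[str, str]]:
    """Return ':param_0,:param_1,...' and the matching parameter dict (hypothetical helper)."""
    params = {f"{prefix}_{i}": value for i, value in enumerate(values)}
    placeholders = ",".join(f":{key}" for key in params)
    return placeholders, params


def find_peptides(conn: sqlite3.Connection, peptides: List[str]) -> List[Tuple[str, int]]:
    """Look up peptides in a toy table using named placeholders (hypothetical helper)."""
    if not peptides:
        # An empty "IN ()" is not valid SQL, so return early.
        return []
    placeholders, params = build_in_clause(peptides)
    query = (
        "SELECT peptide, subject FROM peptides "
        f"WHERE peptide IN ({placeholders}) ORDER BY peptide"
    )
    # Only the placeholder names are interpolated into the SQL text;
    # the values themselves are passed separately as a dict.
    return conn.execute(query, params).fetchall()


# Toy in-memory table; the schema is an assumption, not the real OASis database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peptides (peptide TEXT, subject INTEGER)")
conn.executemany(
    "INSERT INTO peptides VALUES (?, ?)",
    [("QVQLVQSGA", 1), ("EVQLVESGG", 2), ("DIQMTQSPS", 3)],
)

hits = find_peptides(conn, ["QVQLVQSGA", "DIQMTQSPS", "AAAAAAAAA"])
print(hits)
```

The peptide values never end up in the SQL string itself, so the query stays parameterized no matter how many peptides are passed.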

Not sure if this is something you'd like to take a look at.

prihoda commented 1 year ago

@JPereira-FJB thanks for reporting. I fixed this recently in https://github.com/Merck/BioPhi/commit/6e3305c13fc419bfb18d9b94a485053f20ac4a83. It's merged into the main branch and also available on bioconda as biophi=1.0.9