Closed rahulnair23 closed 1 month ago
**Describe the bug**
When the query and target dataframes are not the same, the indices in the result are inconsistent with the input elements: the target index can exceed the number of rows in the target dataframe.

**To Reproduce**
Steps to reproduce the behavior:
```python
import pandas as pd
from hestia.similarity import calculate_similarity

smiles = [
    '[H][C]1=[N][C]2=[C]([O][C]([H])([H])[C]3([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]3([H])[H])[N]=[C]([N]([H])[C]3=[C]([H])[C]([H])=[C]([H])[C]([Br])=[C]3[H])[N]=[C]2[N]1[H]',
    '[H][C]1=[N][C]2=[C]([O][C]([H])([H])[C]3([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]3([H])[H])[N]=[C]([N]([H])[C]3=[C]([H])[C]([H])=[C]([H])[C]([H])=[C]3[H])[N]=[C]2[N]1[H]',
    '[H]c1c(c(c(c(c1[H])Cl)[H])N([H])c2nc3c(c(n2)OC([H])([H])C4(C(C(C(C(C4([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[H])N=C(N3[H])[H])[H]',
]

# Three query molecules, but only the first two are used as targets.
query_df = pd.DataFrame({'smiles': smiles})
target_df = pd.DataFrame({'smiles': smiles[0:2]})

sim_df = calculate_similarity(query_df, target_df,
                              data_type='small_molecule',
                              similarity_metric='fingerprint',
                              field_name='smiles')
print(f"Max index: Query: {sim_df['query'].max()}, Target: {sim_df.target.max()}. ")
```

returns
Max index: Query: 2, Target: 2.
**Expected behavior** Should return
Max index: Query: 2, Target: 1.
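
The mismatch can also be checked without reading the printed maxima. The snippet below is a minimal sketch of such a check. It assumes, as the print statement above suggests, that `sim_df` has integer `query` and `target` columns holding positional row indices into the two input dataframes. The helper name `out_of_range_pairs` is hypothetical and not part of hestia, and the two small dataframes are hand-written stand-ins for the reported and expected index columns, not real `calculate_similarity` output.

```python
import pandas as pd


def out_of_range_pairs(sim_df: pd.DataFrame, n_query: int, n_target: int) -> pd.DataFrame:
    """Return the rows of sim_df whose indices do not fit the input dataframes.

    Hypothetical helper: assumes 'query' and 'target' are positional indices.
    """
    bad_query = ~sim_df["query"].between(0, n_query - 1)
    bad_target = ~sim_df["target"].between(0, n_target - 1)
    return sim_df[bad_query | bad_target]


# Stand-in for the reported behavior: 3 queries, 2 targets, but a target index of 2.
reported = pd.DataFrame({"query": [0, 1, 2], "target": [0, 1, 2]})

# Stand-in for the expected behavior: every target index is 0 or 1.
expected = pd.DataFrame({"query": [0, 1, 2], "target": [0, 1, 1]})

bad_reported = out_of_range_pairs(reported, n_query=3, n_target=2)
bad_expected = out_of_range_pairs(expected, n_query=3, n_target=2)

print(bad_reported)          # the row with target == 2
print(len(bad_expected))     # no out-of-range rows
```

With the real output, the call would be `out_of_range_pairs(sim_df, len(query_df), len(target_df))`; any returned row points at an element that does not exist in the corresponding input.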
**Desktop (please complete the following information):**
- OS: macOS