TobiasHeOl / kasearch

KA-Search: Rapid and exhaustive sequence identity search of known antibodies
BSD 3-Clause "New" or "Revised" License

Create custom database example returns KeyError: -1 when running EasySearch on new DB #5

Closed. annadiarov closed this issue 12 months ago

annadiarov commented 12 months ago

Hi! I've run your example notebooks both locally and on Google Colab, and I always get a KeyError: -1 when running EasySearch on the newly created database. I suspect the error comes from how the database is generated, because EasySearch runs without problems when I use oas-aligned-tiny.

Here is the traceback I get on Google Colab:

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-60-5ac3149581ac> in <cell line: 5>()
      3 query = 'QVQLQQSGAELARPGASVKLSCKASGYTFTSYWMQWVKQRPGQGLEWIGAIYPGDGDTRYTQKFKGKATLTADKSSSTAYMQLSSLASEDSAVYYCARGGLRRGAWFAYWGQGTLVTVS'
      4 
----> 5 results = EasySearch(query, 
      6                keep_best_n=10,
      7                database_path=path_to_save_new_db,

5 frames

/usr/local/lib/python3.10/site-packages/kasearch/easy_search.py in EasySearch(query, keep_best_n, database_path, allowed_chain, allowed_species, regions, length_matched, include_ends, local_oas_path, n_jobs)
     54     targetdb.search(querydb[:1], keep_best_n=keep_best_n)
     55 
---> 56     return targetdb.get_meta(n_query=0, n_region=0, n_sequences='all', n_jobs=n_jobs)

/usr/local/lib/python3.10/site-packages/kasearch/kasearch.py in get_meta(self, n_query, n_region, n_sequences, n_jobs)
    147         assert n_sequences > 0
    148 
--> 149         metadf = self._extract_meta(self.current_best_ids[n_query, :n_sequences, n_region], n_jobs=n_jobs)
    150         metadf['Identity'] = self.current_best_identities[n_query, :n_sequences, n_region]
    151         return metadf

/usr/local/lib/python3.10/site-packages/kasearch/meta_extract.py in _extract_meta(self, idxs, n_jobs)
     77         n_jobs = n_groups if n_groups <  n_jobs else n_jobs
     78         chunksize= n_groups // n_jobs
---> 79 
     80         fetched_metadata = pd.concat(Parallel(n_jobs=n_jobs)(delayed(self._get_single_study_meta)(group) for group in groups))
     81 

/usr/local/lib/python3.10/site-packages/joblib/parallel.py in __call__(self, iterable)
   1853             output = self._get_sequential_output(iterable)
   1854             next(output)
-> 1855             return output if self.return_generator else list(output)
   1856 
   1857         # Let's create an ID that uniquely identifies the current call. If the

/usr/local/lib/python3.10/site-packages/joblib/parallel.py in _get_sequential_output(self, iterable)
   1782                 self.n_dispatched_batches += 1
   1783                 self.n_dispatched_tasks += 1
-> 1784                 res = func(*args, **kwargs)
   1785                 self.n_completed_tasks += 1
   1786                 self.print_progress()

/usr/local/lib/python3.10/site-packages/kasearch/meta_extract.py in _get_single_study_meta(self, idxs)
     44         """  
     45         print('hi')
---> 46         study_id, line_ids = idxs[0,0], idxs[:,1]
     47         study_file = self.id_to_study[study_id]
     48 

KeyError: -1
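
To narrow this down, the last frame can be imitated in isolation. The snippet below is only a sketch under assumptions, not the library's code: it assumes that -1 is a placeholder study id for a result slot that was never filled with a real hit, and the names id_to_study, best_ids, lookup_studies and reproduce_key_error are hypothetical stand-ins made up for illustration. It shows that an unfiltered dictionary lookup on such an id raises the same KeyError: -1, and that dropping negative ids first avoids it.

```python
import numpy as np

# Hypothetical stand-in for the study lookup table; the real mapping in
# kasearch is built from the database and is not shown in this thread.
id_to_study = {0: "study_a.csv", 1: "study_b.csv"}

# Hypothetical search result as (study_id, line_id) pairs. ASSUMPTION: -1
# marks a slot that was never filled with a real hit.
best_ids = np.array([[0, 12], [1, 3], [-1, -1]])


def lookup_studies(ids, table):
    """Return study files for valid rows, skipping rows whose study id is negative."""
    valid = ids[ids[:, 0] >= 0]
    return [table[int(study_id)] for study_id in valid[:, 0]]


def reproduce_key_error(ids, table):
    """Look up every row without filtering and return the first missing key, if any."""
    try:
        for study_id in ids[:, 0]:
            table[int(study_id)]
    except KeyError as err:
        return err.args[0]
    return None


missing = reproduce_key_error(best_ids, id_to_study)
studies = lookup_studies(best_ids, id_to_study)
print(missing, studies)
```

If the custom database really does leave placeholder ids in the results, a filter like the one in lookup_studies would be one possible workaround, but whether that matches the actual cause is a guess on my side.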

Thanks!

TobiasHeOl commented 12 months ago

Hi Anna, thank you for bringing this to our attention!

This issue should now be fixed. Please re-install and try again, and let me know if you continue to have issues :)
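
After re-installing, one quick way to confirm that the notebook picked up the new install is to print the installed version. This is just a small sketch: it assumes the distribution is registered under the name kasearch (matching the import path in the traceback), and installed_version is a hypothetical helper, not part of the package. On Google Colab the runtime may also need a restart before the new version is imported.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# ASSUMPTION: the distribution name is "kasearch".
print(installed_version("kasearch"))
```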

All the best, Tobias