Open lauratwomey opened 1 week ago
Update! I figured out that I was getting the issue above because I had removed the "Bender et al" lines from the id_to_study.txt file. When I use the original id_to_study.txt file from the 2023 OAS-aligned download (63GB), kasearch runs but outputs an empty dataframe (see below). Only 8 lines have Identity values; everything else is empty. I am unsure whether this is because the Bender et al. data was removed from OAS or because I am not using EasySearch correctly. Any help would be greatly appreciated!
Could you let me know how to get the latest pre-aligned version of OAS?
I am running the command from the issue above:
Analysis starting at: 2024-07-05 14:57:16.627652
Running Easy Search...................................................
Heavy chain data in Bender et al. 2020 has been removed from OAS due to contamination.
Heavy chain data in Bender et al. 2020 has been removed from OAS due to contamination.
Heavy chain data in Bender et al. 2020 has been removed from OAS due to contamination.
Heavy chain data in Bender et al. 2020 has been removed from OAS due to contamination.
Finished Easy Search...................................................
Saving results...................................................
   Unnamed: 0  sequence  locus  ...  Total sequences  Isotype  Identity
0         NaN       NaN    NaN  ...              NaN      NaN  0.899160
1         NaN       NaN    NaN  ...              NaN      NaN  0.899160
2         NaN       NaN    NaN  ...              NaN      NaN  0.892562
3         NaN       NaN    NaN  ...              NaN      NaN  0.892562
4         NaN       NaN    NaN  ...              NaN      NaN  0.890756
[5 rows x 114 columns]
Analysis finished at: 2024-07-05 15:30:56.135003
Hi Laura, thank you for using KA-Search and highlighting this issue!
Some time ago we decided to remove parts of the Bender 2020 study from OAS because we suspect that some of its human data is contaminated with mouse sequences. However, because removing them outright would break the public pre-processed OAS for KA-Search, we instead updated the kasearch code to flag when a user query matches Bender 2020 sequences. Such matches are returned without metadata, as the metadata is no longer in OAS. Unfortunately, we had left a sequence that matches Bender 2020 sequences as the example sequence; this has now been changed (#10).
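To make the effect concrete, here is a minimal pandas sketch of how one might separate hits that still carry metadata from those that do not. The helper `split_hits_by_metadata` is hypothetical and not part of kasearch; the column names are taken from the output printed above, and the values in the toy dataframe (including the placeholder sequence) are made up for illustration.

```python
import pandas as pd


def split_hits_by_metadata(results: pd.DataFrame, score_col: str = "Identity"):
    """Split search hits into rows with metadata and rows without any.

    Hypothetical helper, not part of kasearch. It assumes the results
    dataframe has one score column and that all other columns are metadata.
    """
    # Ignore the score column and any leftover index column such as "Unnamed: 0".
    meta_cols = [
        c for c in results.columns
        if c != score_col and not str(c).startswith("Unnamed")
    ]
    # A row "has metadata" if at least one metadata column is not NaN/None.
    has_meta = results[meta_cols].notna().any(axis=1)
    return results[has_meta], results[~has_meta]


# Toy data mimicking the NaN pattern above; all values are illustrative.
demo = pd.DataFrame(
    {
        "sequence": [None, "EVQLVESGGG", None],
        "locus": [None, "H", None],
        "Isotype": [None, "IGHG", None],
        "Identity": [0.899160, 0.880000, 0.892562],
    }
)

with_meta, without_meta = split_hits_by_metadata(demo)
print(f"{len(with_meta)} hit(s) with metadata, {len(without_meta)} without")
```

Rows that end up in `without_meta` would be the ones to treat as likely Bender 2020 matches, under the assumption that no other cause of missing metadata is in play.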
Alternatively, you can create your own pre-aligned version of OAS using the prepareOASdb.ipynb notebook. This will take some time and resources (~1 day on 20 CPUs), but you will then have an up-to-date pre-aligned version of OAS.
I hope this helps, otherwise please let me know if you have any other issues.
Dear kasearch team,
First of all, thanks for all your work, kasearch is really promising!! I'm really hoping I can get it running soon.
I'm trying to run EasySearch on the sample sequence. I downloaded the publication dataset into this folder: /researchers/laura.twomey/Tools/omics_tools/kasearch/oasdb_20230111/
But I get this error:
I'm using: