Neural search through protein sequences using the ProtBert model and the Jina AI framework.
App demo:
First, clone the repository with git,
git clone https://github.com/georgeamccarthy/protein_search/ # Cloning
cd protein_search # Changing directory
If you're familiar with Docker, you can simply run make docker (assuming you're running Linux).
The above command will,
- for frontend, install dependencies and start the Streamlit application
- for backend, install dependencies and start the Jina application

Visually, you should see something like,
From there on, you should be able to visit the Streamlit frontend and enter your protein-related query.
Some notes before you use this route,
- Docker takes a few moments to build the wheel for the dependencies, so the pip step in each of the containers may last as long as 1-2 minutes.
- The torch dependency in backend/requirements.txt is 831.1 MB at the time of writing. Unless you get red-colored logs, everything is fine and torch is just taking time to install.
- The Rostlab/prot_bert pre-trained model from HuggingFace is 1.68 GB in size.

The great news is that you will need to install these dependencies and build the images only once. Docker will cache all of the layers and steps, and caching for the pre-trained model has been integrated.
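Since the torch wheel (831.1 MB) and the pre-trained model (1.68 GB) add up to roughly 2.5 GB, it may be worth checking free disk space before the first build. The following is a minimal sketch, not part of the project: `has_space_mb` is a hypothetical helper, the 2600 MB threshold is a rough assumption that ignores the other dependencies and image layers, and the path to check depends on where Docker stores its data on your machine.

```shell
#!/bin/sh
# Sketch: check that a filesystem has at least a given number of MB free.
# `has_space_mb` is a hypothetical helper name, not part of protein_search.

has_space_mb() {
    needed_mb="$1"        # required free space, in MB
    path="${2:-.}"        # filesystem to check (defaults to the current directory)
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb="$(df -Pk "$path" | awk 'NR==2 {print $4}')"
    [ "$avail_kb" -ge "$((needed_mb * 1024))" ]
}
```

For example, `has_space_mb 2600 || echo "less than ~2.6 GB free"` prints a warning when the current filesystem is short on space.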
Some more functionalities provided are,
- docker, press Ctrl^C
- make up
- make remove
- make docker
Neither container needs to be restarted when you introduce new changes.
For each of the folders frontend and backend, run the following commands to create and activate a venv virtual environment and install the dependencies,
cd folder_to_go_into/ # `folder_to_go_into` is either `frontend` or `backend`
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
If in backend, run python3 src/app.py.
Open a new terminal, head back into the frontend folder, repeat venv creation and dependency installation, and run streamlit run app.py.
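The per-folder setup above could be wrapped in a small helper so that it is not typed twice. This is only a sketch under stated assumptions: `run` and `setup_folder` are hypothetical names that do not exist in the repository, the helper is meant to be called from the repository root, and it calls the environment's own pip directly instead of activating the environment first.

```shell
#!/bin/sh
# Sketch: create a virtual environment for one folder and install its requirements.
# `run` and `setup_folder` are hypothetical helpers, not part of protein_search.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_folder() {
    folder="$1"   # either "frontend" or "backend"
    # Create the environment inside the folder.
    run python3 -m venv "$folder/venv"
    # Use the environment's pip so the packages land inside it.
    run "$folder/venv/bin/pip" install -r "$folder/requirements.txt"
}
```

From the repository root, `setup_folder backend` followed by `setup_folder frontend` would prepare both environments; the applications are then started as described above.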
Refer to the Makefile for the specific commands.
To format code following the black standard
$ make format
Code linting with flake8
$ make lint
Testing
$ make test
Testing with coverage analysis
$ make coverage
Format, test and coverage
$ make build
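If you want to run several of these targets in one go and stop at the first failure, a wrapper along these lines may help. It is a sketch only: `check_all` is a hypothetical helper that is not defined in the repository, and the choice of targets (format, lint, test) is an assumption you can adjust. The MAKE variable can be overridden, for instance with echo, to preview which targets would run.

```shell
#!/bin/sh
# Sketch: run selected make targets in order, stopping at the first failure.
# `check_all` is a hypothetical helper, not part of protein_search.

check_all() {
    for target in format lint test; do
        # ${MAKE:-make} falls back to plain make when MAKE is unset.
        "${MAKE:-make}" "$target" || {
            echo "make $target failed" >&2
            return 1
        }
    done
    echo "all checks passed"
}
```

Calling `check_all` from the repository root runs the three targets in sequence and returns a non-zero status as soon as one of them fails.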