[!Important]
The following test CANNOT BE DONE unless you are an authorized member of cvpaper challenge with access to our organization's AWS Secrets Manager.
[!Important]
Unlike #33, you need to prepare your own Qdrant Cloud cluster to test the paper parsing and upload logic.
Create your own environments/.env by referring to environments/.env.sample, and specify the secret environment variables.
You might be able to get AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY by asking @gatheluck, who manages the AWS account I used for the Crux application.
You can get AWS_DEFAULT_REGION and DYNAMODB_TABLE_NAME from here.
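A quick pre-flight check can catch a missing variable before the containers are started. The following is a minimal sketch, assuming environments/.env uses plain KEY=VALUE lines. The helper names `parse_env_text` and `missing_keys` are hypothetical, and only the four variable names mentioned above are checked; the real environments/.env.sample may require more (for example Qdrant or OpenAI credentials, whose names are not given here).

```python
# Variable names taken from the prerequisite steps above.
# Assumption: environments/.env.sample may list additional keys.
REQUIRED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "DYNAMODB_TABLE_NAME",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in required if not env.get(key)]
```

To check the real file, read environments/.env as text, pass it to `parse_env_text`, and inspect the list returned by `missing_keys`. Lines that start with `export` are not handled by this sketch.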
# Move to the directory that has `docker-compose.yaml`
~/Crux$ cd environments/cpu
# Remove the existing docker containers
~/Crux/environments/cpu$ docker compose down
[OPTIONAL] Boot up containers without using cache
# Re-build docker images without using cache
~/Crux/environments/cpu$ docker compose build --no-cache
# Boot up docker containers
~/Crux/environments/cpu$ docker compose up -d
Run the parse & upload script
~/crux-backend$ poetry run python src/scripts/upload_paper_data.py -p data/papers
Note for reviewers
[!Note]
Parsing finishes in about a minute, but uploading the embedding vectors to Qdrant Cloud takes a few hours.
[!Caution]
Running the test incurs charges for the OpenAI API's embedding model.
Issue URL
N/A
Change overview
update_paper_data.py implementation
How to test
1. Prerequisite
Download the Mathpix format paper data, which has been reviewed and corrected for errors, from here, and place it as below.
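Before starting the upload, it may help to confirm that the downloaded data is where the script will look for it. The sketch below assumes the data sits under `data/papers`, the path passed with `-p` in the run command; `summarize_paper_dir` is a hypothetical helper and makes no assumption about file names or extensions.

```python
from collections import Counter
from pathlib import Path


def summarize_paper_dir(root: Path) -> Counter:
    """Count the files under root (recursively), grouped by file extension."""
    return Counter(
        path.suffix or "(no extension)"
        for path in root.rglob("*")
        if path.is_file()
    )
```

Calling `summarize_paper_dir(Path("data/papers"))` from the directory where the script is run returns an empty counter if nothing was placed there, which is cheaper to discover now than after the embedding calls have started.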
2. Local Test
[OPTIONAL] Remove the existing containers
[OPTIONAL] Boot up containers without using cache
Run the parse & upload script
Note for reviewers