Closed — shainaraza closed this issue 2 years ago
Hi @shainaraza, in the Colab notebook that you provided, I don't see any line that handles writing documents into the document store. Our recommendation is that you index your documents with Haystack via the `document_store.write_documents(docs)` method. If you have an existing Elasticsearch database that you would like to use with Haystack, you will have to ensure that the fields in ES are named in a specific way.
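For reference, `write_documents()` takes a list of plain dicts. A minimal sketch of that format (note: the main field is named `text` in Haystack 0.x and `content` from v1.0 on; the document contents and metadata keys below are made up for illustration):

```python
# Minimal sketch of the input format for document_store.write_documents().
# NOTE: the main field is "text" in Haystack 0.x and "content" in 1.x;
# the documents and metadata here are hypothetical examples.
docs = [
    {
        "text": "COVID-19 is an infectious disease caused by the SARS-CoV-2 virus.",
        "meta": {"name": "covid_overview.txt"},
    },
    {
        "text": "Vaccines against COVID-19 became available in late 2020.",
        "meta": {"name": "covid_vaccines.txt"},
    },
]

# document_store.write_documents(docs) would then index these documents.
print(len(docs), "documents prepared")
```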
Hello @shainaraza, did you find a solution to your problem in the end? If so, please let us know :slightly_smiling_face:
Yes @ZanSara, I found a solution and will update this thread.
Colab was blocking the API address, so I used ngrok to get a public address from Colab. Below is the code (it's a little mixed, apologies for that, but it worked). file.txt
```
!pip install flask-ngrok
```

```python
from flask import Flask, request
from flask_ngrok import run_with_ngrok
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import TfidfRetriever
from haystack.pipeline import ExtractiveQAPipeline
from haystack.reader.farm import FARMReader
from haystack.utils import clean_wiki_text, convert_files_to_dicts, fetch_archive_from_http

DOC_STORE = InMemoryDocumentStore()

# download and index the example documents
doc_dir = "data/article_txt_got"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)
print(dicts[:3])
DOC_STORE.write_documents(dicts)

RETRIEVER = TfidfRetriever(DOC_STORE)
READER = FARMReader(model_name_or_path='deepset/bert-base-cased-squad2',
                    context_window_size=1500,
                    use_gpu=True)

# initialize pipeline
PIPELINE = ExtractiveQAPipeline(reader=READER, retriever=RETRIEVER)

# initialize API
app = Flask(__name__)
run_with_ngrok(app)  # starts ngrok when the app is run

@app.route('/')
def get_query():
    """Makes query to doc store via Haystack pipeline.

    :param q: Query string representing the question being asked.
    :type q: str
    """
    q = "covid-19?"
    # get answers
    return PIPELINE.run(query=q, params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}})

app.run()

# keep the ngrok process alive until CTRL-C
from pyngrok import ngrok

ngrok_process = ngrok.get_ngrok_process()
try:
    # block until CTRL-C or some other terminating event
    ngrok_process.proc.wait()
except KeyboardInterrupt:
    print(" Shutting down server.")
    ngrok.kill()
```
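For what it's worth, the endpoint above returns the raw pipeline output as JSON. A small sketch of how a client might pick the best answer out of that response (the fields shown follow the classic ExtractiveQAPipeline output shape; the sample values are invented):

```python
# Sketch: parsing the JSON a client gets back from the endpoint above.
# The exact schema depends on the Haystack version; this assumes the
# classic ExtractiveQAPipeline output with an "answers" list.
# The sample values below are invented for illustration.
sample_response = {
    "query": "covid-19?",
    "answers": [
        {"answer": "a coronavirus disease", "score": 0.91, "context": "..."},
        {"answer": "SARS-CoV-2 infection", "score": 0.85, "context": "..."},
    ],
}

def best_answer(response):
    """Return the highest-scoring answer string, or None if there are no answers."""
    answers = response.get("answers", [])
    if not answers:
        return None
    return max(answers, key=lambda a: a["score"])["answer"]

print(best_answer(sample_response))
```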
Thank you very much! I'll close this thread now, but this solution will be a good reference for the future :slightly_smiling_face:
Question: How to use FastAPI and Haystack with Colab?

I have this piece of code, and I am unable to run Haystack on Colab. There is no syntax error, but FastAPI does not pick up the data from the pipeline. Any advice?
```
!pip install fastapi nest-asyncio pyngrok uvicorn
!pip install git+https://github.com/deepset-ai/haystack.git
```

```python
# In Colab / no-Docker environments: start Elasticsearch from source
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
!tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
!chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
    ['elasticsearch-7.9.2/bin/elasticsearch'],
    stdout=PIPE, stderr=STDOUT,
    preexec_fn=lambda: os.setuid(1)  # run as daemon
)

# wait until ES has started
!sleep 15

import nest_asyncio
import uvicorn
from fastapi import FastAPI
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.retriever.sparse import ElasticsearchRetriever
from haystack.reader.farm import FARMReader
from haystack.pipeline import ExtractiveQAPipeline

# initialize doc store, retriever and reader components
DOC_STORE = ElasticsearchDocumentStore(host='localhost', username='', password='', index='aurelius')
RETRIEVER = ElasticsearchRetriever(DOC_STORE)
READER = FARMReader(model_name_or_path='deepset/bert-base-cased-squad2',
                    context_window_size=1500,
                    use_gpu=True)

# initialize pipeline
PIPELINE = ExtractiveQAPipeline(reader=READER, retriever=RETRIEVER)

# initialize API
APP = FastAPI()

@APP.get('/query')
async def get_query(q: str, retriever_limit: int = 10, reader_limit: int = 3):
    """Makes query to doc store via Haystack pipeline."""
    return PIPELINE.run(query=q, params={"Retriever": {"top_k": retriever_limit},
                                         "Reader": {"top_k": reader_limit}})

from pyngrok import ngrok

# terminate open tunnels if they exist
ngrok.kill()

# setting the authtoken (optional)
# get your authtoken from https://dashboard.ngrok.com/auth
# ngrok.set_auth_token(NGROK_AUTH_TOKEN)

ngrok_tunnel = ngrok.connect(9200)
print('Public URL:', ngrok_tunnel.public_url)
nest_asyncio.apply()
uvicorn.run(APP)
```
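As a side note, the fixed `! sleep 15` can be replaced by polling Elasticsearch until it answers. A standard-library-only sketch (the URL assumes ES on its default port 9200):

```python
import json
import time
from urllib.request import urlopen
from urllib.error import URLError

def wait_for_es(url="http://localhost:9200", timeout=60):
    """Poll Elasticsearch until it responds, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urlopen(url) as resp:
                # ES answers with a cluster-info JSON document once it is up
                return json.load(resp)
        except URLError:
            time.sleep(1)
    raise TimeoutError(f"Elasticsearch did not start within {timeout}s")
```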
Link to Colab notebook https://colab.research.google.com/drive/191cyC5eXajgekBwJKs4hKmAiC_WmHuQ_?usp=sharing