Closed lipsa7 closed 1 year ago
Hello @lipsa7!
The library langdetect is necessary to determine the language of the documents.
You can install it in either of two ways:
pip install farm-haystack[preprocessing]
or
pip install langdetect
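If the error persists after installing, it may be worth confirming that the package landed in the same interpreter that runs your code (for example, a freshly created venv). The snippet below is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Haystack.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The interpreter path shows which environment is actually being used.
    print("Interpreter:", sys.executable)
    for dist in ("farm-haystack", "langdetect"):
        print(dist, "->", installed_version(dist) or "not installed")
```

Run it with the same Python executable that starts your notebook or server. If langdetect shows as "not installed" there, the pip command probably targeted a different environment.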
Does this solve your problem?
Hi, thanks for your answer. I installed langdetect but that didn't solve it. I read somewhere that the issue is with Colab, so I switched to VS Code. Facing different issues now :D
Hi, I am also facing the same issue. A few days ago it was working fine, but the issue started after I recently created a venv. After I installed langdetect, it asked me to install docx, then azure, and so on. I am using VS Code and a Flask server.
https://github.com/deepset-ai/haystack/discussions/4930#discussioncomment-5928273
I have posted the code I used, and it works. You can check it.
Thanks for your reply. I tried pip install farm-haystack, then started a server using Flask, but it crashes saying langdetect isn't installed.
Here is my code:
from flask import Flask, request, jsonify, send_file
from flask_cors import CORS
from haystack.utils import convert_files_to_docs
from haystack.nodes import PreProcessor, BM25Retriever, FARMReader
from multiprocessing import freeze_support
from haystack.document_stores import InMemoryDocumentStore
from haystack import Pipeline
from haystack.pipelines import ExtractiveQAPipeline
from haystack.nodes import SentenceTransformersRanker
import pickle
# from pdf2image import convert_from_path
import os

app = Flask(__name__)
CORS(app)


@app.route("/trainModel")
def trainModel():
    freeze_support()
    # ... code omitted in the original post ...
    # process docs
    processed_docs = PreProcessor(
        clean_empty_lines=True,
        clean_whitespace=True,
        split_by="sentence",
        split_length=5,
        add_page_number=True,
        split_respect_sentence_boundary=False,  # NotImplementedError: 'split_respect_sentence_boundary=True' is only compatible with split_by='word'.
    ).process(all_docs)


@app.route("/ref")
def send_ref():
    ...  # code omitted in the original post


if __name__ == "__main__":
    app.run(debug=True, port=8080)
In the terminal:
* Serving Flask app 'app'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:8080
Press CTRL+C to quit
* Restarting with stat
* Debugger is active!
* Debugger PIN: 493-912-585
Traceback (most recent call last):
  File "D:\Works\OfficialProjects\DocsML\server\app.py", line 147, in <module>
    app.run(debug=True, port=8080)
  File "D:\Installed\Py 3.10\lib\site-packages\flask\app.py", line 889, in run
    run_simple(t.cast(str, host), port, self, **options)
  File "D:\Installed\Py 3.10\lib\site-packages\werkzeug\serving.py", line 1097, in run_simple
    run_with_reloader(
  File "D:\Installed\Py 3.10\lib\site-packages\werkzeug\_reloader.py", line 452, in run_with_reloader
    with reloader:
  File "D:\Installed\Py 3.10\lib\site-packages\werkzeug\_reloader.py", line 292, in __enter__
    return super().__enter__()
  File "D:\Installed\Py 3.10\lib\site-packages\werkzeug\_reloader.py", line 243, in __enter__
    self.run_step()
  File "D:\Installed\Py 3.10\lib\site-packages\werkzeug\_reloader.py", line 295, in run_step
    for name in _find_stat_paths(self.extra_files, self.exclude_patterns):
  File "D:\Installed\Py 3.10\lib\site-packages\werkzeug\_reloader.py", line 114, in _find_stat_paths
    paths.update(_iter_module_paths())
  File "D:\Installed\Py 3.10\lib\site-packages\werkzeug\_reloader.py", line 46, in _iter_module_paths
    if name is None or name.startswith(_ignore_always):
  File "D:\Installed\Py 3.10\lib\site-packages\generalimport\fake_module.py", line 19, in error_func
    raise MissingOptionalDependency(f"Optional dependency {name} was used but it isn't installed.")
generalimport.exception.MissingOptionalDependency: Optional dependency 'langdetect' was used but it isn't installed.
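The traceback ends inside Werkzeug's reloader (`_iter_module_paths`) and generalimport's `fake_module.py`, not in the application code. That suggests the error was raised while the debug reloader was scanning loaded modules and touched the placeholder standing in for the missing langdetect. Below is a simplified, standard-library-only illustration of that mechanism. It is an assumption about what is going on, and `PlaceholderModule`, `_RaisingValue`, and `scan_module_paths` are hypothetical names, not the real generalimport or Werkzeug code.

```python
import types


class MissingOptionalDependency(Exception):
    """Hypothetical stand-in for the exception named in the traceback."""


class _RaisingValue:
    """A value that raises as soon as any attribute (e.g. .startswith) is used."""

    def __init__(self, dep_name):
        self._dep_name = dep_name

    def __getattr__(self, attr):
        # Only reached for attributes that do not exist, such as .startswith.
        raise MissingOptionalDependency(
            f"Optional dependency {self._dep_name!r} was used but it isn't installed."
        )


class PlaceholderModule(types.ModuleType):
    """Simplified placeholder for a module that could not be imported."""

    def __init__(self, name):
        super().__init__(name)
        self.__file__ = _RaisingValue(name)


def scan_module_paths(modules):
    """Collect module file paths, loosely mirroring the loop in the traceback."""
    paths = []
    for module in modules:
        name = getattr(module, "__file__", None)
        if name is None or name.startswith("<"):
            continue
        paths.append(name)
    return paths


if __name__ == "__main__":
    fake = PlaceholderModule("langdetect")
    try:
        scan_module_paths([types, fake])
    except MissingOptionalDependency as exc:
        print("Scan failed:", exc)
```

If this is indeed what is happening, installing langdetect into the interpreter that runs the server should remove the placeholder. Passing `use_reloader=False` to `app.run` is another thing to try, though that is an untested suggestion here.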
@s-m-arafat have you tried these solutions?
https://github.com/deepset-ai/haystack/issues/4911#issuecomment-1546918282
I'm stuck in pre-processing stage. Can someone please help?
from haystack.nodes import TextConverter, PDFToTextConverter, DocxToTextConverter, PreProcessor
converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=["en"])
doc_pdf = converter.convert(file_path="/content/SPI_Electrification_15.pdf", meta=None)[0]
This is the error I'm getting: MissingOptionalDependency: Optional dependency 'langdetect' was used but it isn't installed.