Allegra-Cohen / grid

corpus loading error #67

Open maxaalexeeva opened 1 year ago

maxaalexeeva commented 1 year ago

This error occurs when trying to create a new corpus to use in the grid. Steps that lead to the error:

Step 1: I created a set of files containing sentences to go into the grid in the following format:

File 1:
file name: grid_row_label1.txt
file content: Sent 1. Sent 2. Sent 3.

File 2:
file name: grid_row_label2.txt
file content: Sent 4. Sent 5. Sent 6.

File 3: ...

These files contain only the sentences that go in the grid, with no context sentences. I process the files using the grid's upload or update corpus feature.

Step 2: To get context sentences into the grid, I created a separate corpus file, corpus.csv, with one sentence per line. It contains the sentences from step 1 plus their context (in my case, the two sentences before each grid sentence, the grid sentence itself, and the two sentences after it). I don't think this is an ideal way to handle it, but overall it worked. I added corpus.csv to the /grid/process_files directory because that is where the corpus has to be for the Create New Grid function. After clicking Create New Grid!, entering corpus.csv in the "which corpus will you use?" field and the "*_row_labels.csv" file (produced in step 1) in the "which row labels will you use?" field, I get the following error (a non-ideal workaround is described after the error):

loadNewGrid  load_all
New grid -- processing documents ... 
INFO:     None:0 - "GET /loadNewGrid/?corpusFilename=corpus.csv&rowFilename=galamsey_2500_for_grid_row_labels.csv&newFilename=&newAnchor=load_all HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/maxaalexeeva/Desktop/Repos/grid/habitus_ui_interface-main/./main2.py", line 280, in loadNewGrid
    return frontend.load_new_grid(newFilename, newAnchor)
  File "/home/maxaalexeeva/Desktop/Repos/grid/habitus_ui_interface-main/./main2.py", line 85, in load_new_grid
    self.grid = self.backend.get_grid(k, newAnchor, newFilename, self.clustering_algorithm)
  File "/home/maxaalexeeva/Desktop/Repos/grid/habitus_ui_interface-main/./backend/backend.py", line 74, in get_grid
    self.set_up_corpus(anchor)
  File "/home/maxaalexeeva/Desktop/Repos/grid/habitus_ui_interface-main/./backend/backend.py", line 102, in set_up_corpus
    self.corpus = Corpus(self.path, self.clean_supercorpus_filename, self.row_labels_filename, self.rows, anchor, self.linguist)
  File "/home/maxaalexeeva/Desktop/Repos/grid/habitus_ui_interface-main/./backend/corpus.py", line 29, in __init__
    self.documents: list[Document] = self.load_anchored_documents(anchor == 'load_all')
  File "/home/maxaalexeeva/Desktop/Repos/grid/habitus_ui_interface-main/./backend/corpus.py", line 91, in load_anchored_documents
    lines = self.load_corpus_lines(self.path, self.clean_supercorpus_filename)
  File "/home/maxaalexeeva/Desktop/Repos/grid/habitus_ui_interface-main/./backend/corpus.py", line 129, in load_corpus_lines
    lines = pd.read_csv(path + filename + '.csv', header = 0)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
    self.handles = get_handle(
  File "/home/maxaalexeeva/miniconda3/envs/grid/lib/python3.9/site-packages/pandas/io/common.py", line 856, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '../process_files/cleaned_corpus.csv'

This is solved by copying the extensionless cleaned_corpus file produced in step 2 and renaming the copy to include the .csv extension. With both files present in the process_files directory, the grid loads fine through the create a new grid feature (provided it is not too large), but there has to be a better way to handle this.
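The manual copy described above can be scripted. The following is a minimal sketch, not part of the grid code: the helper name ensure_csv_copy and its parameters are hypothetical, and it assumes the extensionless file is named cleaned_corpus and sits in the same directory where the loader looks for cleaned_corpus.csv (as the path in the traceback suggests).

```python
import shutil
from pathlib import Path


def ensure_csv_copy(directory, stem="cleaned_corpus"):
    """Copy the extensionless file `stem` to `stem`.csv if the .csv is missing.

    Hypothetical helper: the file name and location are assumptions taken
    from the FileNotFoundError message, not from the grid source code.
    """
    source = Path(directory) / stem
    target = Path(directory) / (stem + ".csv")
    if target.exists():
        # Nothing to do; leave an existing .csv untouched.
        return target
    if not source.exists():
        raise FileNotFoundError(f"neither {target} nor {source} exists")
    # Copy rather than rename, so both files stay in the directory.
    shutil.copyfile(source, target)
    return target


# Example call (the path is an assumption based on the traceback):
# ensure_csv_copy("../process_files")
```

Copying instead of renaming keeps the extensionless file in place, which matches the observation that the grid loads when both files are present.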

Step 1 is probably intended to create the correct corpus files; however, I do not know how to format input files in step 1 to avoid loading context sentences in the grid along with target sentences.
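For reference, the context-window corpus described in step 2 (each grid sentence with two sentences on either side) can be assembled along these lines. This is a minimal sketch under stated assumptions: the function name with_context and its parameters are hypothetical, the source text is assumed to be already split into a list of sentences, and the exact layout the grid expects for corpus.csv (for example, whether a header row is required) is not confirmed here.

```python
def with_context(sentences, target_indices, window=2):
    """Return the target sentences plus `window` neighbours on each side.

    Sentences are returned in document order, and overlapping windows are
    merged so that no sentence appears twice.
    """
    keep = set()
    for i in target_indices:
        start = max(0, i - window)
        stop = min(len(sentences), i + window + 1)
        keep.update(range(start, stop))
    return [sentences[i] for i in sorted(keep)]


# Hypothetical usage: write one sentence per line to corpus.csv.
# lines = with_context(all_sentences, grid_sentence_indices)
# with open("corpus.csv", "w", encoding="utf-8") as f:
#     f.write("\n".join(lines) + "\n")
```

Merging overlapping windows avoids duplicate lines when two grid sentences are close together in the source text.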

maxaalexeeva commented 1 year ago

@Allegra-Cohen, tagging you as an FYI in case you know a better way to create a new corpus from scratch.