atrisovic / paper_analysis_toolkit

Affiliations extraction #3

Open atrisovic opened 2 months ago

atrisovic commented 2 months ago
  1%|          | 408/43395 [1:24:39<106:40:46,  8.93s/it]
  1%|          | 409/43395 [1:24:45<96:20:55,  8.07s/it] 
  1%|          | 410/43395 [1:24:47<74:22:36,  6.23s/it]
  1%|          | 411/43395 [1:24:54<78:46:58,  6.60s/it]
  1%|          | 412/43395 [1:24:59<73:15:08,  6.14s/it]
  1%|          | 413/43395 [1:25:01<58:08:39,  4.87s/it]
  1%|          | 413/43395 [1:25:13<147:50:08, 12.38s/it]
Traceback (most recent call last):
  File "/home/gridsan/atrisovic/examples/osfm/paper_analysis_toolkit/affiliations_main.py", line 60, in <module>
    main()
  File "/home/gridsan/atrisovic/examples/osfm/paper_analysis_toolkit/affiliations_main.py", line 55, in main
    corpus.setAllAffiliations(classifier = aff_classifier, resultsfile = resultsfile)
  File "/home/gridsan/atrisovic/examples/osfm/paper_analysis_toolkit/documents/Corpus.py", line 107, in setAllAffiliations
    results = paper.findNamesAndAffiliations(classifier=classifier)
  File "/home/gridsan/atrisovic/examples/osfm/paper_analysis_toolkit/documents/Paper.py", line 119, in findNamesAndAffiliations
    self.name_and_affiliation = classifier.classifyFromTextEnsureJSON(pre_abstract)
  File "/home/gridsan/atrisovic/examples/osfm/paper_analysis_toolkit/affiliations/AffiliationClassifier.py", line 104, in classifyFromTextEnsureJSON
    json_string_results = self.classifyFromText(text)
  File "/home/gridsan/atrisovic/examples/osfm/paper_analysis_toolkit/affiliations/AffiliationClassifier.py", line 87, in classifyFromText
    generated_ids = self.model.generate(input,
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 1575, in generate
    result = self._sample(
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 2697, in _sample
    outputs = self(
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1157, in forward
    outputs = self.model(
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1042, in forward
    layer_outputs = decoder_layer(
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 757, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 673, in forward
    value_states = repeat_kv(value_states, self.num_key_value_groups)
  File "/home/gridsan/atrisovic/.conda/envs/py310/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 192, in repeat_kv
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 70.00 MiB. GPU 0 has a total capacity of 31.74 GiB of which 36.88 MiB is free. Including non-PyTorch memory, this process has 31.70 GiB memory in use. Of the allocated memory 30.35 GiB is allocated by PyTorch, and 426.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
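
The traceback shows a single failed `generate` call ending the whole run at paper 413 of 43395. One possible mitigation, sketched below under assumptions, is to guard each per-paper call: retry once on a shorter prefix of the text, then record an empty result and move on. The names `classify_with_oom_guard`, `max_chars`, and `free_cache` are hypothetical and not part of the toolkit, and the 8000-character cap is an arbitrary placeholder. The sketch takes the exception type as a parameter so it runs without a GPU.

```python
import gc
from typing import Callable, Optional, Tuple, Type

# Fallback with the same shape as the empty entries in the results file.
EMPTY_RESULT = '{"institutions": [], "countries": [], "contributors": []}'


def classify_with_oom_guard(
    classify: Callable[[str], str],
    text: str,
    oom_errors: Tuple[Type[BaseException], ...],
    max_chars: int = 8000,  # assumed cap, not a value from the toolkit
    free_cache: Optional[Callable[[], None]] = None,
) -> Tuple[str, bool]:
    """Call classify(text); on an out-of-memory error retry once on a
    shorter prefix, then give up. Returns (json_string, succeeded)."""
    for attempt_text in (text[:max_chars], text[: max_chars // 2]):
        try:
            return classify(attempt_text), True
        except oom_errors:
            # Leave the except block before cleaning up, so the frames
            # referenced by the traceback can be released first.
            pass
        gc.collect()
        if free_cache is not None:
            free_cache()
    return EMPTY_RESULT, False
```

With PyTorch this would presumably be called with `oom_errors=(torch.cuda.OutOfMemoryError,)` and `free_cache=torch.cuda.empty_cache`, passing `classifier.classifyFromTextEnsureJSON` as `classify`. The returned flag keeps papers skipped for memory reasons distinguishable from papers where the model found nothing.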

Results

{"/home/gridsan/atrisovic/futuretech_shared/atrisovic/osfm/markdown/d9328c6253db7b8e47a8543116cc697c85d27578.mmd": {"institutions": [], "countries": [], "contributors": []}}
{"/home/gridsan/atrisovic/futuretech_shared/atrisovic/osfm/markdown/24b5e8ecde633f5fd71cb1facdd80571f8b2ed7f.mmd": {"institutions": ["USC Information Sciences Institute"], "countries": ["United States", "Egypt"], "contributors": [{"first": "Mohamed E.", "last": "Hussein", "gender": "male"}, {"first": "Sudharshan", "last": "Subramaniam Janakiraman", "gender": "male"}, {"first": "Wael", "last": "Abdalmageed", "gender": "male"}]}}
{"/home/gridsan/atrisovic/futuretech_shared/atrisovic/osfm/markdown/04b2b364dfe51940c0ba971b5df83759b659f925.mmd": {"institutions": ["Carnegie Mellon University"], "countries": ["United States"], "contributors": [{"first": "Jonathan L.", "last": "Elsas", "gender": "male"}, {"first": "Vitor R.", "last": "Carvalho", "gender": "male"}, {"first": "Jaime G.", "last": "Carbonell", "gender": "male"}]}}
{"/home/gridsan/atrisovic/futuretech_shared/atrisovic/osfm/markdown/34afd7b7a834359079d7fc60ae6809e0fd57a150.mmd": {"institutions": [], "countries": [], "contributors": []}}
{"/home/gridsan/atrisovic/futuretech_shared/atrisovic/osfm/markdown/24439e35e17be8ace5282093f2c13707e911d5fa.mmd": {"institutions": ["ETH Zurich", "Lund University"], "countries": ["Switzerland", "Sweden"], "contributors": [{"first": "Luca", "last": "Cavalli", "gender": "male"}, {"first": "Daniel", "last": "Barath", "gender": "male"}, {"first": "Marc", "last": "Pollefeys", "gender": "male"}, {"first": "Viktor", "last": "Larsson", "gender": "male"}]}}
{"/home/gridsan/atrisovic/futuretech_shared/atrisovic/osfm/markdown/c9ecbfa2c9e1453bd9a433430c18e8748de9b95c.mmd": {"institutions": ["University of Patras"], "countries": ["Greece"], "contributors": [{"first": "Charalampos M.", "last": "Liapis", "gender": "male"}, {"first": "Sotiris", "last": "Kotsiantis", "gender": "male"}]}}
{"/home/gridsan/atrisovic/futuretech_shared/atrisovic/osfm/markdown/76bc61e8bb5dc2443773ac7a7937db93b130ca88.mmd": {"institutions": [], "countries": [], "contributors": []}}
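
Since the results file appears to hold one JSON object per line, keyed by the paper path, a restarted run could skip papers that already have an entry instead of starting again from zero. A minimal sketch, assuming that format; `load_done` and `remaining` are hypothetical helpers, not functions in the toolkit:

```python
import json
from typing import Dict, Iterable, List


def load_done(lines: Iterable[str]) -> Dict[str, dict]:
    """Collect {paper_path: result} from JSON-lines output."""
    done: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a half-written last line from a crashed run
        if isinstance(record, dict):
            done.update(record)
    return done


def remaining(all_paths: List[str], done: Dict[str, dict]) -> List[str]:
    """Paths that still need to be classified, in their original order."""
    return [path for path in all_paths if path not in done]
```

Usage would be along the lines of `done = load_done(open(resultsfile))` followed by looping only over `remaining(paths, done)`. Papers with an empty entry count as done here. Retrying them would need an extra filter.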

Logs

INFO:documents.Corpus:Discovering all files in directory /home/gridsan/atrisovic/futuretech_shared/atrisovic/osfm/markdown/ with extensions in ['mmd'].
INFO:documents.Corpus:Found 48704 files (filter_path set to ../open_access_paper_ids.csv).
INFO:documents.Corpus:Loading 48704 files as Paper objects.
INFO:documents.Corpus:Finished loading papers for corpus. 10.9% of papers threw an error. See Corpus.bad_papers.
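
The counts in the log and in the progress bar look consistent with each other, assuming the progress bar total (43395) is the number of papers that loaded without error. A quick check:

```python
found = 48704            # "Found 48704 files" in the log
in_progress_bar = 43395  # total shown by the progress bar

failed = found - in_progress_bar
error_rate = round(100 * failed / found, 1)

print(failed, error_rate)  # prints: 5309 10.9
```

That matches the 10.9% reported for `Corpus.bad_papers`, so roughly 5309 papers are dropped at load time, before affiliation extraction starts.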