csensemakers / desci-sense


Something is up with the base parser #75

Closed ShaRefOh closed 9 months ago

ShaRefOh commented 9 months ago

@ronentk I pulled before pushing the if __name__ == "__main__" change, and there seems to be a change in base_parser that disagrees with the utils.py module and raises an error:

Traceback (most recent call last):
  File "/Users/shaharorielkagan/Documents/Python/desci-sense/desci_sense/evaluation/eval_benchmark_v0.py", line 140, in <module>
    pred_labels(df)
  File "/Users/shaharorielkagan/Documents/Python/desci-sense/desci_sense/evaluation/eval_benchmark_v0.py", line 52, in pred_labels
    response = model.process_text({'text':df['Text'][i]})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shaharorielkagan/Documents/Python/desci-sense/desci_sense/parsers/base_parser.py", line 89, in process_text
    post: RefPost = convert_text_to_ref_post(text, author, source)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shaharorielkagan/Documents/Python/desci-sense/desci_sense/dataloaders/__init__.py", line 12, in convert_text_to_ref_post
    urls = extract_and_expand_urls(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shaharorielkagan/Documents/Python/desci-sense/desci_sense/utils.py", line 48, in extract_and_expand_urls
    expanded_urls = [normalize_url(url) for url in extract_urls(text)]
                                                   ^^^^^^^^^^^^^^^^^^
  File "/Users/shaharorielkagan/Documents/Python/desci-sense/desci_sense/utils.py", line 13, in extract_urls
    res = re.findall(url_regex, text)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 216, in findall
    return _compile(pattern, flags).findall(string)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'dict'

ShaRefOh commented 9 months ago

OK, I reverted commit a658993 locally, and indeed it seems that commit changed how the base parser behaves. Specifically, it changes the process_text method of the model class.

ronentk commented 9 months ago

Right, I added some arguments. Try

response = model.process_text(df['Text'][i])

instead of

response = model.process_text({'text':df['Text'][i]})
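
For anyone hitting the same TypeError, here is a rough, self-contained sketch of why the dict fails and the plain string works. It is not the actual desci_sense code: URL_REGEX, the default argument values, and the returned dict are made up for illustration, and only the names process_text, extract_urls, author, and source come from the traceback above.

```python
import re

# Hypothetical pattern; the real url_regex in desci_sense/utils.py is not shown in this thread.
URL_REGEX = r"https?://\S+"


def extract_urls(text):
    # re.findall needs a str (or bytes-like object) as its second argument.
    return re.findall(URL_REGEX, text)


def process_text(text, author="unknown", source="unknown"):
    # Assumed shape of the updated method: a plain string plus extra arguments.
    # The default values and the return format are invented for this sketch.
    return {
        "text": text,
        "author": author,
        "source": source,
        "urls": extract_urls(text),
    }


row_text = "New preprint: https://example.org/paper"

# Old call style: the dict is passed through to re.findall, which raises TypeError.
try:
    process_text({"text": row_text})
    old_call_error = None
except TypeError as err:
    old_call_error = str(err)

# New call style: pass the string itself.
result = process_text(row_text)

print(old_call_error)
print(result["urls"])
```

The point is only that the value handed to process_text now flows straight into re.findall, so it has to be the text itself and not a dict wrapping it.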

ShaRefOh commented 9 months ago

Done