Executing main.py successfully queries Pushshift, but then fails with the error "ValueError: /src/../models/lid.176.bin has wrong file format!". This happens both in a local environment and in a locally built container. Please try building and running the container from a clean clone of the repo!
Run log with stack trace as follows:
INFO:root:done
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 46 concurrent workers.
[Parallel(n_jobs=-1)]: Done 8 out of 24 | elapsed: 2.7s remaining: 5.3s
[Parallel(n_jobs=-1)]: Done 24 out of 24 | elapsed: 4.8s finished
Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar.
Traceback (most recent call last):
  File "//src/main.py", line 33, in <module>
    main()
  File "//src/main.py", line 13, in main
    articles_df = data.get_articles_df(start_date, end_date)
  File "/src/data/data.py", line 28, in get_articles_df
    df = articles.filter_invalid_articles(df)
  File "/src/data/articles.py", line 143, in filter_invalid_articles
    df = filter_articles_by_lang(df)
  File "/src/data/articles.py", line 108, in filter_articles_by_lang
    lang_model = fasttext.load_model(os.path.join(config.model_path, "lid.176.bin"))
  File "/usr/local/lib/python3.9/site-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/usr/local/lib/python3.9/site-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
ValueError: /src/../models/lid.176.bin has wrong file format!
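The problem seems to be caused by git-lfs not being installed; thank you for catching that. Without git-lfs, a clone contains only a small text pointer file in place of the actual lid.176.bin model, which fastText cannot load.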
An installation section has been added to the README file with more detailed instructions on how to run the code.