bigscience-workshop / metadata

Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.
Apache License 2.0

Remove entity description #126

Closed manandey closed 2 years ago

manandey commented 2 years ago

Hi @SaulLu,

As discussed, I have removed the entity description field from the main entity preprocessor in this PR. I will shortly open a separate PR that adds a dedicated preprocessor for entity descriptions.

SaulLu commented 2 years ago

Thanks for working on those changes!

As it stands, I think some tests in test_preprocessing_utils.py will fail. Have you run the tests locally? In particular, did you run pytest tests/test_preprocessing_utils.py?