Closed psorianom closed 3 years ago
Here https://github.com/etalab-ia/piaf-ml/blob/c4f8457c8b8e0be7c6daed46ddfca0a058e57c39/src/util/convert_json_files_to_dicts.py#L26 we are no longer cleaning the text. Should we add at least the same cleaning as Haystack's (wiki_clean_text)?
We can; however, it mainly adds/removes line breaks. I don't know what impact it may have. Any opinion on that?
Indeed. Not really sure, but I believe it's better to have it than not. I will add it!
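For anyone wanting to gauge the impact of line-break cleaning before wiring it in, here is a minimal sketch of that kind of normalization. It is not Haystack's implementation; normalize_line_breaks is a hypothetical helper name, and the exact rules (trim each line, collapse runs of newlines into one) are assumptions made for illustration.

```python
import re


def normalize_line_breaks(text: str) -> str:
    """Hypothetical helper (not Haystack's code): tidy up line breaks."""
    # Trim surrounding whitespace on every line.
    lines = [line.strip() for line in text.split("\n")]
    # Rejoin, then collapse any run of two or more newlines into one.
    collapsed = re.sub(r"\n{2,}", "\n", "\n".join(lines))
    # Drop leading and trailing newlines left over at the edges.
    return collapsed.strip()


if __name__ == "__main__":
    raw = "First paragraph. \n\n\n  Second paragraph.\n"
    print(repr(normalize_line_breaks(raw)))
```

Running the converted documents through something like this with and without cleaning, then comparing retriever results, would be one way to check whether the extra line breaks matter.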