PlanTL-GOB-ES / corpus-cleaner

Generic toolkit for corpus cleaning
MIT License

BSC Crawl data parser: url = url & keywords = url? #85

Open asier-gutierrez opened 3 years ago

asier-gutierrez commented 3 years ago

https://github.com/TeMU-BSC/corpus-cleaner/blob/048cb11d002ba545d92fab07d5b3ee869ef139fc/corpus_cleaner/components/a_data_parser/bsc_crawl_json_parser.py#L20
https://github.com/TeMU-BSC/corpus-cleaner/blob/048cb11d002ba545d92fab07d5b3ee869ef139fc/corpus_cleaner/components/a_data_parser/bsc_crawl_json_parser.py#L21