cagliostrolab / dataset-builder

Apache License 2.0
33 stars 0 forks

About meta_keywords_black_list #1

Open yhzx233 opened 1 month ago

yhzx233 commented 1 month ago

Why does the 3.1 downloader filter out meta tags containing '(medium)'? Wouldn't those tags help the model learn different styles?

yhzx233 commented 1 month ago

Additionally, I found a silly bug. The function filter_blacklisted_tags splits tags on ', ', but the tag_string_meta obtained initially is separated by spaces. Splitting a space-separated string on ', ' returns the whole string as a single element, so if the meta tags contain any tag from the blacklist, the entire string is dropped and the result is an empty string.
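
A minimal sketch of the behavior described above. I have not copied the real code here: the function bodies, the `blacklist` parameter, the one-entry blacklist, and the sample tag string are all assumptions made up for illustration, and the actual filter_blacklisted_tags in the repository may look different. It only shows why splitting on ', ' wipes out a space-separated string, and what splitting on whitespace would return instead.

```python
# Hypothetical reconstruction for illustration only; the real
# filter_blacklisted_tags in dataset-builder may be written differently.

# Assumed blacklist with a single keyword, taken from the question above.
META_KEYWORDS_BLACK_LIST = ["(medium)"]


def filter_blacklisted_tags_buggy(tag_string, blacklist=META_KEYWORDS_BLACK_LIST):
    """Splits on ', ' even though the input is space-separated."""
    # For a space-separated input this yields ONE element: the whole string.
    tags = tag_string.split(", ")
    kept = [t for t in tags if not any(k in t for k in blacklist)]
    return ", ".join(kept)


def filter_blacklisted_tags_fixed(tag_string, blacklist=META_KEYWORDS_BLACK_LIST):
    """Splits on whitespace, matching the space-separated tag_string_meta."""
    tags = tag_string.split()
    kept = [t for t in tags if not any(k in t for k in blacklist)]
    return " ".join(kept)


# Made-up sample of a space-separated meta tag string.
tag_string_meta = "highres watercolor_(medium) commentary_request"

buggy_result = filter_blacklisted_tags_buggy(tag_string_meta)
fixed_result = filter_blacklisted_tags_fixed(tag_string_meta)

print(repr(buggy_result))  # the whole string is dropped
print(repr(fixed_result))  # only the blacklisted tag is dropped
```

With the sample input, the buggy variant returns an empty string because the single element it sees contains '(medium)', while the whitespace-splitting variant keeps the other two tags. Whether the output should be re-joined with spaces or with ', ' depends on what the rest of the downloader expects.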