Closed: SaulLu closed this PR 2 years ago.
In this PR you will find the scripts developed for:
- creating a reduced version of c4/en containing 1000 examples randomly extracted from the first 2 files of c4/en (). This dataset is shared on the hub at bs-modeling-metadata/c4-en-reduced.
- adding website description, timestamp and entity metadata to it. The resulting dataset is shared on the hub at bs-modeling-metadata/c4-en-reduced-with-metadata.

Some investigations still need to be done on the resulting toy dataset before this PR is merged.
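The random extraction of 1000 examples from the first 2 files of c4/en can be sketched in plain Python. This is a minimal sketch, not the script from this PR: it assumes the c4/en files are gzipped JSON-lines shards (one document per line), and `iter_records` and `sample_examples` are hypothetical helper names. It uses reservoir sampling, which draws k examples uniformly at random without replacement in a single pass, so the two files never need to be held in memory together.

```python
import gzip
import json
import random
from pathlib import Path
from typing import Iterator, List


def iter_records(paths: List[Path]) -> Iterator[dict]:
    """Yield one JSON record per non-empty line of each gzipped JSON-lines file, in order."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)


def sample_examples(paths: List[Path], k: int = 1000, seed: int = 42) -> List[dict]:
    """Hypothetical helper: draw k records uniformly at random, without
    replacement, from all records in `paths` (reservoir sampling, Algorithm R).
    If the files hold fewer than k records, all of them are returned."""
    rng = random.Random(seed)  # fixed seed -> reproducible toy dataset
    reservoir: List[dict] = []
    for i, record in enumerate(iter_records(paths)):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep record i with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir
```

For example, `sample_examples(sorted(Path("c4/en").glob("*.json.gz"))[:2], k=1000)` would sample from the first 2 shards, assuming they sit in a local `c4/en` directory. Fixing `seed` keeps the toy dataset reproducible across runs.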