biolab / text-semantics

A package with scripts for the semantic analyser project
MIT License

An option to get a sample of the latest n documents #10

Closed: PrimozGodec closed this 3 years ago

PrimozGodec commented 3 years ago

In some cases (e.g., proposals to the government) we want a sample of the latest documents rather than a random one, so I am adding this option. The get_metadata function now takes an additional argument, sampling_strategy, that defines how the sample is drawn.

api.get_metadata("proposals-to-goverment", sample=10, sampling_strategy="latest")

Returns the latest 10 documents: the last 10 rows of the CSV file or, when the metadata is stored in YAML, the last 10 documents in alphabetical order.

api.get_metadata("proposals-to-goverment", sample=10, sampling_strategy="random")

This is the default option and returns n randomly selected documents.
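To make the difference between the two strategies concrete, here is a minimal sketch of the selection logic only. It is not the package's implementation: the helper name sample_documents, the sort_alphabetically flag (standing in for the YAML case) and the seed argument are all hypothetical, and the documents are modelled as a plain list of row or file identifiers.

```python
import random
from typing import List, Optional, Sequence


def sample_documents(
    documents: Sequence[str],
    sample: Optional[int] = None,
    sampling_strategy: str = "random",
    sort_alphabetically: bool = False,  # hypothetical: models the YAML-metadata case
    seed: Optional[int] = None,         # hypothetical: makes random sampling reproducible
) -> List[str]:
    """Hypothetical helper mirroring the sample / sampling_strategy arguments."""
    # CSV case: keep the row order of the file; YAML case: order documents by name.
    docs = sorted(documents) if sort_alphabetically else list(documents)

    # No sample requested, or more requested than available: return everything.
    if sample is None or sample >= len(docs):
        return docs
    if sample <= 0:
        return []

    if sampling_strategy == "latest":
        # The "latest" documents are the last `sample` entries in that order.
        return docs[len(docs) - sample:]
    if sampling_strategy == "random":
        # Draw `sample` distinct documents uniformly at random.
        return random.Random(seed).sample(docs, sample)
    raise ValueError(f"Unknown sampling_strategy: {sampling_strategy!r}")
```

For example, sample_documents(["r1", "r2", "r3", "r4", "r5"], 2, "latest") returns ["r4", "r5"], while the same call with "random" returns two distinct rows chosen at random. The explicit `len(docs) - sample` slice avoids the `docs[-0:]` pitfall, which would return the whole list instead of an empty one.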