Helsinki-NLP / OpusTools


Query to get list of existing corpora (by language) #11

Closed dumitrescustefan closed 4 years ago

dumitrescustefan commented 4 years ago

Hi,

Is there a way to query the data so that I get all available corpora that contain a certain language? Right now I need to create an aggregated monolingual corpus for a number of languages. For example, I would like to get all the corpora that contain "hr" -> xx (the target language does not matter, as long as one side has "hr" sentences).

Is there a way to achieve that automatically?

Thanks a lot! BTW, thank you for OPUS: I've been using it from time to time over the years, and it's a great resource.

dumitrescustefan commented 4 years ago

opus_get -s hr -p raw -q

For future people who post an issue first and only then check the code in detail. Thanks!
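Since the original question was about several languages, here is a minimal Python sketch that repeats the command above once per language. It only reuses the flags quoted in this thread; my reading of them (`-s` as the source language, `-p raw` as the raw-text format, `-q` as "skip the confirmation prompt") is an assumption to check against `opus_get --help`. The helper names and the extra language code `ro` are hypothetical, and the sketch defaults to a dry run so nothing is downloaded by accident.

```python
import shlex
import subprocess


def build_opus_get_command(lang, preprocess="raw"):
    """Return the argv list for the opus_get call quoted in this thread."""
    # Flags copied from the thread: -s <lang> -p raw -q
    return ["opus_get", "-s", lang, "-p", preprocess, "-q"]


def fetch_monolingual(langs, dry_run=True):
    """Build one command per language; run them only when dry_run is False."""
    commands = [build_opus_get_command(lang) for lang in langs]
    for cmd in commands:
        if dry_run:
            # Print the shell-quoted command instead of executing it.
            print(shlex.join(cmd))
        else:
            # Requires OpusTools to be installed and opus_get on the PATH.
            subprocess.run(cmd, check=True)
    return commands


# "ro" is just an illustrative second language code.
commands = fetch_monolingual(["hr", "ro"], dry_run=True)
```

Calling `fetch_monolingual(["hr", "ro"], dry_run=False)` would actually start the downloads, which can be large, so it is worth inspecting the dry-run output first.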