dpriskorn / LexUse

Scripts related to Wikidata Lexemes and getting usage examples from different CC0 data sources.
GNU General Public License v3.0

Bug: Skips finds from Riksdagen API if Europarl suggestions are rejected #17

Open Ainali opened 3 years ago

Ainali commented 3 years ago
Found in https://data.riksdagen.se/dokument/EF9C12
'pannkakor' not found as part of a word or a word in the summary. Skipping
Found in https://data.riksdagen.se/dokument/ee9d32
Found in https://data.riksdagen.se/dokument/EE9O2
'pannkakor' not found as part of a word or a word in the summary. Skipping
Found in https://data.riksdagen.se/dokument/DQ30220
Processed 121 records and found 28 exact hits for the form 'pannkakor'
Presenting sentence 1/1 from europarl
Found the following sentence with 41 words. Is it suitable as a usage example for the noun form 'pannkakor'? 
'Tror vi liksom tidigare på våra egna principer eller låter vi oss påverkas så till den grad att vi anser att Ryssland är så annorlunda att inte bara våra pannkakor utan också våra förbindelser i ett partnerskap måste följa Rysslands modell?' [(Y)es/(n)o/(s)kip this form]: n
Trying to find examples for the noun lexeme form: raketen with id: L240943-F2

It looks like results from the Riksdagen API were found, but after the suggestion from Europarl was rejected they were not shown; instead the script moved on to the next form.

I expected to see more suggestions from Riksdagen API.
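
To make the expected behaviour concrete, here is a minimal sketch of the review loop I have in mind. It is not LexUse's actual code: `review_form`, the `(source, sentence)` candidate tuples, and the `ask` callback are all hypothetical names used only for illustration. The point is that answering "n" rejects a single sentence, while only "s" abandons the whole form.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical representation of a suggestion: (source name, sentence).
Candidate = Tuple[str, str]


def review_form(
    candidates: Iterable[Candidate],
    ask: Callable[[str, str], str],
) -> Optional[Candidate]:
    """Present candidates from all sources until one is accepted.

    `ask` is an assumed callback that shows the sentence and returns the
    user's answer ("y", "n" or "s"). Returns the accepted candidate, or
    None if every sentence was rejected or the form was skipped.
    """
    for source, sentence in candidates:
        answer = ask(source, sentence).strip().lower()
        if answer == "y":
            return (source, sentence)
        if answer == "s":
            # Skip the whole form: stop presenting sentences.
            return None
        # Any other answer is treated as "n": reject only this sentence
        # and continue with the next one, whatever its source.
    return None
```

With candidates from both Europarl and Riksdagen in the list, rejecting the Europarl sentence would then lead to the first Riksdagen sentence being shown next.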

dpriskorn commented 3 years ago

I changed this in the new version. If you enable debugging in the config and pass --log=debug on the command line, you can see more of what happens.

Right now I exclude sentences from Riksdagen that are not suitable. See riksdagen.py for the words that currently lead to exclusion: EG, EU, Riksdagen, Sammanträde, etc. We/you could relax that, but I put them there because I got a lot of garbage. I would rather rely on a wider range of sources.
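
Roughly, the exclusion works like the sketch below. This is a simplified illustration and not the code in riksdagen.py: the names `EXCLUDED_WORDS`, `is_suitable` and `filter_sentences` are made up here, the list contains only the four words mentioned above, and whole-word, case-insensitive matching is an assumption.

```python
import re

# Only the four words named in this thread; the real list is longer ("etc.").
EXCLUDED_WORDS = {"EG", "EU", "Riksdagen", "Sammanträde"}
_EXCLUDED_FOLDED = {word.casefold() for word in EXCLUDED_WORDS}


def is_suitable(sentence: str) -> bool:
    """Return False if the sentence contains any excluded word.

    Matching is on whole words and ignores case (an assumption), so
    "Europa" is not rejected just because it starts with "Eu".
    """
    tokens = re.findall(r"\w+", sentence)
    return not any(token.casefold() in _EXCLUDED_FOLDED for token in tokens)


def filter_sentences(sentences):
    """Keep only the sentences that pass the exclusion check."""
    return [s for s in sentences if is_suitable(s)]
```

Relaxing the filter would then just mean removing entries from the set, at the cost of more parliamentary boilerplate showing up as suggestions.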