CentreForDigitalHumanities / tscan

T-scan: an analysis tool for Dutch texts to assess the complexity of the text, based on original work by Rogier Kraf
GNU Affero General Public License v3.0

modify intensify list #18

Open peterATixly opened 2 years ago

peterATixly commented 2 years ago

I'd like to be able to modify the intensifiers but I can't see that information in the data directory (even after running download.sh) - is this available to adjust?

kosloot commented 2 years ago

Well, those are found in the file 'intensiveringen.data', which is normally NOT distributed. I quote from downloaddata.sh:

#Dummies for RBN-derived data, not redistributable due to overly restrictive license (http://tst.inl.nl/producten/rbn/toonlicentie.php)
RESTRICTEDDATA="adjs_semtype.data general_nouns.data general_verbs.data intensiveringen.data nouns_semtype.data verbs_semtype.data"
for file in $RESTRICTEDDATA; do
    if [ ! -f $file ]; then
        touch $file
    fi
done

This information is old; I'm not sure whether those restrictions still hold. @mhkuu, is INL still that anxious?
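
Because the quoted script only creates a file with touch when it does not already exist, a fresh download leaves the restricted files as zero-byte placeholders. A minimal sketch for checking which of them are still placeholders in a given directory follows. The function name check_restricted_data and its directory argument are hypothetical and not part of T-scan; the file names are taken from the quoted script.

```shell
# Hypothetical helper, not part of T-scan: report the state of each
# restricted data file listed in the quoted downloaddata.sh snippet.
RESTRICTEDDATA="adjs_semtype.data general_nouns.data general_verbs.data intensiveringen.data nouns_semtype.data verbs_semtype.data"

check_restricted_data() {
    # Directory to inspect; defaults to the current directory.
    dir="${1:-.}"
    for file in $RESTRICTEDDATA; do
        if [ ! -e "$dir/$file" ]; then
            echo "$file: missing"
        elif [ ! -s "$dir/$file" ]; then
            # -s is true only for files larger than zero bytes,
            # so this branch catches the dummies created by touch.
            echo "$file: empty placeholder"
        else
            echo "$file: has content"
        fi
    done
}
```

Running check_restricted_data on the data directory (the path depends on the installation) prints one line per file. A file that the script finds already present is left untouched, so a list placed there before the script runs should be reported as having content.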

peterATixly commented 2 years ago

Ah, I had assumed that only referred to the part derived from the RBN, with the rest coming from other sources (onderwoorden.nl, SoNaR), but I assume they are difficult to separate from one another.

mhkuu commented 2 years ago

Hi @peterATixly and @kosloot, I'm not sure about the exact origin of all words appearing on the lists. The RBN license still seems quite restrictive. I can provide the lists to you via e-mail, though.

peterATixly commented 2 years ago

Hi @mhkuu - the list would be helpful - my private email is petercaine@gmail.com