Closed by thomaschhh 9 months ago
Since we need the Wikipedia articles to replicate your steps, I suggest adapting https://github.com/bene-ges/nemo_compatible/blob/2b1ca5934d57256006a0a9f66c467587ba07df05/scripts/nlp/en_spellmapper/dataset_preparation/preprocess_yago.sh#L34
to
    WIKIPEDIA_FOLDER=./yago_wikipedia
    mkdir -p "$WIKIPEDIA_FOLDER"
    awk 'BEGIN {FS="\t"; print "#!/usr/bin/env bash"} {print "wget \"https://en.wikipedia.org/w/api.php?format=xml&action=query&prop=extracts&titles=" $1 "&redirects=true&format=json&explaintext=1&exsectionformat=plain\" -O \"'"$WIKIPEDIA_FOLDER"'/" $2 ".txt\"\nsleep 0.1"}' < yago.uniq2 > run_wget.sh
    bash ./run_wget.sh
based on what is needed later on in https://github.com/bene-ges/nemo_compatible/blob/2b1ca5934d57256006a0a9f66c467587ba07df05/scripts/nlp/en_spellmapper/dataset_preparation/build_training_data.sh#L20
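Before starting the full download, it may be worth checking the generated script on a small sample. The following is a minimal sketch, not part of the repository: it assumes `yago.uniq2` is tab-separated with the article title in column 1 and the output file stem in column 2 (as the awk command implies), uses two made-up sample rows, works in a scratch directory, and inserts a `/` between the folder and the file name on the assumption that the files are meant to land inside `$WIKIPEDIA_FOLDER`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so no real data is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

WIKIPEDIA_FOLDER=./yago_wikipedia
mkdir -p "$WIKIPEDIA_FOLDER"

# Hypothetical sample input: title <TAB> file stem.
printf 'Alan_Turing\talan_turing\nAda_Lovelace\tada_lovelace\n' > yago.uniq2

# Generate the download script; note the "/" after the folder name (assumption).
awk 'BEGIN {FS="\t"; print "#!/usr/bin/env bash"} {print "wget \"https://en.wikipedia.org/w/api.php?format=xml&action=query&prop=extracts&titles=" $1 "&redirects=true&format=json&explaintext=1&exsectionformat=plain\" -O \"'"$WIKIPEDIA_FOLDER"'/" $2 ".txt\"\nsleep 0.1"}' < yago.uniq2 > run_wget.sh

# Inspect instead of executing: syntax check, then list the output paths.
bash -n run_wget.sh
wget_lines=$(grep -c '^wget ' run_wget.sh)
echo "wget commands: $wget_lines"
grep -o -- '-O "[^"]*"' run_wget.sh
```

Nothing is downloaded here; the script is only generated and inspected. If the listed `-O` paths look right, the same awk command can be run against the real `yago.uniq2`.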
If it works, you can open a pull request and I will accept it.
Fixed in #8