PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

[ENC] Align .txt file uploads across String Similarity, Phonotactic Probability, and Neighbourhood Density #782

Open kchall opened 3 years ago

kchall commented 3 years ago

Currently, String Similarity (SS), Phonotactic Probability (PP), and Neighbourhood Density (ND) each allow .txt file uploads, but they behave differently and accept different inputs. Ideally, these would all be aligned.

The behaviour at the moment is:

  1. PhonProb
    - Words in corpus, spelling: calculates
    - Words in corpus, transcription: can’t do this (wants spelling)
    - Words not in corpus, spelling: can’t do this (not in corpus)
    - Words not in corpus, transcription: can’t do this (wants spelling and in corpus)
    - Words not in corpus, both: can’t do this, even though old docs said you could!
  2. ND
    - Words in corpus, spelling: calculates (must specify that file contains spelling)
    - Words in corpus, transcription: calculates (must specify that file contains trans)
    - Words not in corpus, spelling: calculates, giving NA for words not in corpus and telling you which they are
    - Words not in corpus, transcription: calculates for all, explaining that some words aren’t in corpus
  3. String Sim
    - Word pairs in corpus, spelling: calculates
    - Word pairs in corpus, transcription: gives NA for all, explaining that some words (all words) are not in corpus, and tells you which they are
    - Word pairs not in corpus, spelling: calculates, giving either result if it can or NA for words not in corpus, and tells you which they are
    - Word pairs not in corpus, transcription: gives NA for all, explaining that some words (all words) are not in corpus, and tells you which they are
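
As a rough illustration of what "aligned" could mean, here is a minimal sketch of a single upload-resolution step that all three functions might share. It assumes the ND-style behaviour (user states which tier the file contains, missing words are reported and scored as NA) is the common target, which this issue does not actually decide. Nothing here is PCT's real API: `resolve_upload`, `UploadResult`, and the list-of-dicts corpus are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UploadResult:
    found: dict = field(default_factory=dict)    # uploaded entry -> corpus record
    missing: list = field(default_factory=list)  # uploaded entries not in the corpus


def resolve_upload(lines, corpus, tier):
    """Look up each uploaded entry on the tier the user says the file contains.

    `corpus` is a hypothetical list of dicts with "spelling" and
    "transcription" keys; `tier` is "spelling" or "transcription".
    """
    if tier not in ("spelling", "transcription"):
        raise ValueError(f"unknown tier: {tier!r}")
    # Index the corpus by the chosen tier so every function does the same lookup.
    index = {word[tier]: word for word in corpus}
    result = UploadResult()
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines in the uploaded file
        if entry in index:
            result.found[entry] = index[entry]
        else:
            # Assumed policy: report the entry to the user and give it NA.
            result.missing.append(entry)
    return result
```

With something like this in place, PhonProb, ND, and String Sim would differ only in what they compute for `found`, while `missing` drives the same "not in corpus" message everywhere. For String Sim, each line would first need to be split into its two words before the lookup.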

Sample files to test all of this can be found in: ~/Dropbox/Phonological_CorpusTools_Public/PCT_text_file_upload_tests