[ENC] Align .txt file uploads across String Similarity, Phonotactic Probability, and Neighbourhood Density

Currently, String Similarity (SS), Phonotactic Probability (PP), and Neighbourhood Density (ND) all accept .txt file uploads, but they behave differently and accept different inputs. Ideally, their behaviour would be aligned.

Currently, the behaviour is:

PhonProb
Words in corpus, spelling: calculates
Words in corpus, transcription: can’t do this (wants spelling)
Words not in corpus, spelling: can’t do this (not in corpus)
Words not in corpus, transcription: can’t do this (wants spelling, and the words must be in the corpus)
Words not in corpus, both: can’t do this, even though old docs said you could!

Ideally, PCT would calculate PP regardless of whether spelling or transcription is provided, and if there are words not in the corpus, it would skip them (reporting them to the user and returning N/A), while still calculating the rest of the list.
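A minimal sketch of that ideal flow, assuming a toy corpus that maps spellings to segment tuples and a file whose lines are either all spellings or all transcriptions (with segments separated by "."). The names `resolve_query`, `positional_prob`, and `pp_for_file` are hypothetical, not PCT's API, and `positional_prob` is a simplified, unweighted positional segment probability standing in for PCT's real PP algorithm. The point is the control flow: transcriptions are scored whether or not they are in the corpus, while spellings that can't be looked up get N/A and are reported without stopping the rest of the list.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy corpus: spelling -> transcription (tuple of segments).
Corpus = Dict[str, Tuple[str, ...]]


def resolve_query(line: str, corpus: Corpus, file_type: str) -> Optional[Tuple[str, ...]]:
    """Turn one line of an uploaded .txt file into a transcription.

    file_type is "spelling" or "transcription" (hypothetical option names).
    Spellings must be looked up in the corpus; transcriptions are used as-is,
    whether or not the word is in the corpus.
    """
    line = line.strip()
    if file_type == "spelling":
        return corpus.get(line)  # None if the word is not in the corpus
    if file_type == "transcription":
        # Assumes segments are separated by "." in the uploaded file.
        return tuple(seg for seg in line.split(".") if seg)
    raise ValueError(f"unknown file_type: {file_type!r}")


def positional_prob(trans: Tuple[str, ...], corpus: Corpus) -> float:
    """Simplified stand-in for PP (not PCT's algorithm): the mean, over
    positions i, of the proportion of corpus words with more than i segments
    whose segment at position i matches the query's."""
    probs = []
    for i, seg in enumerate(trans):
        long_enough = [t for t in corpus.values() if len(t) > i]
        if not long_enough:
            probs.append(0.0)
            continue
        matches = sum(1 for t in long_enough if t[i] == seg)
        probs.append(matches / len(long_enough))
    return sum(probs) / len(probs) if probs else 0.0


def pp_for_file(lines: List[str], corpus: Corpus, file_type: str):
    """Score every line; unresolvable lines get None (shown as N/A) and are
    collected so they can be reported to the user."""
    results: Dict[str, Optional[float]] = {}
    skipped: List[str] = []
    for line in lines:
        word = line.strip()
        if not word:
            continue  # ignore blank lines
        trans = resolve_query(word, corpus, file_type)
        if trans is None:
            results[word] = None
            skipped.append(word)
        else:
            results[word] = positional_prob(trans, corpus)
    return results, skipped
```

For example, with `corpus = {"cat": ("k", "æ", "t"), "cap": ("k", "æ", "p"), "dog": ("d", "ɔ", "g")}`, `pp_for_file(["cat", "zebra"], corpus, "spelling")` scores "cat", returns None for "zebra", and lists "zebra" as skipped, while the transcription "k.æ.g" is scored even though it is not a corpus word.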

ND
Words in corpus, spelling: calculates (must specify that file contains spelling)
Words in corpus, transcription: calculates (must specify that file contains transcription)
Words not in corpus, spelling: calculates, giving NA for words not in corpus and telling you which they are
Words not in corpus, transcription: calculates for all, explaining that some words aren’t in corpus

This one is currently the closest to the ideal solution for all!
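ND can score transcriptions that aren't in the corpus because only the neighbours need to come from the corpus; the query just has to be a sequence of segments. A minimal sketch of that, assuming a toy spelling-to-transcription dict, Levenshtein distance over segments with a threshold of 1, and "."-separated segments in transcription files. `nd_for_line` and the other names are hypothetical, not PCT's implementation, and excluding the query's own transcription is a simplifying assumption.

```python
from typing import Dict, Optional, Tuple

Corpus = Dict[str, Tuple[str, ...]]


def edit_distance(a: Tuple[str, ...], b: Tuple[str, ...]) -> int:
    """Levenshtein distance over segment tuples (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, start=1):
        cur = [i]
        for j, sb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (sa != sb)))  # substitution
        prev = cur
    return prev[-1]


def neighbourhood_density(query: Tuple[str, ...], corpus: Corpus,
                          max_distance: int = 1) -> int:
    """Count corpus words within max_distance of the query,
    skipping any word with exactly the query's transcription."""
    return sum(1 for t in corpus.values()
               if t != query and edit_distance(query, t) <= max_distance)


def nd_for_line(line: str, corpus: Corpus, file_type: str) -> Optional[int]:
    """Mirror ND's current behaviour: spellings need a corpus lookup
    (None, i.e. NA, if missing); transcriptions are always scored."""
    word = line.strip()
    if file_type == "spelling":
        trans = corpus.get(word)
        if trans is None:
            return None  # reported to the user as NA
    else:
        trans = tuple(seg for seg in word.split(".") if seg)
    return neighbourhood_density(trans, corpus)
```

With a corpus containing "cat", "cap", "dog", and "at", both `nd_for_line("cat", corpus, "spelling")` and `nd_for_line("k.æ.g", corpus, "transcription")` return 2, while an unknown spelling returns None.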

String Sim
Word pairs in corpus, spelling: calculates
Word pairs in corpus, transcription: gives NA for all, explaining that some words (in fact all of them) are not in corpus, and tells you which they are
Word pairs not in corpus, spelling: calculates, giving a result where it can and NA for pairs containing words not in corpus, and tells you which they are
Word pairs not in corpus, transcription: gives NA for all, explaining that some words (in fact all of them) are not in corpus, and tells you which they are

This behaviour is basically fine, but there's no principled reason why the algorithm couldn’t calculate SS for word pairs given in transcription, even if they aren’t in the corpus. This might only be problematic for phonological edit distance? If so, we could make it like ND and just grey out the transcription option for that algorithm.
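A rough sketch of SS on transcribed pairs without any corpus lookup, assuming a hypothetical file format of two transcriptions per line separated by a tab, with "."-separated segments. It uses plain Levenshtein distance as the similarity measure; the algorithm names and the `TRANSCRIPTION_SAFE` set are hypothetical. Since it isn't yet clear whether phonological edit distance can work from transcription alone, the sketch simply refuses it, which is the back-end counterpart of greying the option out.

```python
from typing import List, Tuple

Segs = Tuple[str, ...]

# Hypothetical algorithm names; only these stay enabled for transcription input.
TRANSCRIPTION_SAFE = {"edit_distance"}


def to_segs(s: str) -> Segs:
    """Split a '.'-separated transcription into segments."""
    return tuple(seg for seg in s.strip().split(".") if seg)


def parse_pair(line: str) -> Tuple[Segs, Segs]:
    """Assumed format: two transcriptions separated by a tab."""
    left, right = line.rstrip("\n").split("\t")
    return to_segs(left), to_segs(right)


def edit_distance(a: Segs, b: Segs) -> int:
    """Levenshtein distance over segment tuples (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, start=1):
        cur = [i]
        for j, sb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (sa != sb)))
        prev = cur
    return prev[-1]


def ss_for_transcription_file(lines: List[str], algorithm: str) -> List[int]:
    """Score every pair directly from its transcriptions; no corpus needed."""
    if algorithm not in TRANSCRIPTION_SAFE:
        # Equivalent of a greyed-out option in the GUI.
        raise ValueError(f"{algorithm!r} is not available for transcription input")
    return [edit_distance(*parse_pair(line)) for line in lines if line.strip()]
```

For instance, `ss_for_transcription_file(["k.æ.t\tk.æ.p", "d.ɔ.g\tæ.t"], "edit_distance")` gives a distance for each pair, whether or not the words exist in the corpus.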

Sample files to test all of this can be found in: ~/Dropbox/Phonological_CorpusTools_Public/PCT_text_file_upload_tests


[ENC] Align .txt file uploads across String Similarity, Phonotactic Probability, and Neighbourhood Density #782