Currently, string similarity (SS), phonotactic probability (PP), and neighbourhood density (ND) each allow .txt file uploads, but they behave differently and accept different inputs. Ideally, these would all be aligned.
Currently, the behaviour is:
PhonProb
Words in corpus, spelling: calculates
Words in corpus, transcription: can’t do this (requires spelling)
Words not in corpus, spelling: can’t do this (word is not in corpus)
Words not in corpus, transcription: can’t do this (requires spelling, and the word must be in the corpus)
Words not in corpus, both (spelling and transcription): can’t do this, even though old docs said you could!
Ideally, PCT would calculate PP regardless of whether spelling or transcription is provided, and if there are words not in the corpus, it would skip them (reporting them to the user and returning N/A), while still calculating the rest of the list.
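A minimal sketch of the per-line handling described above: match each line by spelling or by transcription, give N/A to anything not found, report those lines, and still calculate the rest. It assumes a hypothetical corpus represented as a plain dict from spelling to a tuple of segments, and assumes transcription files separate segments with "."; neither is PCT's actual data model or file format. `run_word_list` and the `measure` callable are hypothetical names, with `measure` standing in for whichever calculation (e.g. PP) is being run.

```python
from typing import Callable, Dict, List, Tuple, Union

Transcription = Tuple[str, ...]
NA = "N/A"


def run_word_list(
    lines: List[str],
    corpus: Dict[str, Transcription],  # hypothetical: spelling -> segments
    file_type: str,                    # "spelling" or "transcription", chosen by the user
    measure: Callable[[Transcription], float],
) -> Tuple[Dict[str, Union[float, str]], List[str]]:
    """Run `measure` on every line that can be matched to a corpus word.

    Unmatched lines get N/A and are collected in `missing` so they can be
    reported to the user; the rest of the list is still calculated.
    """
    known = set(corpus.values())
    results: Dict[str, Union[float, str]] = {}
    missing: List[str] = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if file_type == "spelling":
            trans = corpus.get(entry)
        else:
            # Assumption: segments in the uploaded file are separated by "."
            segs = tuple(entry.split("."))
            trans = segs if segs in known else None
        if trans is None:
            results[entry] = NA
            missing.append(entry)
        else:
            results[entry] = measure(trans)
    return results, missing


if __name__ == "__main__":
    toy = {"cat": ("k", "a", "t"), "dog": ("d", "o", "g")}
    res, missing = run_word_list(["cat", "bird"], toy, "spelling", len)
    print(res, "not in corpus:", missing)
```

Taking `file_type` as an explicit argument mirrors how ND already asks the user to say whether the file contains spelling or transcription, so the same code path serves both input types.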
ND
Words in corpus, spelling: calculates (must specify that the file contains spelling)
Words in corpus, transcription: calculates (must specify that the file contains transcription)
Words not in corpus, spelling: calculates, giving NA for words not in corpus and telling you which they are
Words not in corpus, transcription: calculates for all, explaining that some words aren’t in corpus
This one is currently the closest to the ideal solution for all!
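ND can presumably handle transcriptions that are not in the corpus because a neighbour count only needs the query's segments and the corpus transcriptions, not a corpus entry for the query itself. A minimal sketch, assuming the common definition of a neighbour as a word exactly one segment substitution, insertion, or deletion away; `within_one_edit` and `neighbourhood_density` are hypothetical names, and PCT's own implementation and options may differ.

```python
from typing import Iterable, Sequence


def within_one_edit(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if len(a) == len(b):
        # Same length: neighbours differ in exactly one position,
        # so identical words are not counted as neighbours.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one segment from the longer word must yield the shorter one.
    return any(
        list(long_[:i]) + list(long_[i + 1:]) == list(short)
        for i in range(len(long_))
    )


def neighbourhood_density(query: Sequence[str], corpus: Iterable[Sequence[str]]) -> int:
    """Count corpus transcriptions one edit away from `query`.

    `query` itself does not need to be in the corpus.
    """
    return sum(within_one_edit(query, word) for word in corpus)
```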
String Sim
Word pairs in corpus, spelling: calculates
Word pairs in corpus, transcription: gives NA for all, explaining that some words (in fact, all of them) are not in the corpus, and lists which ones
Word pairs not in corpus, spelling: calculates, giving a result where it can and NA for pairs with a word not in the corpus, and lists which words those are
Word pairs not in corpus, transcription: gives NA for all, explaining that some words (in fact, all of them) are not in the corpus, and lists which ones
This behaviour is basically fine, but there's no principled reason why the algorithm couldn’t calculate SS for word pairs given in transcription, even if they are not in the corpus. This might be problematic for phonological edit distance, but we could handle that the way ND does and just grey out that option for that algorithm.
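For the transcription case, plain edit distance compares only the two segment strings, so in principle it needs no corpus lookup at all. Below is a minimal sketch of unit-cost Levenshtein distance over segments, plus a hypothetical reader `score_pairs` for a pair file in which each line holds two transcriptions separated by a tab, with segments separated by "." (both format details are assumptions, not PCT's actual upload format). Algorithms that presumably draw on corpus information, such as a feature system for phonological edit distance, would still need that information, which is where greying out the option would come in.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two segment sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if x == y)
            ))
        prev = curr
    return prev[-1]


def score_pairs(lines: List[str]) -> List[Tuple[str, str, int]]:
    """Hypothetical pair-file reader: one 'k.a.t<TAB>b.a.t' pair per line."""
    results = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        left, right = line.strip().split("\t")
        results.append((left, right, edit_distance(left.split("."), right.split("."))))
    return results
```

Because no lookup happens, a pair like "k.a.t" / "b.a.t" gets a score whether or not either word is in the corpus.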
Sample files to test all of this can be found in:
~/Dropbox/Phonological_CorpusTools_Public/PCT_text_file_upload_tests