Open pettter opened 4 years ago
We further split this into several different types of files, e.g.
- JSONCorpus (like the iRealPro corpus)
- XMLCorpus (XPath navigation?)
- SQLiteCorpus?
- Others?
JSONCorpus has basic functionality as of 99e9f5a.
several text files, humdrum, abc (not the ABC corpus), etc.
lilypond, e.g. the Crueger cantional settings in this ZIP file: https://miami.uni-muenster.de/Record/c8e13273-c323-4c20-93f3-e3e6caff3224
I'm not sure how common it is to have the entire corpus as a single file in those formats - I've mostly seen them used to contain a single "piece", with corpora being collections of many such files.
True. We should have a list of potential formats somewhere, though.
It can happen, though. Sometimes you have a summary file of a corpus, which contains some representation of all pieces. Think of the CSV file in the Choro corpus, or the Jazz trees, which are all in a single JSON file. The question is whether they have something useful in common.
The Jazz trees (and similar JSON corpora) are supported in a very basic way, and implementing a similar thing for CSV should be relatively straightforward.
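For illustration, here is a minimal sketch of what the JSON and CSV cases might share. It is not the implementation in this repository: the class names `SingleFileCorpus`, `JSONFileCorpus` and `CSVFileCorpus`, the `title` key and the `piece` column are all hypothetical, and it assumes the JSON file is a list with one object per piece and the CSV file has one row per event with a column naming the piece.

```python
import csv
import io
import json
from collections import defaultdict


class SingleFileCorpus:
    """Hypothetical base class: a corpus whose pieces all live in one file."""

    def __init__(self, pieces):
        # Mapping from piece identifier to that piece's data.
        self._pieces = dict(pieces)

    def piece_ids(self):
        return list(self._pieces)

    def __getitem__(self, piece_id):
        return self._pieces[piece_id]

    def __len__(self):
        return len(self._pieces)


class JSONFileCorpus(SingleFileCorpus):
    @classmethod
    def from_text(cls, text, id_key="title"):
        # Assumption: the file is a JSON list of objects, one per piece.
        records = json.loads(text)
        return cls((record[id_key], record) for record in records)


class CSVFileCorpus(SingleFileCorpus):
    @classmethod
    def from_text(cls, text, id_column="piece"):
        # Assumption: one row per event, with a column naming the piece.
        grouped = defaultdict(list)
        for row in csv.DictReader(io.StringIO(text)):
            grouped[row[id_column]].append(row)
        return cls(grouped)


# Made-up data, only to show that both loaders expose the same interface.
json_corpus = JSONFileCorpus.from_text(
    '[{"title": "a", "tree": [1, 2]}, {"title": "b", "tree": [3]}]'
)
csv_corpus = CSVFileCorpus.from_text("piece,chord\na,C\na,G7\nb,Dm\n")
```

In this sketch the only thing the two formats have in common is the mapping from a piece identifier to that piece's segment of the file. Whether that is a useful enough abstraction is exactly the open question above.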
I'll have a look at getting the Choro corpus in.
@fabianmoss The Choro corpus seems to be private at the moment?
Yes. Because the paper is STILL in review. I can give you a copy of the file tomorrow if you remind me again 😉
Ah, no, that's fine, I can get the file. It's just a question of whether we could add it to the corpora.csv file as something downloadable/loadable.
But I guess I should maybe remove, or at least obfuscate a little, the ten-line excerpt I added to the git-test-corpus?
The previous implementations of single file corpora have been removed at some point. They are now developed in #28.
Support for corpora that are not collections of files but segments of a single file.