TeHikuMedia / nga-tautohetohe

Code for extracting Māori text from the New Zealand Hansard

Move all relevant non repo files to a new home #9

Open kmahelona opened 6 years ago

kmahelona commented 6 years ago

Hey,

I’ve noticed a few changes to Google Drive links and requests for files (https://github.com/TeHikuMedia/nga-tautohetohe/pull/8, https://github.com/TeHikuMedia/nga-tautohetohe/issues/7). I’m wondering if we can consolidate these all in a single place and/or move those files to git.

Te Hiku Media can host files via our Google Drive account, but I wonder if there’s a better place for these sorts of things?

I could also talk to The Office of the Clerk, as they might be happy to host these publicly together in an accessible way, and it makes more sense for them to do that. Until then, I’d like to move things to Te Hiku Media so we can ensure the links to files will work and we don’t have to chase people up if they break.

@willscire thoughts?

niusealeo commented 5 years ago

Hi @kmahelona, finally looking at this again lol. The Hansard links are normally listed on the parliament webpage, and some of the volumes seem to be stored in Google Drive folders on behalf of the Crown, such as this one for 1987-2002: https://drive.google.com/drive/folders/0B1Iwfzv-Mt3CRGZkMWNfeXoybmc

Not sure why the link changes - maybe the person managing the Google Drive folders moves them around sometimes. Maybe we could upload some of the files into the corpora API as well?