Open davemacdo opened 6 years ago
So, is this task just to define a map function which converts the existing format to the new format?
That will be next. First, I wanted to see if anybody had a better idea for structuring the data. A map function will come next.
I started working on a function to parse the names in the current combined name field into parts and found some edge cases we'll need to make a determination on how to handle.
There are some composers who have multiple names. These could be name changes due to marriage, stage names, gender transitions, etc. To be most useful as a research database, I think it's important to keep this information and make sure it's searchable. The current spreadsheet doesn't treat them consistently, so there will need to be some minimal manual adjustments (there are only a handful of these at the moment) anyway.
My idea would be to add a field for alternate names as an array of strings, which I would prefer to list as firstname surname
, since some of these may be stage names that have formatting that should be maintained (for example, it would be silly to write "Z, Pamela"). Does this seem like a practical way to store/retrieve this information? @robdeemer?
That looks good to me, David!
On Sat, Jun 9, 2018 at 10:07 PM, David MacDonald notifications@github.com wrote:
I started working on a function to parse the names in the current combined name field into parts and found some edge cases we'll need to make a determination on how to handle.
There are some composers who have multiple names. These could be name changes due to marriage, stage names, gender transitions, etc. To be most useful as a research database, I think it's important to keep this information and make sure it's searchable. The current spreadsheet doesn't treat them consistently, so there will need to be some minimal manual adjustments (there are only a handful of these at the moment) anyway.
My idea would be to add a field for alternate names as an array of strings, which I would prefer to list as firstname surname, since some of these may be stage names that have formatting that should be maintained (for example, it would be silly to write "Z, Pamela"). Does this seem like a practical way to store/retrieve this information? @robdeemer https://github.com/robdeemer?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/davemacdo/composer-diversity/issues/31#issuecomment-396013825, or mute the thread https://github.com/notifications/unsubscribe-auth/AlCxlf0cPVpymFxj6b_Uf28kFUllW6vEks5t7H9GgaJpZM4UgAsn .
--
Dr. Rob Deemer Head of Music Composition & Special Assistant to the President American Council on Education Fellow (2016-2017) School of Music, State University of New York at Fredonia
Chair, NYSSMA Composition & Improvisation Committee Chair-Elect, NAfME Council of Music Composition
Composer-in-Residence, Buffalo Chamber Players http://buffalochamberplayers.org/ Composer-in-Residence, Harmonia Chamber Singers https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CB4QFjAAahUKEwjmv7jhh-3IAhWLdD4KHSr2CNM&url=http%3A%2F%2Fwww.harmoniacs.org%2F&usg=AFQjCNGgWsCqj4leupJVTtIhJNwYXY26nw&sig2=TKTS2XdNhV0m9B3O3b0-jg
3162 Mason Hall • Fredonia, NY 14063 • (716) 673-3133 e-mail: deemer@fredonia.edu • web: www.robdeemer.com
What about doing a little more grouping and going to something like this? Name and location could both become objects, to keep the top level even cleaner. And perhaps "years" as grouping with birth and death within.
I'm also keen on the idea of geocoding location with lat/lng coordinates, but that would make the import more difficult. (Might not be a dealbreaker, but it does make it tougher)
I like the idea of name and location as objects. We're not currently collecting birth years in the research, but there's no reason it couldn't have a field if that ever changes. I have also been thinking about geocoding locations, as I would love to be able to search for "within x miles of y". There are some tough problems associated with that: some records have no location data, records have different levels of precision (city, state, country), and some records have multiple locations listed. I think it's worth setting up something. Perhaps we could write a function that would update lat/lng any time the location object changes? That way the data is there, but not calculated dynamically for each query. (It also makes it easier to add later once we figure out the best way to generate it from the data.)
Oh, and I'm a dummy and forgot the URL field in my initial post. My changes bolded:
Let's punt on the geocoding for now, but put it on the roadmap for sure.
misc thoughts: "in USA" is redundant if we are storing country?
Gender, Genre, Medium, Demographic - I get that the data will be stored as a string, but let's make sure they are treated like foreign keys, not free text; ie there should be a canonical list of each, and the keys will all be present in the i18n lexica.
Also, just an observation, this data structure is PERFECT for a Mongo database. https://www.mongodb.com/
I had a similar thought about "in USA" but I left it in since I noticed at least one person that had two countries listed in their location field, so I wasn't sure if we could count on formatting being consistent. I'm all for eliminating unnecessary data entry, but maybe it could live on as a convenience method that is dynamically generated based on the location.
And on a related note: should location be an array of objects to handle people who are associated with multiple locations?
Totally with you that this screams out Mongo. It sounds like there's some leaning towards Wordpress. If this goes the Wordpress route, I was assuming gender, genre, medium, demographics would all be custom taxonomies. I haven't worked with i18n, so I don't know if that would work.
I haven't been following the discussion about Wordpress, but I would resist forcing this data into WP's schema.
My vote would be for a REST API with an agnostic front-end that will answer search requests from anywhere.
I like the idea of ditching "in USA" in favor of a function, and I agree with @ianring on drawing those strings from a limited set. I imagine we could just control this from the application end. Using the keys from the string localization should make lots of things easier.
I also like @gtwright's idea of making location into an array to handle multiples a little more nicely.
The WP discussion was based on making the tool as easy and friendly as possible to the research team who will be maintaining the data. WP gives us user management tools and a possibly familiar UI, but Ian makes a good point about the schema. And I think I mentioned before that we might end up spending as much time tweaking WP as we would making something from scratch(ish). Ian, I assume you're imagining the API in PHP on the server? Reading everything from an API is a good goal whichever way we choose, and a headless WP loses some of the benefits of using WP over other options.
For the database platform, I'm totally ok with either WP or a Mongo database. I'm learning pretty much every piece of this as we go, so I'll defer to you all.
I'm totally on board for an agnostic front-end. 👍
For the back-end, I'd say it's all about what'll be easy for the research team (who I don't know at all) and how much time and/or money to get there. Given that the data set is relatively straight forward, I think there are probably several CMSs that would do the trick—and that building something from scratch wouldn't be too onerous.
The big reason I still lean towards Wordpress is that it is such a known commodity, both for the research team (yay for less time on training and tech support) and for estimating project complexity on our end. And an extra bonus is Wordpress doesn't scale in cost per user, like many other CMSs.
All that being said, I'm pretty open minded about solutions. I just don't want to sink time into building things like user auth or answering UI questions for a research team if we don't have to :)
Prisma and GraphCMS look pretty darn cool.
I mentioned this in an earlier Issue, but want to put it in this thread: One option I was looking at early on is Google Firestore, which is another NoSQL document database that also comes with a handful of useful bells and whistles like Firebase Auth that we could use for user management and Cloud Functions, which could help with things like updating coordinates automatically when location fields change. The free plan looks like it could support us for a while, and it even includes some hosting if needed.
Just put together some quick custom post type/taxonomies/fields in WP to see what this might look at. I've got a little proof of concept up here. @davemacdo Sending you a log in now. If anyone else wants a login, just send me your email.
I went ahead and swapped the Algolia demo over to use the index from this WP site. It should work that any change to a post in WP gets re-indexed right away.
This is all pretty rough, but I just wanted to see how well the indexing would work with this setup. Seems a reasonable proof of concept if we want to go the WP route.
I'm pretty impressed with the WordPress editing side, and I think the user management and permissions make it a good choice. Here's a screenshot of the editing UI for those interested:
This looks great!
———————— Dr. Rob Deemer Head of Music Composition & Assistant to the President American Council on Education Fellow (2016-2017) School of Music, State University of New York at Fredonia
Chair, NYSSMA Composition & Improvisation Committee Chair-Elect, NAfME Council of Music Composition
Composer-in-Residence, Buffalo Chamber Players Composer-in-Residence, Harmonia Chamber Singers
3162 Mason Hall • Fredonia, NY 14063 • (716) 673-3133 e-mail: deemer@fredonia.edu • web: www.robdeemer.com
On Jun 13, 2018, at 11:30 AM, David MacDonald notifications@github.com wrote:
I'm pretty impressed with the WordPress editing side, and I think the user management and permissions make it a good choice. Here's a screenshot of the editing UI for those interested:
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Graham, do you think a WP backend would work doing our own realtime search, or would it require an indexing service? Querying with each keystroke seems costly, but I haven't tried it.
@davemacdo Don't know if I would try the realtime/autocomplete approach directly through WP queries (i.e. without indexing through Algolia, Elasticsearch, or something similar). But I don't know if realtime would be totally necessary either.
The WP/Algolia plugin is pretty good at re-indexing every time a change is made. I put a couple of test records in there just (with "[TESTING]" in the name) just for playing around with. The changes in the WP back end should show up in that Algolia mock-up within a few seconds (in theory, it shouldn't even need a refresh). Feel free to edit those like crazy and see what pops up in searches. There's also a console log of all of the search hits so you can see what's coming out there.
I did a quick and dirty import of the old db from the JSON file in this repo (perhaps a few days old). I did a couple of spot checks for accuracy, but not a deep audit. So the composer list in WP should be in decent shape.
I'm still pretty open to the idea of alternative backends. I mainly wanted to see what a WP proof of concept would look like. I'm fairly satisfied that this would do the trick.
Oh, and @davemacdo, I made you an admin in WP, so you can add anyone else you'd like.
Anyone else can get in with username demo, pwd demo with contributor access. Or let me know your email and I'll set you up with your own account to dig around.
It's good to know how fast the updates are run. I think we should go for it. I'd lean toward Algolia only because I have nightmares about the AWS console. I agree realtime search isn't necessary, but it is pretty cool.
@gtwright Should we work out the WP side with the content you have now and then grab a fresh set of data right before the switch? I can start a repo for the server-side stuff and give you push access. I can also set up a subdomain of composerdiversity.com for it (admin? research? data?).
Sorry to be awol...jury duty this week. Fun times. (Looks like I'll be on a case through Tuesday...)
I'd say it's a good plan to work out any WP schema/structure issues, then clear out the db and re-import. I have a fairly jury rigged php template set to import into WP from a JSON. It's masquerading as a page template here.
I've gone ahead and update the admin repo with the Wordpress install, so we can move any Wordpress discussion over there.
Background
Currently, the data is in a Google Sheet. I use the Sheets API manually (silly to use an API manually, I know) to pull the content as JSON and check it into the repo here from time to time. The spreadsheet is maintained by a fantastic group of researchers and informed by user submissions in a Google Form.
The sheet uses a series of columns that are used as booleans ('X' for true, empty string for false). This is pretty easy for the researchers to handle, but makes for a lot of fields per record.
New format
I would like to convert the data to a different format before moving it elsewhere. Here's my current thinking on what we need to have:
This moves from 40+ fields per record to 11. Even just making this change and keeping the data where it is could shrink the flat file enough to improve performance. Thoughts?