eXtensibleCatalog / Metadata-Services-Toolkit

Tools for processing and aggregating metadata

Feedback on Comments, primarily about MySQL #610

Open patrickzurek opened 8 years ago

patrickzurek commented 8 years ago

JIRA issue created by: rcook Originally opened: 2011-07-19 08:16 PM

Issue body: (nt)

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-07-19 08:22 PM

Comment body:

Below is a copy of a comment I received from an IT/systems person at an institution looking at XC. They sent it to me after I assured them that they would stay anonymous. My point in putting this out here is to see if there is anything we can do to ward off any of the underlying issues mentioned:

  1. Significant server resources needed, large disk space needed
  2. Data Integrity issues with MySQL
  3. The spread out system component requirements - I think we should write up a summary sheet of what is needed if an institution were to want to install a demo XC? Who can spearhead this?

Comment I was sent follows:

I'm taking a look at this - it looks like it requires a significant amount of server resources for each component. As I'm looking at this, am I correct in assuming that it consumes MARC record data from another database, processes them and serves them out? In other words, it's not handling transactional information about the items? The reason I'm asking is that MySQL isn't the best choice for data integrity, backup and restore / disaster recovery services. It's fast, but it's not as reliable as PostgreSQL. When things have gone wrong, we sometimes see issues with MySQL tables. We've never really had that happen with either PostgreSQL or Microsoft SQL Server.

The system requirements are sort of spread out all over the place, since it's per component. I'll keep looking at it. We may need to do a pilot implementation that can be used for testing, but can't be used to provide actual services, and then build an entirely new environment later if there's a desire to start using this on public-facing offerings.

patrickzurek commented 8 years ago

JIRA Comment by user: banderson JIRA Timestamp: 2011-07-20 02:58 PM

Comment body:

I thought that was a good idea to send the note you did, Randy.

1.  Significant server resources needed, large disk space needed

Besides disk space, I don't think this is true.  The servers we've been running on are nothing special.  What might be interesting would be to try running all the components on one server.  

Large disk space is needed - can't argue that one.  Of course large disk space is relative and disk is cheap.  However, I'm guessing most libraries don't already have disks this size lying around.

2.  Data Integrity issues with MySQL

hmmm - I'm not going to argue against their experience or against PostgreSQL, but I will say that MySQL is "the world's most popular open source database".  Just look at their customer list:
http://www.mysql.com/customers/

I've run our data through innumerable times now and can think of one time where maybe (just maybe) MySQL *might* have failed.  I also used MySQL heavily at TextWise.  We had multiple production-level applications that stored and queried massive amounts of data.  I don't recall MySQL having any problems over the 2.5-year span I was there.

3.  The spread out system component requirements - I think we should write up a summary sheet of what is needed if an institution were to want to install a demo XC?  Who can spearhead this?

That's probably a good idea.  Perhaps we could try putting all components on one server.  Have one person do it start to finish and see where the documentation falls short (as Randy suggested).  I could do it, but it would certainly slow down everything else I'm doing right now :-)
 

patrickzurek commented 8 years ago

JIRA Comment by user: dlindahl JIRA Timestamp: 2011-07-20 03:14 PM

Comment body:

Couple comments:

1) The high server resources needed are mostly focused on disk space - I agree. The amount of disk space needed depends on the size of your collection. Larger institutions would need more (and would have more resources); smaller institutions would need less (and would have fewer resources). XC is open source, which means that the initial cost of the software is zero, while the support cost is potentially higher than for commercial offerings, so the two could balance out. The cost of disk space is real, but it is a small percentage of the total cost; IT staffing and servers are probably the most significant factors in pricing XC. The benefit of our software is that it is open, can be modified at will, and brings capabilities in-house, into the library. XC was never really focused on the hosted model; it was intended to give libraries the flexibility to respond to user needs. Of course, for a library to be motivated to use XC, they must feel that the benefits are a good fit with their institution, and that the trade-offs are worth the cost.

2) I don't know of any data integrity issues with MySQL. Anecdotal experience of past issues does not necessarily mean that MySQL is deficient.

3) I agree that we need to install the whole thing on a server from start to finish following some directions. I think it would be ideal to design a path through the documentation that goes in order through the toolkits, step by step. Perhaps a roadmap, perhaps just an intro page that takes you to the install instructions for the first toolkit you need to install, and when that document ends, it should link to the next, and so on.

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-07-21 03:51 PM

Comment body:

I think we all agree that we need a summary-style install guide (#3 above) that covers the system-wide components.

I am going to leave this assigned to John for now, but if Ralph has any bandwidth in August, I would like him to take this on so John can keep coding.