Islandora / documentation

Contains islandora's documentation and main issue queue.
MIT License
103 stars 71 forks source link

Beef from Gemini/MySQL on CentOS #828

Closed ajs6f closed 6 years ago

ajs6f commented 6 years ago
TASK [Islandora-Devops.crayfish : Install Gemini db schema (mysql)] ************************************************************
Thursday 10 May 2018  14:34:42 -0400 (0:00:01.104)       0:21:13.837 **********
fatal: [default]: FAILED! => {"changed": false, "msg": "ERROR 1709 (HY000) at line 1: Index column size too large. The maximum column size is 767 bytes.\n"}

I'm going to check into whether this is a problem with CentOS7's stock version of MySQL, but I'm trying with PostgreSQL for the moment.

whikloj commented 6 years ago

Yes, it is. I'll have to dig it up but we've run into this because the stock Centos7 is a MariaDB 5.5 whereas Ubuntu is 5.7. Therein lives the issue. That or the huge possible indexes from our Gemini table schema.

seth-shaw-unlv commented 6 years ago

Linking ansible-role-crawyfish issue #2

ajs6f commented 6 years ago

Ok, so bumping MySQL on the fly should get past this, eh? Trying....

ajs6f commented 6 years ago

Anyone know if 5.6 solves this, or if I have to go ATW to 5.7?

seth-shaw-unlv commented 6 years ago

Looks like it is still a problem with 5.6. HOWEVER, it looks like you can get around the limitation with some configuration flags.

ajs6f commented 6 years ago

Nice catch, @seth-shaw-unlv ! I am trying with MariaDB 10.0, and if I can't bang it out, I'll try with 10.1.

ajs6f commented 6 years ago

I think I just hit this because MariaDB 10.0 uses SysV init on CentOS7, trying their workaround.

ajs6f commented 6 years ago

Aaaaand, it turns out to actually be easier to just upgrade past 10.1.8, when MariaDB started using systemd.

ajs6f commented 6 years ago

Okay, using MariaDB 10.2 and with a fix for a problem with the package I was able to get past this. So far. I'm not sure what to contribute back to CLAW, though-- I guess we have to be honest and say you just can't do this on stock CentOS7? Or should we try to mung the Gemini schema when installing to CentOS (which sounds horrible and unmaintainable)? Or will some kind of new hash-based design (as mentioned in the abovelinked ticket) obviate the need to upgrade?

seth-shaw-unlv commented 6 years ago

@ajs6f @DigitLib

It doesn't look like anyone is tackling a Gemini revision to resolve this issue anytime soon and @ajs6f moved forward with a Maria 10.2 install. @DigitLib, would you have any objections to the claw-playbook installing Maria 10.2 by default on CentOS?

I think I have a revised playbook working with Maria 10.2 that I'm putting through a full rebuild right now. I can issue a PR for more community review/discussion once I confirm it works and if you two think this is a good idea.

ajs6f commented 6 years ago

It sounds reasonable to me. TBC, I may have some trouble selling 10.2 locally (I know, but we're pretty slow-moving here, and often for good reason) but that's my problem, not CLAW's problem.

But I feel like we might be losing sight of an idea (not sure whose, maybe @DiegoPino ?) to move to a hashed form of the URIs (which could let us shrink the column length)? Not saying we should prefer that, just that I don't want to lose that thread because of my site's conservative deployment policies.

ajs6f commented 6 years ago

TB even more C, definitely let's have the PR ( I will test using CentOS Vagrant).

DiegoPino commented 6 years ago

@seth-shaw-unlv @ajs6f since i did not get any positive feedback (or a postcard or a beer) to do so when I suggested the index alter and hash I though it was not a thing you all wanted. But i can for sure refactor that, i wrote a service that does that (in node.js, rocksdb long story) so can just copy that logic if you all care. Personally, i think refactor is needed even if this all passes, such long indexes are never fast when you reach a few Mill. urls.

ajs6f commented 6 years ago

It would be nice both to keep the field size shorter (for speed) and also to have a more well-distributed index (because of the hashing).

DigitLib commented 6 years ago

@seth-shaw-unlv I don't have MariaDB 10 I tried a @DiegoPino scheme https://gist.github.com/DiegoPino/825d33da1a376b72cf0b0c06ff66bed1 and worked OK. I can try to install MariaDB 10...

seth-shaw-unlv commented 6 years ago

Arg! Maria 10.2 is still complaining that the field is too long because varchar is 3 bytes per char and 3*2048 is bigger than the new max of 3072. It looks like we need to update Gemini after all.

ajs6f commented 6 years ago

Hm, that's weird-- Maria 10.2 worked for me.

seth-shaw-unlv commented 6 years ago

@ajs6f Can you send me your server config to see if I'm missing something?

ajs6f commented 6 years ago

Sure, but what exactly do you want-- the MariaDB config?

seth-shaw-unlv commented 6 years ago

@ajs6f yes, please. I should have been more clear.

ajs6f commented 6 years ago

@seth-shaw-unlv No prob, here ya go.

seth-shaw-unlv commented 6 years ago

That looks consistent with my test. Hmmm, perhaps there is something deeper I'm missing.

@ajs6f Can you log into your db and run show variables; and desc gemini.Gemini; for me?

ajs6f commented 6 years ago

Oh, WOW. Something is really off here:

MariaDB [(none)]> desc gemini.Gemini;
ERROR 1146 (42S02): Table 'gemini.Gemini' doesn't exist
MariaDB [(none)]> use gemini;
Database changed
MariaDB [gemini]> desc gemini.Gemini;
ERROR 1146 (42S02): Table 'gemini.Gemini' doesn't exist
MariaDB [gemini]> show tables;
Empty set (0.00 sec)

And yet the -playbook install finished successfully and CLAW appears to be running fine.

ajs6f commented 6 years ago

Yep, further checking reveals that Gemini is not being used, Fedora is not receiving updates (except API-X's setup) long story short, that -playbook finished without error but without correctly installing. :(

ajs6f commented 6 years ago

What's more, from the Drupal side, everything appears to be working well. There's no indication that the various CLAW components are not installed/connecting/correct.

DiegoPino commented 6 years ago

@ajs6f @whikloj @seth-shaw-unlv @DigitLib I started refactoring Gemini last night. Will make a pull tonight at some point. I went for total abstraction via doctrine-ORM and should work out of the box for you all. Also, if you are Ok with this, I will move table creation to Gemini, so no more SQL script that runs independently first/DevOps tasks, Gemini will provide its own method for doing so probably will follow with an ansible related pull too (next week)

Lastly, for your own thoughts, since the changes are based on my own Nodejs based mapping/routing service, different thing but similar need, I could leave "space"/plan for the following: A Source Server/Destination Server field. Means, same Drupal could write to multiple Fedora's (round robin?, balance, random? via rules?), Same Fedora could be synced to another Drupal if needed (i speak only about the mapping, not the logic really, CLAW is a different beast than what I have). That means, Gemini in the future would/Could become a router, not only a mapper. Probably worth another issue in the future or out of scope.

Thoughts? hopes?

ajs6f commented 6 years ago

@DiegoPino Thoughts:

  1. I'm agnostic to ORM or other method, but putting the schema into Gemini itself sounds excellent, and I would be grateful for it. At this point I am pivoting to avoid Ansible entirely for SI (for us it has produced enormous overhead with no accompanying value), so keeping component initialization with each component sounds great.
  2. Routing vs. simple identifier mapping: At least worth another issue, at which time I will inflict upon everyone my opinion that such a feature would be very unRESTful. The stronger move here is to distribute the repository (a la @acoburn's Trellis), so that Gemini can fire requests at a single address. Let middleware do middleware and let backend do backend, and keep the contract in between as simple and narrow as can be.
DigitLib commented 6 years ago

@DiegoPino Goood idea to move SQL scripts to Gemini! Waiting for testing in claw-playbook...

ajs6f commented 6 years ago

@seth-shaw-unlv Just to confirm, I am manually running mysql Ver 15.1 Distrib 10.2.15-MariaDB, and the following schema works fine for me (manually):

USE gemini;
CREATE TABLE IF NOT EXISTS Gemini (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    uuid VARCHAR(36) NOT NULL UNIQUE,
    drupal VARCHAR(2048) NOT NULL UNIQUE,
    fedora VARCHAR(2048) NOT NULL UNIQUE
) ENGINE=InnoDB;
seth-shaw-unlv commented 6 years ago

@DiegoPino the doctrine-ORM and table creation via Gemini all sound good to me. As for the enhancements, I'm inclined to keep it simpler for now. We can expand later if we need/want it.

@ajs6f odd. It was still complaining to me when I ran the same CREATE query manually using 10.2. Anyway, it looks like @DiegoPino will have a fix in place making the point moot tomorrow.

ajs6f commented 6 years ago

@seth-shaw-unlv Yep, this only gets weirder; I did some actual resource-creation and my manually-cobbled-together Gemini definitely works. Resources are getting to the backend and the right paths are being recorded in Gemini.

I agree that we should just roll forward with @DiegoPino.

DiegoPino commented 6 years ago

@ajs6f @seth-shaw-unlv, so, i have this stuff working. I see folks are fighting with composer and stuff so really don't want to drive them crazy right now. When it is right to pull this? Seems like there is some close due date i don't want to interfere with. Anyway, code works.

seth-shaw-unlv commented 6 years ago

@DiegoPino why don't you go ahead and issue the PR. Between us and perhaps another committer that isn't fighting with composer we can take care of it, I think. If nothing else it gives @ajs6f and @DigitLib something to target their centos builds against.

DiegoPino commented 6 years ago

Ok, cool! Will deal with this once day job is over. Thanks!

DigitLib commented 6 years ago

@seth-shaw-unlv @DiegoPino I agree!! Go ahead...

ajs6f commented 6 years ago

@seth-shaw-unlv @DiegoPino Go for it!

DiegoPino commented 6 years ago

@ajs6f @seth-shaw-unlv @DigitLib @Islandora-CLAW/committers I don't want to go into architecture weirdness but while making the formal pull (cleaning my mess) I just found out that 1) http://symfony.com/blog/the-end-of-silex 😞 and that makes me nervous, maybe I should refactor this totally into Symfony 4 and do a single Gemini repo with no dependencies to the rest of crayfish ecosystem at all. I mean, move one piece at the time and do it full symfony. I feel Gemini here is the key to the other services and maybe should be "upgraded independently". Would love to know your thoughts. It seems also more relevant to the future of this project that just patch it.

if you decided to stay on Silex, to be less rogue (i'm using an independent silex service provider for ORM right now), I could need to A) update crayfish-commons https://github.com/Islandora-CLAW/Crayfish-Commons/blob/master/src/Provider/IslandoraServiceProvider.php or B) simply not use dbal at all in Geminy and make direct use of the ORM service provider without requiring crayfish commons there. Which leads to a question to you folks about why the dependency here and if the other tiny crayfishes need ORM? or want to stay with dbal?

In any case, I can make a few pulls with different options, nothing here is life changing. But please let me know your thoughts. Also, if i'm speaking non-sense too.

seth-shaw-unlv commented 6 years ago

My usual inclination is to get something that resolves the issue in now and then have the conversation about the larger issue afterward; although, I hesitate to isolate Gemini as this is a larger architecture issue.

That stated, I don't really feel qualified to have an opinion on this one since I'm not familiar with Silex in the first place. Also, while considering this issue I noticed that @whikloj already has a PR on Crayfish commons to pull Gemini into Crayfish Commons which I think reinforces your statement that "Gemini ... is the key to the other services" so I'd be much more interested in his opinion on the topic, as well as @dannylamb since Travis has been appeased for the time being.

whikloj commented 6 years ago

Yeah I needed to get at Gemini for the Recast Service, but that is probably out of date now. I can see that Gemini as the linkage between Drupal and Fedora could be needed from a lot of potential services. My concern with Symfony 4 is that we immediate place a requirement of PHP 7.1.3 or greater, I'm not sure if that is a deal breaker for some.

seth-shaw-unlv commented 6 years ago

It seems all roads lead to killing PHP 5. I think this is the third time in less than 6 months where one of the possible solutions to an issue involved not using PHP 5.

DiegoPino commented 6 years ago

Hi, ok, no worries will make a 2x pulls (symfony 3.2 and 4.0), schema will be the same so once you move to 4.0 and php7.1 db can stay the same. That should address now-inmediate and future

El El mar, 29 de may. de 2018 a las 18:09, Seth Shaw < notifications@github.com> escribió:

It seems all roads lead to killing PHP 5. I think this is the third time in less than 6 months where one of the possible solutions to an issue involved not using PHP 5.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/Islandora-CLAW/CLAW/issues/828#issuecomment-392962933, or mute the thread https://github.com/notifications/unsubscribe-auth/AGn85xEod519P3O2cJd2DlhMI8VIK8Zpks5t3ccxgaJpZM4T6Wn1 .

-- Diego Pino Navarro Digital Repositories Developer Metropolitan New York Library Council (METRO)

dannylamb commented 6 years ago

I definitely am in favor of going with straight symfony for the long term. If given a chance to do it all over again, I think symfony probably would've been the better move from the start (oh hindsight).

I also am in favor of killing 5.6. It's just going to be build issue after build issue as time goes on.

But for now, can we just hash? This issue can be resolved with a lot less work. Switching to symfony seems more like something we'd do in a sprint.

DiegoPino commented 6 years ago

@dannylamb ok, np, thanks for the feedback. Symfony 4 implementation was really straight forward and I was already doing the backport to symfony 3.2 (tiny changes for 5.6 folks), but I can just hash using the silex version, of course, I would have to drop ORM support and auto deployment of the schema to make this just a patch and not full-blown rewrite. Let me finish this stuff today in during the morning and I will share the Symfony rewritten piece, pull a quick fix for silex stuff, and other contributing related actions before the end of the day.

dannylamb commented 6 years ago

I mean, if you already have it done in Symfony...

I'll take what you give me. Just don't want to send you down a rabbit hole if you don't have to.

DiegoPino commented 6 years ago

Just heads up (needs testing, having second thoughts about dateCreated, fields). Silex version (for those 5.6 folks) https://github.com/Islandora-CLAW/Crayfish/pull/46

seth-shaw-unlv commented 6 years ago

I just merged the PR but I neglected to test Postgres first. 👎 The current version's migrate command only works for MySQL.

So, either we need to do a short-term fix and update the claw-playbook to use a valid postgres schema (I'm testing this option now) OR generate a migration for postgres (can you do that, @DiegoPino?) and update claw-playbook to issue the migration:migrate command.

DiegoPino commented 6 years ago

@seth-shaw-unlvI can issue a new pull for Postgres, can you manually test the SQL i wrote there to see if it works on Postgres? Since DBAL is pretty engine dependent, means yeah, you can swap engines in your config, but the moment you start writing SQL queries manually like CRAYFISH did you could end in an issue.

What about this??

https://symfony.com/doc/current/doctrine/multiple_entity_managers.html (see the multiple ones)

This will all be solved once we move to Symfony 4. For now, I can patch it up! (tomorrow sadly, kinda late with my OR2018 slides and 7.x release stuffd)

DiegoPino commented 6 years ago

@seth-shaw-unlv or do you prefer a quicker fix and just do?

if  ('mysql' == $this->connection->getDatabasePlatform()->getName())  {
 $this->DoSQLthatone;
}
elseif 
('postgress' == $this->connection->getDatabasePlatform()->getName())  {
 $this->DoSQLother;
}
seth-shaw-unlv commented 6 years ago

I think I was overreacting with urgency. claw-playbook is tagged against Crayfish 0.0.1, so the recent merge doesn't impact anyone CentOS folks who are building in their own work-arounds at the moment anyway.

So, rather than a quick fix I think we should focus on getting #722 resolved (making sure it works for both mysql and postgres).

DiegoPino commented 6 years ago

See https://github.com/Islandora-CLAW/CLAW/issues/722#issuecomment-393599189