Closed ajs6f closed 6 years ago
Yes, it is. I'll have to dig it up but we've run into this because the stock Centos7 is a MariaDB 5.5 whereas Ubuntu is 5.7. Therein lives the issue. That or the huge possible indexes from our Gemini table schema.
Linking ansible-role-crawyfish issue #2
Ok, so bumping MySQL on the fly should get past this, eh? Trying....
Anyone know if 5.6 solves this, or if I have to go ATW to 5.7?
Looks like it is still a problem with 5.6. HOWEVER, it looks like you can get around the limitation with some configuration flags.
Nice catch, @seth-shaw-unlv ! I am trying with MariaDB 10.0, and if I can't bang it out, I'll try with 10.1.
I think I just hit this because MariaDB 10.0 uses SysV init
on CentOS7, trying their workaround.
Aaaaand, it turns out to actually be easier to just upgrade past 10.1.8, when MariaDB started using systemd
.
Okay, using MariaDB 10.2 and with a fix for a problem with the package I was able to get past this. So far. I'm not sure what to contribute back to CLAW, though-- I guess we have to be honest and say you just can't do this on stock CentOS7? Or should we try to mung the Gemini schema when installing to CentOS (which sounds horrible and unmaintainable)? Or will some kind of new hash-based design (as mentioned in the abovelinked ticket) obviate the need to upgrade?
@ajs6f @DigitLib
It doesn't look like anyone is tackling a Gemini revision to resolve this issue anytime soon and @ajs6f moved forward with a Maria 10.2 install. @DigitLib, would you have any objections to the claw-playbook installing Maria 10.2 by default on CentOS?
I think I have a revised playbook working with Maria 10.2 that I'm putting through a full rebuild right now. I can issue a PR for more community review/discussion once I confirm it works and if you two think this is a good idea.
It sounds reasonable to me. TBC, I may have some trouble selling 10.2 locally (I know, but we're pretty slow-moving here, and often for good reason) but that's my problem, not CLAW's problem.
But I feel like we might be losing sight of an idea (not sure whose, maybe @DiegoPino ?) to move to a hashed form of the URIs (which could let us shrink the column length)? Not saying we should prefer that, just that I don't want to lose that thread because of my site's conservative deployment policies.
TB even more C, definitely let's have the PR ( I will test using CentOS Vagrant).
@seth-shaw-unlv @ajs6f since i did not get any positive feedback (or a postcard or a beer) to do so when I suggested the index alter and hash I though it was not a thing you all wanted. But i can for sure refactor that, i wrote a service that does that (in node.js, rocksdb long story) so can just copy that logic if you all care. Personally, i think refactor is needed even if this all passes, such long indexes are never fast when you reach a few Mill. urls.
It would be nice both to keep the field size shorter (for speed) and also to have a more well-distributed index (because of the hashing).
@seth-shaw-unlv I don't have MariaDB 10 I tried a @DiegoPino scheme https://gist.github.com/DiegoPino/825d33da1a376b72cf0b0c06ff66bed1 and worked OK. I can try to install MariaDB 10...
Arg! Maria 10.2 is still complaining that the field is too long because varchar is 3 bytes per char and 3*2048 is bigger than the new max of 3072. It looks like we need to update Gemini after all.
Hm, that's weird-- Maria 10.2 worked for me.
@ajs6f Can you send me your server config to see if I'm missing something?
Sure, but what exactly do you want-- the MariaDB config?
@ajs6f yes, please. I should have been more clear.
That looks consistent with my test. Hmmm, perhaps there is something deeper I'm missing.
@ajs6f Can you log into your db and run show variables;
and desc gemini.Gemini;
for me?
Oh, WOW. Something is really off here:
MariaDB [(none)]> desc gemini.Gemini;
ERROR 1146 (42S02): Table 'gemini.Gemini' doesn't exist
MariaDB [(none)]> use gemini;
Database changed
MariaDB [gemini]> desc gemini.Gemini;
ERROR 1146 (42S02): Table 'gemini.Gemini' doesn't exist
MariaDB [gemini]> show tables;
Empty set (0.00 sec)
And yet the -playbook
install finished successfully and CLAW appears to be running fine.
Yep, further checking reveals that Gemini is not being used, Fedora is not receiving updates (except API-X's setup) long story short, that -playbook
finished without error but without correctly installing. :(
What's more, from the Drupal side, everything appears to be working well. There's no indication that the various CLAW components are not installed/connecting/correct.
@ajs6f @whikloj @seth-shaw-unlv @DigitLib I started refactoring Gemini last night. Will make a pull tonight at some point. I went for total abstraction via doctrine-ORM and should work out of the box for you all. Also, if you are Ok with this, I will move table creation to Gemini, so no more SQL script that runs independently first/DevOps tasks, Gemini will provide its own method for doing so probably will follow with an ansible related pull too (next week)
Lastly, for your own thoughts, since the changes are based on my own Nodejs based mapping/routing service, different thing but similar need, I could leave "space"/plan for the following: A Source Server/Destination Server field. Means, same Drupal could write to multiple Fedora's (round robin?, balance, random? via rules?), Same Fedora could be synced to another Drupal if needed (i speak only about the mapping, not the logic really, CLAW is a different beast than what I have). That means, Gemini in the future would/Could become a router, not only a mapper. Probably worth another issue in the future or out of scope.
Thoughts? hopes?
@DiegoPino Thoughts:
@DiegoPino Goood idea to move SQL scripts to Gemini! Waiting for testing in claw-playbook...
@seth-shaw-unlv Just to confirm, I am manually running mysql Ver 15.1 Distrib 10.2.15-MariaDB
, and the following schema works fine for me (manually):
USE gemini;
CREATE TABLE IF NOT EXISTS Gemini (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
uuid VARCHAR(36) NOT NULL UNIQUE,
drupal VARCHAR(2048) NOT NULL UNIQUE,
fedora VARCHAR(2048) NOT NULL UNIQUE
) ENGINE=InnoDB;
@DiegoPino the doctrine-ORM and table creation via Gemini all sound good to me. As for the enhancements, I'm inclined to keep it simpler for now. We can expand later if we need/want it.
@ajs6f odd. It was still complaining to me when I ran the same CREATE query manually using 10.2. Anyway, it looks like @DiegoPino will have a fix in place making the point moot tomorrow.
@seth-shaw-unlv Yep, this only gets weirder; I did some actual resource-creation and my manually-cobbled-together Gemini definitely works. Resources are getting to the backend and the right paths are being recorded in Gemini.
I agree that we should just roll forward with @DiegoPino.
@ajs6f @seth-shaw-unlv, so, i have this stuff working. I see folks are fighting with composer and stuff so really don't want to drive them crazy right now. When it is right to pull this? Seems like there is some close due date i don't want to interfere with. Anyway, code works.
@DiegoPino why don't you go ahead and issue the PR. Between us and perhaps another committer that isn't fighting with composer we can take care of it, I think. If nothing else it gives @ajs6f and @DigitLib something to target their centos builds against.
Ok, cool! Will deal with this once day job is over. Thanks!
@seth-shaw-unlv @DiegoPino I agree!! Go ahead...
@seth-shaw-unlv @DiegoPino Go for it!
@ajs6f @seth-shaw-unlv @DigitLib @Islandora-CLAW/committers I don't want to go into architecture weirdness but while making the formal pull (cleaning my mess) I just found out that 1) http://symfony.com/blog/the-end-of-silex 😞 and that makes me nervous, maybe I should refactor this totally into Symfony 4 and do a single Gemini repo with no dependencies to the rest of crayfish ecosystem at all. I mean, move one piece at the time and do it full symfony. I feel Gemini here is the key to the other services and maybe should be "upgraded independently". Would love to know your thoughts. It seems also more relevant to the future of this project that just patch it.
if you decided to stay on Silex, to be less rogue (i'm using an independent silex service provider for ORM right now), I could need to A) update crayfish-commons https://github.com/Islandora-CLAW/Crayfish-Commons/blob/master/src/Provider/IslandoraServiceProvider.php or B) simply not use dbal at all in Geminy and make direct use of the ORM service provider without requiring crayfish commons there. Which leads to a question to you folks about why the dependency here and if the other tiny crayfishes need ORM? or want to stay with dbal?
In any case, I can make a few pulls with different options, nothing here is life changing. But please let me know your thoughts. Also, if i'm speaking non-sense too.
My usual inclination is to get something that resolves the issue in now and then have the conversation about the larger issue afterward; although, I hesitate to isolate Gemini as this is a larger architecture issue.
That stated, I don't really feel qualified to have an opinion on this one since I'm not familiar with Silex in the first place. Also, while considering this issue I noticed that @whikloj already has a PR on Crayfish commons to pull Gemini into Crayfish Commons which I think reinforces your statement that "Gemini ... is the key to the other services" so I'd be much more interested in his opinion on the topic, as well as @dannylamb since Travis has been appeased for the time being.
Yeah I needed to get at Gemini for the Recast Service, but that is probably out of date now. I can see that Gemini as the linkage between Drupal and Fedora could be needed from a lot of potential services. My concern with Symfony 4 is that we immediate place a requirement of PHP 7.1.3 or greater, I'm not sure if that is a deal breaker for some.
It seems all roads lead to killing PHP 5. I think this is the third time in less than 6 months where one of the possible solutions to an issue involved not using PHP 5.
Hi, ok, no worries will make a 2x pulls (symfony 3.2 and 4.0), schema will be the same so once you move to 4.0 and php7.1 db can stay the same. That should address now-inmediate and future
El El mar, 29 de may. de 2018 a las 18:09, Seth Shaw < notifications@github.com> escribió:
It seems all roads lead to killing PHP 5. I think this is the third time in less than 6 months where one of the possible solutions to an issue involved not using PHP 5.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/Islandora-CLAW/CLAW/issues/828#issuecomment-392962933, or mute the thread https://github.com/notifications/unsubscribe-auth/AGn85xEod519P3O2cJd2DlhMI8VIK8Zpks5t3ccxgaJpZM4T6Wn1 .
-- Diego Pino Navarro Digital Repositories Developer Metropolitan New York Library Council (METRO)
I definitely am in favor of going with straight symfony for the long term. If given a chance to do it all over again, I think symfony probably would've been the better move from the start (oh hindsight).
I also am in favor of killing 5.6. It's just going to be build issue after build issue as time goes on.
But for now, can we just hash? This issue can be resolved with a lot less work. Switching to symfony seems more like something we'd do in a sprint.
@dannylamb ok, np, thanks for the feedback. Symfony 4 implementation was really straight forward and I was already doing the backport to symfony 3.2 (tiny changes for 5.6 folks), but I can just hash using the silex version, of course, I would have to drop ORM support and auto deployment of the schema to make this just a patch and not full-blown rewrite. Let me finish this stuff today in during the morning and I will share the Symfony rewritten piece, pull a quick fix for silex stuff, and other contributing related actions before the end of the day.
I mean, if you already have it done in Symfony...
I'll take what you give me. Just don't want to send you down a rabbit hole if you don't have to.
Just heads up (needs testing, having second thoughts about dateCreated, fields). Silex version (for those 5.6 folks) https://github.com/Islandora-CLAW/Crayfish/pull/46
I just merged the PR but I neglected to test Postgres first. 👎 The current version's migrate command only works for MySQL.
So, either we need to do a short-term fix and update the claw-playbook to use a valid postgres schema (I'm testing this option now) OR generate a migration for postgres (can you do that, @DiegoPino?) and update claw-playbook to issue the migration:migrate command.
@seth-shaw-unlvI can issue a new pull for Postgres, can you manually test the SQL i wrote there to see if it works on Postgres? Since DBAL is pretty engine dependent, means yeah, you can swap engines in your config, but the moment you start writing SQL queries manually like CRAYFISH did you could end in an issue.
What about this??
https://symfony.com/doc/current/doctrine/multiple_entity_managers.html (see the multiple ones)
This will all be solved once we move to Symfony 4. For now, I can patch it up! (tomorrow sadly, kinda late with my OR2018 slides and 7.x release stuffd)
@seth-shaw-unlv or do you prefer a quicker fix and just do?
if ('mysql' == $this->connection->getDatabasePlatform()->getName()) {
$this->DoSQLthatone;
}
elseif
('postgress' == $this->connection->getDatabasePlatform()->getName()) {
$this->DoSQLother;
}
I think I was overreacting with urgency. claw-playbook is tagged against Crayfish 0.0.1, so the recent merge doesn't impact anyone CentOS folks who are building in their own work-arounds at the moment anyway.
So, rather than a quick fix I think we should focus on getting #722 resolved (making sure it works for both mysql and postgres).
I'm going to check into whether this is a problem with CentOS7's stock version of MySQL, but I'm trying with PostgreSQL for the moment.