Geeklog-Core / geeklog

Geeklog - The Secure CMS.
https://www.geeklog.net
24 stars 19 forks source link

Incorrect Table Collation on new table when Geeklog Upgraded #1025

Closed eSilverStrike closed 4 years ago

eSilverStrike commented 4 years ago

Collation of database is utf8mb4_general_ci (along with all the other tables) but the 2.2.0 to 2.2.1 Geeklog Upgrade created the Likes table with a collation of utf8_general_ci. It happened on Geeklog.net and one of my other sites.

eSilverStrike commented 4 years ago

Looking further into this I see the reason why it happened on Geeklog.net was that $_DB_charset was empty so it picked UTF-8 during the upgrade process so that was why the table was in a different collation. Still it did not write the collation back to the dbconfig.php at the end of the upgrade so $_DB_charset remained blank.

On my other site I had the old $_DB_charset with utf8 in it commented out and a new line below it like this:

//$_DB_charset = 'utf8';
$_DB_charset = 'utf8mb4'; 

That some how seemed to confuse the installer and it created the new table for the upgrade with the utf8_general_ci collation. Removing the commented out line and rerunning the upgrade (actually I did a migration in all cases) resulted in the new table with the correct collation utf8mb4_general_ci

mystralkk commented 4 years ago

I think I fixed this bug with change set d37c3b6.

eSilverStrike commented 4 years ago

@mystralkk - Done some testing for new installs and things look good. Ran out of time and I still have to test upgrade and migration if $_DB_charset is empty.

With the last commit I updated the install and added a link for the help page which includes more information on the character sets and database collation stuff. I also updated the Geeklog English Docs with something similar. I also added a link to the help page by the 'Use UTF-8' setting for new installs.

Can you read the information I added about the character sets and database collations and make sure my assumptions are correct and fix any mistakes? I tried to explain things as clearly as possible. This includes:

I was unsure of some of the required collations for PGSQL for the tables. Can you fill those in? I also didn't update the Japanese install docs...

I hope to get to more install testing tomorrow.

eSilverStrike commented 4 years ago

@mystralkk - Okay spent a few more hours testing things and things look good except for 1 issue. Here is what I tested multiple times:

New INSTALL

Upgrade and Migrate

The only issue I see is if $_DB_charset starts off as '' for an upgrade. Everything else seems fine but $_DB_charset doesn't get updated in dbconfig. Should this be fixed in the install class writeconfig function or somewhere earlier?

Also any reason we switched the english_utf8 file to $LANG_CHARSET = 'iso-8859-1'? We can keep it utf-8 and update the _list file and in change language show both versions of English. I tried it and the install works fine using either english language file.

mystralkk commented 4 years ago

The only issue I see is if $_DB_charset starts off as '' for an upgrade. Everything else seems fine but $_DB_charset doesn't get updated in dbconfig. Should this be fixed in the install class writeconfig function or somewhere earlier?

When $_DB_charset is empty, the default charset for the database server is used. In this case, we don't know which charset the user is using, so I don't think it is good to fill $_DB_charset while upgrading.

Also any reason we switched the english_utf8 file to $LANG_CHARSET = 'iso-8859-1'? We can keep it utf-8 and update the _list file and in change language show both versions of English. I tried it and the install works fine using either english language file.

No reason for this.

eSilverStrike commented 4 years ago

When $_DB_charset is empty, the default charset for the database server is used

Ah I didn't realize that... I should add that to the comment in dbconfig.php (I wondered what the GeeklogCMS did when it was blank

I thought of 2 things we could do, if empty we could map the language character set to the db character set using the logic from the $databaseCharsets function. (obviously not perfect as we assume that the database collation would be correct)

OR

I can perform a check when upgrading from Geeklog 2.2.0 to 2.2.1 that if $_DB_charset is empty it would warn the user on the upgrade check screen and tell him to update it with the correct character set and would not let him continue until he had modified dbconfig.php. All the code exists already for this and I would just have to add a few more language fields

mystralkk commented 4 years ago

In my opinion, the second option is better and much safer.

eSilverStrike commented 4 years ago

@mystralkk - Okay I implemented the second option. I still plan to do a bit more testing on this but I am hoping this is it for changes of the install. I updated the install docs text and the help language in the install. I didn't touch related Japanese if it was already translated. You may want to read it over and see if any more needs to be added to the translation.

Once I finish my testing I will close this issue.

eSilverStrike commented 4 years ago

Fix for migration using install language character set instead of site config language character set to figure out database character set if missing.

Also updated migration to now require database character set in dbconfig similar to upgrades (now checks for all future Geeklog upgrades). Did this because if migration for sites that use a language that is not supported by the install and if $_DB_charset is empty then migration will use an incorrect db character set.

Also fixed variable not set error for writeconfig function check, for PHP version check, and file upload error pages

Made PHP version check, file upload error pages, and DB Character Set error pages use the same page layout.