My-Little-Forum / mylittleforum

A simple PHP and MySQL based internet forum that displays the messages in classical threaded view (tree structure)
GNU General Public License v3.0
124 stars 48 forks source link

Primary key for table mlf2_banlists #710

Closed auge8472 closed 3 months ago

auge8472 commented 7 months ago

The table mlf2_banlists currently has two columns, but no primary key. The content of one of the two columns, namely name, is unique and therefore suitable as a primary key. Assuming that only the content of the list column is changed if required and further assuming that no new rows are inserted into this table, this should not cause any problems.

Current CREATE-statement:

CREATE TABLE mlf2_banlists (
    name varchar(255) NOT NULL default '',
    list text NOT NULL
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

The default value of the column (an empty string) has to be removed to prevent non unique keys and the definition of the primary key has to be added (similar to the primary key for the table mlf2_settings).

CREATE TABLE mlf2_banlists (
    name varchar(255) NOT NULL,
    list text NOT NULL,
    PRIMARY KEY (name)
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

Also altering the table definition in case of updates shouldn't make problems. As a precaution, we can add a check for multiple use of values in the name column before the change.

auge8472 commented 7 months ago

Additionally the size of the column name of 255 Bytes is questionable for me. None of the currently (and from the beginning of the project) existing keys (user_agents, ips, words) is longer as 11 characters/Bytes. I propose to restrict the length of the column name to 16 Bytes. If we were ever to insert a new line (no one else will do that), we would have to pay attention to the length of the value.

In the long run the table in its current structure is questionable at all. A better structure would be to have a column for single values (an IP, a user agent, a word), a non unique column for the type of the blacklisted value (IP, user agent, word) and a numeric primary key. That way the handling of blacklisted $anything would be much easier (IMHO). On the other hand I don't know anyone, who stored anything else than badwords in this table [^1] and the badwords could also be handled by the B8-filter (if enabled!), additionally to create a UI for possibly thousands of single values is not a simple task.

[^1]: To prove me wrong: IPs or user agents anyone?

loesler commented 7 months ago

To prove me wrong: IPs or user agents anyone?

I personally do not use IPs or user agents. Both names no longer seem up-to-date to me either.

auge8472 commented 7 months ago

To prove me wrong: IPs or user agents anyone?

I personally do not use IPs or user agents. Both names no longer seem up-to-date to me either.

To be honest, I found a usecase for excluding a user agent. When one wants to hide the forum content from ChatGPT/AI crawler, one can apply the recommended strings to robots.txt but can also lock out the user agent completly from the forum with the banlist. While the way with the robots.txt is a recommendation for the crawlers, the banlist will simply block the user agent.

In both cases one has to know, how to recognise a specific user agent.

auge8472 commented 7 months ago

And now for something completely different.

I was playing around with cases of multiple use of values in the column name. A possible problem is adding multiple rows under the same name with the matching values in the column list during reimports of (an) existing forum database(s). I tried to put the values of multiple rows with one name into one row with SQL and failed in a first attempt.

I remembered the similar issue when we defined the column name in the settings table as primary key. The main reason to do this was the possibility of multiple settings with the same name but different values in the settings table because of reimports. I will solve this similar to the approach at that time by creating a new table, reading the values from the old table, joining the possibly existing mutliple same-named rows, putting it to the new table, deleting the old table and renaming the new table.

auge8472 commented 3 months ago

Searched a while for a SQL-only solution and came up with GROUP_CONCAT that concatenates the values of a column from different rows into one resulting row. But it has a pitfall. The function concatenates all found rows into one. So I had to split the inserts into three, each for one named list to get three lines in the new table structure, one for each key name.

INSERT INTO `mlf2_banlists`(`name`, `list`)
SELECT `name`, GROUP_CONCAT(`list` SEPARATOR '\n') AS `list` FROM `mlf_banlists_old` WHERE `name` = 'ips';

INSERT INTO `mlf2_banlists`(`name`, `list`)
SELECT `name`, GROUP_CONCAT(`list` SEPARATOR '\n') AS `list` FROM `mlf_banlists_old` WHERE `name` = 'user_agents';

INSERT INTO `mlf2_banlists`(`name`, `list`)
SELECT `name`, GROUP_CONCAT(`list` SEPARATOR '\n') AS `list` FROM `mlf_banlists_old` WHERE `name` = 'words';

Notice: With more than one row of the same name an empty row results in an empty text line with the separator '\n'. With an empty separator '' this would stick the value of the next result line to the last value of the previous result line.

// value of row 1:

foo
bar

// empty value of row 2:

// value of row 3:

this
that

// result with separator '\n':

foo
bar

this
that

// result with separator '':

foo
barthis
that

The empty line, that will be generated with the separator '\n', does not break anything. Additionally, with the next edit of the content, the empty line will be gone.

auge8472 commented 3 months ago

I have to reopen the issue because I solved it in the upgrade script but not in the installation script. This has to be done before releasing verion 2.5.5.