club-1 / flarum-ext-cross-references

Add cross reference links when a discussion is mentioned from another one.
GNU Affero General Public License v3.0
6 stars 1 forks source link

Console command to reparse all comments and convert them to use `CROSSREFERENCE*` tags #34

Closed n-peugnet closed 1 year ago

n-peugnet commented 1 year ago

This new command would allow to remove the "Retrofit" part of the JS code:

https://github.com/club-1/flarum-ext-cross-references/blob/152437db44471e500930d9f570f9cbed58ea7ae0/js/src/forum/index.ts#L57-L62

Questions

Should this comment create back-reference at the same time?

I don't think so. This will add the event post at the end of the target discussion and with an incorrect timestamp which would be strange. Also the JS retrofit was not doing it either, so the new behaviour would be the same as previously.

Important

This command should ask for confirmation and emit a warning about the fact that links converted by this command will not be clickable if the extension is disabled or removed.

n-peugnet commented 1 year ago

@rob006: as the need for reparsing the messages also arose in another extension I made, I started thinking about making another extension that would provide some useful general maintenance commands like this one. What do you think about this idea?

rob006 commented 1 year ago

What do you think about this idea?

I like it. It definitely makes more sense to have generic tools in separate extension.

n-peugnet commented 1 year ago

I just made a quick implementation of this command in https://github.com/club-1/flarum-ext-chore-commands.

It is not yet very well tested but I already corrected some mistakes I made along the way. I do absolutely not recommend to run it in production yet, but I would be interested if you could make some testing on your end.

n-peugnet commented 1 year ago

I did a little bit more testing with the database of our (very) small community (322 comment posts) and it behaved properly. As the list is chunked, it should be ready for forums a lot more massive than that. It could maybe be optimized later by chunking the updates too instead of calling save() for each post one by one, but I don't really know how much of a performance boost this will achieve.

rob006 commented 1 year ago

I tested this on my forum (5.7k posts) and I found only one issue: mentions extension throws an error for some posts (I believe this is related to mentioning deleted users or processing post of deleted user), and this (silently - there are not errors in console, only in error log) interrupts processing:

TypeError: Argument 2 passed to Flarum\Mentions\ConfigureMentions::addPostId() must be an instance of Flarum\User\User, null given in .../vendor/flarum/mentions/src/ConfigureMentions.php:127

I was able to handle this with this patch:

Index: src/Console/ReparseCommand.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/Console/ReparseCommand.php b/src/Console/ReparseCommand.php
--- a/src/Console/ReparseCommand.php    
+++ b/src/Console/ReparseCommand.php    (date 1683115804681)
@@ -80,9 +80,14 @@
         $changed = 0;
         foreach ($posts as $post) {
             assert($post instanceof Post);
-            $src = $this->formatter->unparse($post->content, $post);
-            $user = is_null($post->editedUser) ? $post->user : $post->editedUser;
-            $content = $this->formatter->parse($src, $post, $user);
+            try {
+                $src = $this->formatter->unparse($post->content, $post);
+                $user = is_null($post->editedUser) ? $post->user : $post->editedUser;
+                $content = $this->formatter->parse($src, $post, $user);
+            } catch (\Throwable $exception) {
+                $io->warning("Error occurred while processing post with ID $post->id - reparsing was skipped:\n$exception\n");
+                continue;
+            }
             if ($post->content != $content) {
                 $post->content = $content;
                 $post->save();

It does not work perfectly with progress bar, so it could be improved (for example collect errors and print them after all posts were processed), but I think that this command should expect exceptions here and deal with them in some way. I believe that original issue in flarum/mentions was fixed as side effect of https://github.com/flarum/framework/pull/3808, but other extensions may also fail while processing old posts.

It could maybe be optimized later by chunking the updates too instead of calling save() for each post one by one, but I don't really know how much of a performance boost this will achieve.

I don't think it is worth it - command was quite fast in my case, and batch update is quite complicated and brings its own problems (deadlocks). Command already saves only posts with altered content, so update query is skipped most of posts anyway.

n-peugnet commented 1 year ago

I tested this on my forum (5.7k posts) and I found only one issue: mentions extension throws an error for some posts (I believe this is related to mentioning deleted users or processing post of deleted user), and this (silently - there are not errors in console, only in error log) interrupts processing:

TypeError: Argument 2 passed to Flarum\Mentions\ConfigureMentions::addPostId() must be an instance of Flarum\User\User, null given in .../vendor/flarum/mentions/src/ConfigureMentions.php:127

Ah yes, of course... didn't think of that despite being confronted to this issue myself in cross-reference extension.

I was able to handle this with this patch:

Index: src/Console/ReparseCommand.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/Console/ReparseCommand.php b/src/Console/ReparseCommand.php
--- a/src/Console/ReparseCommand.php  
+++ b/src/Console/ReparseCommand.php  (date 1683115804681)
@@ -80,9 +80,14 @@
         $changed = 0;
         foreach ($posts as $post) {
             assert($post instanceof Post);
-            $src = $this->formatter->unparse($post->content, $post);
-            $user = is_null($post->editedUser) ? $post->user : $post->editedUser;
-            $content = $this->formatter->parse($src, $post, $user);
+            try {
+                $src = $this->formatter->unparse($post->content, $post);
+                $user = is_null($post->editedUser) ? $post->user : $post->editedUser;
+                $content = $this->formatter->parse($src, $post, $user);
+            } catch (\Throwable $exception) {
+                $io->warning("Error occurred while processing post with ID $post->id - reparsing was skipped:\n$exception\n");
+                continue;
+            }
             if ($post->content != $content) {
                 $post->content = $content;
                 $post->save();

It does not work perfectly with progress bar, so it could be improved (for example collect errors and print them after all posts were processed), but I think that this command should expect exceptions here and deal with them in some way.

I also think that every exception should be caught in this block, I will apply your patch. EDIT: done: https://github.com/club-1/flarum-ext-chore-commands/commit/a5bf4dce44d54853e69b205ba629511ce257e21f

I believe that original issue in flarum/mentions was fixed as side effect of flarum/framework#3808, but other extensions may also fail while processing old posts.

What do you think about using GuestUser as a fallback for the actor, in addition to catching exceptions? This could prevent some extensions from crashing, but it would not reflect the real state of the post.

It could maybe be optimized later by chunking the updates too instead of calling save() for each post one by one, but I don't really know how much of a performance boost this will achieve.

I don't think it is worth it - command was quite fast in my case, and batch update is quite complicated and brings its own problems (deadlocks). Command already saves only posts with altered content, so update query is skipped most of posts anyway.

Yes, I added the update skipping after this comment, and arrived at the same conclusion. I tried to implement it nonetheless out of curiosity and could not measure any speed-up.

rob006 commented 1 year ago

What do you think about using GuestUser as a fallback for the actor, in addition to catching exceptions? This could prevent some extensions from crashing, but it would not reflect the real state of the post.

I'm not sure, it would need some investigation how Flarum handles this scenario (as an admin I'm able to edit posts of deleted users and quote these posts). But skipping such posts completely looks like a safer approach - if extension uses actor to handle permissions or detect context, then reparsing with guest as an actor may break some features.

n-peugnet commented 1 year ago

Thank you for your input, I agree that skipping the update seems like a safer approach.

Meanwhile, I made the message less verbose and added a test for this case: https://github.com/club-1/flarum-ext-chore-commands/compare/a5bf4dce44d54853e69b205ba629511ce257e21f..5edf5e868f54e23f945e244d6fa937484985dba4

rob006 commented 1 year ago

$exception->getMessage() may be pretty cryptic without a stack trace. If you want to reduce noise in console, I would rather log these exceptions to separate log file and display summary at the end of processing, as something like "Reparsing failed for X posts, see detailed logs in /path/to/storage/reparse-2023-05-03_112233.log".

n-peugnet commented 1 year ago

$exception->getMessage() may be pretty cryptic without a stack trace. If you want to reduce noise in console, I would rather log these exceptions to separate log file and display summary at the end of processing, as something like "Reparsing failed for X posts, see detailed logs in /path/to/storage/reparse-2023-05-03_112233.log".

I just released version 1.0.0 of this extension with this comment addressed. It also displays the live counts of changed and skipped posts next to the progress bar.