Open Octobug opened 8 months ago
Thanks for the feedback. It seems that the storing avatars relies on absolute paths. Modifying it to a new domain could potentially introduce new issues. Saving them as relative paths might be a better option. 🤔
Thanks for the feedback. It seems that the storing avatars relies on absolute paths. Modifying it to a new domain could potentially introduce new issues. Saving them as relative paths might be a better option. 🤔
@LinkinStars Yeah, uploaded files should be stored as relative paths. And, for those already deployed in production, the fixing should implement a mechanism to automatically update existing data.
@LinkinStars I found that all images in a question, answer or avatar have this issue. I've changed the issue title.
I would like to solve this issue once the solution is settled. 👀
Fwiw, I ran into this issue with logo and favicon as well under the branding settings when originally testing how to go about setting up answer.
Thanks for the feedback. It seems that the storing avatars relies on absolute paths. Modifying it to a new domain could potentially introduce new issues. Saving them as relative paths might be a better option. 🤔
@LinkinStars Yeah, uploaded files should be stored as relative paths. And, for those already deployed in production, the fixing should implement a mechanism to automatically update existing data.
- This could be done by an independent command line tool. or
- During the installation. or
- In the backend code, auto rewrite those stored paths when fetching them.
@Octobug Your suggestion is great. We have carefully discussed this issue for some time and believe that providing a command-line solution would be better.
For example, we can offer a functionality like this:
./answer fix image from A to B
This command can modify the domain name in existing data.
The reasons for this are as follows:
https://s3.east-1.xxx.com/
. Of course, this is just a possibility. By providing this command-line tool, we can assist users in making the switch when they want to change their storage and update this part of the address.Additionally, there is another requirement. We believe it would be helpful to provide an environment variable, ABSOLUTE_IMAGE_PATH
, which users can set to true. In this way, Answer will default to using absolute paths for storing images. This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used.
Once again, thank you very much for suggesting this feature because we believe it could pose a little big challenge. What do you think? If there are any unclear points, we can continue the discussion.
@LinkinStars Thank you for your detailed response!
⚠️ I haven't read much of the relevant code yet so I might miss some background knowledge of Answer. I'll make a summary to see if I have understood correctly.
I am also in favor of building this functionality as a ./answer
command option.
And let's split a URL into 2 major parts for the convenience in describing:
PROTOCOL://HOSTNAME:PORT
PATH
BASE_PATH
RELATIVE_PATH
I think the ABSOLUTE_IMAGE_PATH
could be renamed to IMAGE_PATH_PREFIX
(if there will be other types of files in the future, maybe it could be UPLOADS_PATH_PREFIX
).
There are 2 major types of uploaded files:
PROTOCOL://HOSTNAME:PORT + PATH
in DB with no doubt.IMAGE_PATH_PREFIX + RELATIVE_PATH
, only RELATIVE_PATH
is stored in DB. This usage of IMAGE_PATH_PREFIX
can be delivered by documentation:
IMAGE_PATH_PREFIX
is /
IMAGE_PATH_PREFIX
is BASE_PATH
(e.g. new/dirname/
)IMAGE_PATH_PREFIX
is PROTOCOL://HOSTNAME:PORT + BASE_PATH
(e.g. https://statics.self-hosted.com/uploads/
).Pseudocode:
// When fetching existing data to rewrite:
getImageURL(image) {
if (isLocalFile(image)) { // an image has a property to be distinguished. e.g. !startsWith('http')
return IMAGE_PATH_PREFIX + image.path;
} else {
return image.path;
}
}
As for the ./answer fix image from a to b
command, I think the from a
and to b
args are not needed. Because what are stored in DB are always RELATIVE_PATH
s.
Update: for files stored in external places, these args are needed.
- Or, are the two args used for auto-migrating uploaded files (
mv SRC DST
)?- The description above assumes that an admin move uploaded files manually.
Update:
IMAGE_PATH_PREFIX
is used to fix the existing problem: After this problem is fixed, domain changes don't need any extra operations../answer migrate image from a to b
is used to provide a migration feature: When you change external storage type, migration is inevitable.@LinkinStars
This task seems heavy for a Go & Answer beginner (exactly me 🤣). If the coding job is real complicated under your estimation, I think it's better be done by your team.
@Octobug
I think the ABSOLUTE_IMAGE_PATH could be renamed to IMAGE_PATH_PREFIX (if there will be other types of files in the future, maybe it could be UPLOADS_PATH_PREFIX).
I agree. IMAGE_PATH_PREFIX
would be better.
Update:
IMAGE_PATH_PREFIX is used to fix the existing problem: After this problem is fixed, domain changes don't need any extra operations. ./answer migrate image from a to b is used to provide a migration feature: When you change external storage type, migration is inevitable.
That's a good summary. You got it.
This task seems heavy for a Go & Answer beginner (exactly me 🤣). If the coding job is real complicated under your estimation, I think it's better be done by your team.
Take it easy. You can try your best, and we will help you solve any problems together. 💪
Take it easy. You can try your best, and we will help you solve any problems together. 💪
@LinkinStars Okay then, I'll give it another try.
@LinkinStars
I have another thought on external storage type...
If the uploads dataset is huge, the migration could be very time-consuming because it needs to query & filter out all urls of avatars and urls in texts of questions & answers, and then write them back to DB.
What if Answer saves all types of uploads with relative paths with a prefix marker? Like: :NUM:/uploads/post/57pxxvykQFj.jpeg
:NUM:
is a storage url prefix version number. And the version number maps to a URL prefix string, which is used to replace the :NUM:
marker in a relative path stored in DB.
The default value of NUM
is 0
mapping to "/"
, it deals with local uploads type. When an Answer admin decides to move the uploads
directory to another path, the admin just needs to change 0
to map to "/new/path"
after files are settled down.
If the admin adds an external storage, since there is a visit_url_prefix
config field for every storage plugin, Answer knows if it is a new prefix. If it is, it increases NUM
, and set it as active so that any subsequent uploads use the new active NUM
.
Let's say there is a new type of site_info
called uploads
, it looks like:
{
"url_prefix_versions": {
"0": {
// ":0:/uploads/post/57pxxvykQFj.jpeg"
"active": true,
"value": "/"
},
"1": {
// ":1:/uploads/post/57pxxvykQFj.jpeg"
"active": false,
"value": "https://s3.east-1.xxx.com/"
},
"2": {
// ":2:/uploads/post/57pxxvykQFj.jpeg"
"active": true,
"value": "https://abc.oss-cn-hangzhou.aliyuncs.com/"
},
...
},
// other properties...
}
With this solution, those relative paths hardly need to be changed.
The basic migrate operation process is:
NUM
's mapping.And still, ./answer migrate uploads from_a to_b
could be treated as the last resort to rewrite paths in DB. And if the version number solution is adopted, this migrate functionality is better to support both command line and the admin UI. It can provide migration status like progress, affected number of urls, etc.
However, migration of large dataset may not be common. The consideration here is whether it is worth implementing such a feature to address a rare scenario.
If this idea is rejected, I will try to implement the previously discussed solution.
@Octobug First of all, thank you very much for presenting the new proposal. We have had a detailed discussion regarding your proposal. It is certain that using external storage for migration would make it somewhat easier.
However, we still have some considerations:
Therefore, based on the reasons stated above, we would prefer to keep the process simple, and thus we suggest sticking with the previous solution would be better.
@LinkinStars Okay, it was just a thought. 👀
I would like to confirm some details:
Does this var need to be supported both in Config File and Environment Variable?
service_config.upload_url_prefix
UPLOAD_URL_PREFIX
"/"
It actually affects all kinds of uploads, so I think upload
is more proper. And it may include PROTOCOL://HOSTNAME:PORT
, so UPLOAD_URL_PREFIX
is better than UPLOAD_PATH_PREFIX
.
./answer fix upload PREFIX_SRC PREFIX_DST
@LinkinStars Okay, it was just a thought. 👀
I would like to confirm some details:
The env var
Does this var need to be supported both in Config File and Environment Variable?
In config file:
service_config.upload_url_prefix
- Used when site is running.
Environment Variable:
UPLOAD_URL_PREFIX
- Used for installation
- Default value:
"/"
It actually affects all kinds of uploads, so I think
upload
is more proper. And it may includePROTOCOL://HOSTNAME:PORT
, soUPLOAD_URL_PREFIX
is better thanUPLOAD_PATH_PREFIX
.The command
./answer fix upload PREFIX_SRC PREFIX_DST
I can't agree more. 👍 Yes, supported both in config file and env would be better. So UPLOAD_URL_PREFIX
is better than UPLOAD_PATH_PREFIX
.
Additionally, there is another requirement. We believe it would be helpful to provide an environment variable,
ABSOLUTE_IMAGE_PATH
, which users can set to true. In this way, Answer will default to using absolute paths for storing images. This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used.
@LinkinStars I made some attempts on the var part. The consideration you mentioned before (I made it bold & italic) seemd hard to achieve: The var (UPLOAD_URL_PREFIX
) is closely associated with service_config.upload_path
.
service_config.upload_path
at the same time:
UPLOAD_URL_PREFIX
doesn't help at all. Because service_config.upload_path
cannot reach to other host.
If using a symlink pointing to the locally moved uploaded files is acceptable (tell admins to do so in the documentation), the var seems to be unnecessary at all. Answer is able to distinguish these two types of uploads without an env var:
@Octobug 🤔 I got it. So, for now, we can consider only three scenarios:
This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used.
That would simplify the problem. What do you think?
@Octobug 🤔 I got it. So, for now, we can consider only three scenarios:
- Using relative paths: UPLOAD_URL_PREFIX defaults to "/"
- Using absolute paths with the website domain: UPLOAD_URL_PREFIX is "http://answer.cn:8080/"
- Using absolute paths with a third-party service: UPLOAD_URL_PREFIX is "https://s3.east-1.xxx.com/" (Of course, this scene is already handled by the plugin. Just an example.)
~This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used.~ That would simplify the problem. What do you think?
@LinkinStars
Right, there are three scenarios for now. But scenario 2.
seems weird 🤔️. What benefits does storing absolute URLs bring (we don't have disagreement on 3.
)? The only benefit I can think of is that admins can use a reverse proxy server (like Nginx) to route the requests of these absolute URLs to where those uploaded files are (maybe another host different from Answer's host).
@Octobug 🤔 I got it. So, for now, we can consider only three scenarios:
- Using relative paths: UPLOAD_URL_PREFIX defaults to "/"
- Using absolute paths with the website domain: UPLOAD_URL_PREFIX is "http://answer.cn:8080/"
- Using absolute paths with a third-party service: UPLOAD_URL_PREFIX is "https://s3.east-1.xxx.com/" (Of course, this scene is already handled by the plugin. Just an example.)
~This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used.~ That would simplify the problem. What do you think?
@LinkinStars
Right, there are three scenarios for now. But scenario
2.
seems weird 🤔️. What benefits does storing absolute URLs bring (we don't have disagreement on3.
)? The only benefit I can think of is that admins can use a reverse proxy server (like Nginx) to route the requests of these absolute URLs to where those uploaded files are (maybe another host different from Answer's host).
@Octobug 👍 What you said is absolutely correct. We are indeed concerned about the existence of such users. As far as I know, there are users who deploy their frontend and backend separately, and as you mentioned, images are accessed through nginx
proxy. Of course, this is only do for compatibility reasons. This is also why it is treated as an environment variable rather than a configuration item placed in the admin page.
@LinkinStars Okay, thank you for your patience! In this case, it seems that your first solution is better. :D That is, make the var a boolean
. Although the name may need to be changed from ABSOLUTE_IMAGE_PATH
to:
ABSOLUTE_UPLOAD_URL
UPLOAD_ABSOLUTE_URL
UPLOAD_USE_ABSOLUTE
(I prefer this one)
UPLOAD_USE_ABSOLUTE
(I prefer this one)
@Octobug 😄 Maybe we can use UPLOAD_USE_ABSOLUTE_URL
. Thanks for your patience as well. This whole discussion has made the whole issue more clear.
UPLOAD_USE_ABSOLUTE
(I prefer this one)@Octobug 😄 Maybe we can use
UPLOAD_USE_ABSOLUTE_URL
. Thanks for your patience as well. This whole discussion has made the whole issue more clear.
@LinkinStars Aha, okay. I'll submit a PR a few days later together with the fix
command.
@LinkinStars Here's my version of the fix
command, I would like to know your opinion. 👀
answer fix upload SRC_PREFIX DST_PREFIX [--continue] [--offline|online] [--dry-run] [--limit NUM]
When the fix command starts, it checks if there is a tmp file: .fix_upload_interrupted
.
--continue
switch to continue.SRC_PREFIX
, it removes .fix_upload_interrupted
then exit.Switches:
--continue
: the execution might be interrutped, some of the data is rewritten and the rest of them isn't. It works with the .fix_upload_interrupted
.
DST_PREFIX
will be ignored.fix upload /uploads/ /uploads/a/
, assume it is interrupted during its running/uploads/a/a/
--offline
: it requires the admin to shut down the Answer site, and it works in batch mode.--online
: it allows the fix to happen when a Answer site is up. It fixes a row of data one at a time in a loop, locks the data row preventing it from being loaded into the web editor, so that the user won't submit the wrong version of links after the fix is done.
fix upload [branding|avatar|question]
--dry-run
: only do the test. It shows what data will be affected, the "before" and "after", with a limit (e.g. first 10 entries).--limit
: fix NUM
rows then exit.Updated at 2024-07-17 22:40
It makes little sense to lock a data row no matter it's an online or offline fix. Because you cannot make sure that there aren't any web UI editors opened with old data loaded in it. Even after an offline fix is finished, there might still be a user going to submit the old version data.
So, neither way is perfect. This corner case can only be addressed by the documentation work I guess.
@Octobug Very thoughtful consideration. I have only one suggestion. I recommend that this command be run offline, but it's not mandatory. It would be advisable to include instructions in the documentation, suggesting users to run it offline. Otherwise, it would require handling various unexpected concurrency situations, such as table locking, which would make the implementation more complex. 🤔️
@Octobug Very thoughtful consideration. I have only one suggestion. I recommend that this command be run offline, but it's not mandatory. It would be advisable to include instructions in the documentation, suggesting users to run it offline. Otherwise, it would require handling various unexpected concurrency situations, such as table locking, which would make the implementation more complex. 🤔️
Okay. I have the feeling that I will give up the --online
switch. :D
Describe the bug
Uploaded images become inaccessible after the domain was changed. I only tested it in development env.
To Reproduce
Steps to reproduce the behavior:
domain.old
, you uploaded your custom avatar.domain.old
todomain.new
, and the uploaded avatar became inaccessible.Expected behavior
Uploaded files should serve well no matter what the domain is.
Screenshots
Platform