gildas-lormeau / SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file
GNU Affero General Public License v3.0

Annotation tool saves modified page in the wrong directory #1280

Closed mikkovedru closed 10 months ago

mikkovedru commented 11 months ago

Describe the bug: The annotation tool saves the modified page in the wrong directory.

To Reproduce: Steps to reproduce the behavior:

  1. Set the filename template to "WWW-INBOX/{visit-date-iso} ({url-hostname}) {page-title}.html"
  2. Save page
  3. Open the saved page in a browser with the annotation tool
  4. Make highlights
  5. Save the page
  6. (BUG) The file will not be saved in ~/Downloads/WWW-INBOX/ but in ~/Downloads/.

Expected behavior: The modified file should be saved in the same directory as the opened file.

gildas-lormeau commented 11 months ago

Thank you for the suggestion. Unfortunately, this feature is almost impossible to implement in a reliable way.

To implement this feature, the editor would need to interpret the filename template again using the page data. The problem is that some of that data is lost. For example, it's impossible to determine the value of {visit-date-iso} when editing a saved page. The same applies to some other variables, such as {referrer}. The real issue, from my point of view, is the lack of filesystem APIs in extensions that would allow, for example, overwriting the saved page. Browser vendors refuse to implement APIs that would let extensions work easily with the filesystem, for security reasons. That is somewhat understandable, as the good old hierarchical filesystem is a bit of an old-fashioned and overly permissive technology.
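
The limitation described above can be illustrated with a minimal Python sketch. It is not SingleFile's actual code: `render_template`, `MissingVariableError`, the regular expression, and the sample values are all hypothetical, and the pattern only handles plain {variable} placeholders. It shows that the template resolves at save time, when every value is known, but fails when the page is reopened for editing and {visit-date-iso} is no longer available.

```python
import re
from typing import Mapping

# Hypothetical pattern: matches simple {variable-name} placeholders only.
VARIABLE_PATTERN = re.compile(r"\{([a-z][a-z0-9-]*)\}")


class MissingVariableError(Exception):
    """Raised when a template variable has no known value."""


def render_template(template: str, page_data: Mapping[str, str]) -> str:
    """Replace each {variable} in the template with its value from page_data."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in page_data:
            raise MissingVariableError(name)
        return page_data[name]

    return VARIABLE_PATTERN.sub(substitute, template)


template = "WWW-INBOX/{visit-date-iso} ({url-hostname}) {page-title}.html"

# At the first save, every value is known (placeholder values).
data_at_save = {
    "visit-date-iso": "2023-06-01",
    "url-hostname": "example.com",
    "page-title": "Example",
}
saved_name = render_template(template, data_at_save)

# When the saved page is reopened for editing, the visit date is gone.
data_at_edit = {"url-hostname": "example.com", "page-title": "Example"}
try:
    render_template(template, data_at_edit)
    missing = None
except MissingVariableError as error:
    missing = str(error)

print(saved_name)
print("cannot resolve:", missing)
```

Without the lost value, the editor has no way to rebuild the "WWW-INBOX/..." path, which is why the download falls back to a location without the directory.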

mikkovedru commented 11 months ago

I don't know anything about this thing, so I will just throw some general ideas in the air without knowing how good/bad they are.

  1. But how does SingleFile/Firefox know to save the new file using exactly the same file name (with the only problem being that it is in the wrong directory)? How do they know to append (number) in case the file with the same name already exists? Where is this information stored (by whom) and what is the difference (file name vs. dir name), that restricts usage in the latter case?
  2. How about saving all the useful page data inside the html file, which could then be used to interpret the filename template?
    • This was actually an issue that I planned to raise separately, but now that it has come up naturally, I decided to write it here. The problem I have been having is keeping all the relevant page info in one file. That's the whole problem SingleFile solves. But if some of the relevant and critical information (like the original URL) is not in the file, I need to spend time doing extra work and save that information somewhere else, such as a separate .md file. Not nice. What is nice is the yt-dlp type of organization, in which I can name my file however I want, but I can open the video file in MediaInfo and see all the relevant information, like URL or Description. Amazing!
    • And it would be nice to have the SingleFile style of solving this problem. I don't have much experience yet with SingleFile, but from my limited experience, I have noticed that even with such a basic thing as the original URL the variable name varies from site to site. Sometimes it's og:url, sometimes twitter:app:url:googleplay, sometimes apple-itunes-app with the content app-id=663592361, app-argument=https://duckduckgo.com/?q=singlefile+%22visit+date%22+vs+%22save+date%22&t=lm&smartbanner=1, etc. And sometimes there will be no information at all! And we are talking about the most critical info - the original URL, which is a must for the purpose SingleFile is being used (collect information in case the original URL will break, but need to have the original URL in order to see if the link was broken).
    • Let alone other important variables that can be used in SingleFile template, but will then be gone and forever lost in the wind. So it would be very nice to have all those variables saved as part of the web page info (can have SingleFile- prefix)
    • There aren't that many variables so the file size increase would be negligible.
    • What's extra cool is that one could change the settings and all the following editing/saving will automatically work!
    • Privacy considerations: a settings toggle could disable saving some of the potentially sensitive data, like {referrer}. If those values are missing, then use MIGRATION_DEFAULT_VARIABLES_VALUES. But in general, I don't think this is a problem at all (and I am very privacy-minded), and I am not sure such a toggle is even needed.
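
The idea in proposal 2 could look roughly like the following Python sketch, which stores the variables as prefixed <meta> tags and reads them back later. Everything here is an assumption made for illustration: the "SingleFile-" prefix follows the suggestion above, `embed_variables`, `read_variables`, and the `skip` parameter (standing in for the privacy toggle) are hypothetical names, and the code assumes a literal "<head>" tag without attributes. It says nothing about how SingleFile itself stores data.

```python
from html import escape
from html.parser import HTMLParser

PREFIX = "SingleFile-"  # hypothetical prefix, as suggested in the list above


def embed_variables(html_text, variables, skip=()):
    """Insert one <meta> tag per variable right after the opening <head> tag.

    Variables listed in `skip` (for example "referrer") are left out.
    Assumes the document contains a literal "<head>" without attributes.
    """
    tags = "".join(
        f'<meta name="{escape(PREFIX + name)}" content="{escape(value)}">'
        for name, value in variables.items()
        if name not in skip
    )
    return html_text.replace("<head>", "<head>" + tags, 1)


class VariableReader(HTMLParser):
    """Collects the prefixed <meta> tags found in a saved page."""

    def __init__(self):
        super().__init__()
        self.variables = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith(PREFIX):
            self.variables[name[len(PREFIX):]] = attributes.get("content") or ""


def read_variables(html_text):
    """Return the variables embedded by embed_variables."""
    reader = VariableReader()
    reader.feed(html_text)
    return reader.variables


page = "<html><head><title>Example</title></head><body></body></html>"
values = {
    "url-href": "https://example.com/?q=a&b=c",
    "visit-date-iso": "2023-06-01",
    "referrer": "https://example.org/",
}
saved_page = embed_variables(page, values, skip=("referrer",))
restored = read_variables(saved_page)
```

With this layout the original URL and visit date travel with the file, while the skipped {referrer} would have to fall back to a default value when the template is interpreted again.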

gildas-lormeau commented 10 months ago

I implemented proposition 2. It's a basic implementation that embeds only the data used to determine the filename from the template. The feature is disabled by default. To enable it, turn on the option "File name > save the filename template data into the page". It works with the annotation editor and also when you save again an archive opened from the filesystem. The feature will be available in the next version.

For other comments not directly related to the current issue, please create one or more separate issues summarizing your ideas.
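
As a rough illustration of "embeds only the data used to determine the value of the filename from the template", the following Python sketch selects just the variables that the template references and later rebuilds the path from them. It is a guess at the general shape of such a feature, not the actual implementation: `variables_used`, `data_to_embed`, and `resave_path` are hypothetical names, the sample values are placeholders, and the real template syntax may be richer than the simple {variable} pattern handled here.

```python
import re

# Hypothetical pattern: matches simple {variable-name} placeholders only.
VARIABLE_PATTERN = re.compile(r"\{([a-z][a-z0-9-]*)\}")


def variables_used(template):
    """Return the set of variable names referenced by the template."""
    return set(VARIABLE_PATTERN.findall(template))


def data_to_embed(template, page_data):
    """Keep only the values the template needs; everything else is dropped."""
    used = variables_used(template)
    return {name: value for name, value in page_data.items() if name in used}


def resave_path(template, embedded):
    """Rebuild the file path from the values stored in the page."""
    return VARIABLE_PATTERN.sub(lambda match: embedded[match.group(1)], template)


template = "WWW-INBOX/{visit-date-iso} ({url-hostname}) {page-title}.html"
page_data = {
    "visit-date-iso": "2023-06-01",
    "url-hostname": "example.com",
    "page-title": "Example",
    "referrer": "https://example.org/",  # not in the template, so not embedded
}

embedded = data_to_embed(template, page_data)
path = resave_path(template, embedded)
```

Storing only the referenced values keeps the file small and avoids writing data such as {referrer} into the page unless the template actually uses it.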

mikkovedru commented 10 months ago

This is amazing! Thank you so much, Gildas! 🥇