ansani / Shareaza

Shareaza is a peer-to-peer client for Windows that allows you to download any file-type found on several popular P2P networks.
http://shareaza.sf.net

Sparse writing does not work as it should on NTFS volumes #86

Open abolibibelot1980 opened 1 year ago

abolibibelot1980 commented 1 year ago

As of version 2.7.10.2, sparse writing does not work as it should on NTFS volumes. Shareaza has an option to set a threshold for sparse writing, 8MB by default, and according to its description it should do exactly what it says: define a file size threshold beyond which files are written as “sparse”, so that their complete size is not allocated right away (or at least the complete length up to the last received chunk relative to the beginning of the file). Yet that doesn't happen: the full size gets allocated right away, which can take up an enormous amount of unnecessary space when downloading many large files with few and/or seldom-seen sources. By contrast, eMule deals with “sparse” writing flawlessly, despite its older design. (Seeing which files are “sparse” and which are not can be tricky on Windows since that attribute doesn't even appear in a file's properties, but it does appear as the letter “P” if you enable the “Attributes” column; at least that works on Windows 7.)
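For anyone who wants to check the attribute without the “Attributes” column, here is a minimal Python sketch. It assumes Python 3.5+ on Windows, where `os.stat()` exposes `st_file_attributes`; the helper names `has_sparse_flag` and `is_sparse` are made up for illustration and have nothing to do with Shareaza's code.

```python
import os
import stat


def has_sparse_flag(attributes: int) -> bool:
    """Return True if the Windows attribute bitmask has the sparse bit (0x200) set."""
    return bool(attributes & stat.FILE_ATTRIBUTE_SPARSE_FILE)


def is_sparse(path: str) -> bool:
    """Check a file on disk. st_file_attributes exists only on Windows,
    so on other platforms this falls back to 0 and returns False."""
    attributes = getattr(os.stat(path), "st_file_attributes", 0)
    return has_sparse_flag(attributes)
```

On Windows, `fsutil sparse queryflag <file>` should report the same flag from the command line.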

Also, Shareaza doesn't seem able to anticipate that the space currently available on the receiving volume won't be enough to download a given file or chunk (a consequence of this inability to use the “sparse” feature). It sometimes mindlessly continues to receive new chunks until free space drops to nearly 0, then (and only then) pauses every single active download, even those which have 0 sources and are therefore not liable to cause any trouble. Sometimes, when a file is being downloaded while the available space gets too low, some chunks end up corrupted, and Shareaza then mindlessly rejects the source which transmitted the chunks identified as corrupted (for no other reason than the insufficient space) as if it were the culprit! One then has to take unnecessary steps to resume that download through an inordinately convoluted process (usually it involves using advanced features like “assume file is 100% complete and reverify”, then resuming, sometimes re-adding the specific source if there is only one and it's not re-added automatically... the kind of aggravation which could and should be prevented by design). I may have to create a specific bug report for that second part, but I figured it was sufficiently linked with the main subject of this one to add it here.
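To illustrate the kind of pre-check I have in mind, here is a rough Python sketch of refusing a chunk before the volume runs dry. It is not how Shareaza works internally; the function names and the 64MB reserve are assumptions chosen for the example.

```python
import shutil

# Hypothetical safety margin to keep free on the volume (assumption, not a Shareaza setting).
DEFAULT_RESERVE_BYTES = 64 * 1024 * 1024


def can_accept_chunk(free_bytes: int, chunk_bytes: int,
                     reserve_bytes: int = DEFAULT_RESERVE_BYTES) -> bool:
    """True if writing chunk_bytes still leaves at least reserve_bytes free."""
    return free_bytes - chunk_bytes >= reserve_bytes


def volume_can_accept(path: str, chunk_bytes: int,
                      reserve_bytes: int = DEFAULT_RESERVE_BYTES) -> bool:
    """Same check, using the free space currently reported for the volume holding path."""
    return can_accept_chunk(shutil.disk_usage(path).free, chunk_bytes, reserve_bytes)
```

With a check like this done before requesting a chunk, only the download that would overflow the volume needs to be paused, and no source gets blamed for a write that failed locally.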

ansani commented 1 year ago

Hi! You don't need to create a new bug report for the second issue. I will create it and link the relevant source code.

ansani commented 1 year ago

Hi @abolibibelot1980! I ran some tests with the latest dev release of the client (2.7.10.4 beta6) but I cannot reproduce the issue.

As you can see in this screenshot:

image

The sparse file's logical size is 129MB, but only 1.18MB is actually allocated on disk.

This is also confirmed in the code:
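The same two numbers can be read programmatically. The sketch below is only an illustration in Python, not code from the client: on Windows it calls the Win32 function `GetCompressedFileSizeW` through `ctypes`, and elsewhere it falls back to `st_blocks`. The function name `logical_and_allocated` is invented for the example.

```python
import ctypes
import os
import sys
from typing import Tuple


def logical_and_allocated(path: str) -> Tuple[int, int]:
    """Return (logical size, bytes actually allocated on disk) for path."""
    st = os.stat(path)
    if sys.platform == "win32":
        # GetCompressedFileSizeW reports the on-disk size of sparse or compressed files.
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCompressedFileSizeW.argtypes = [
            ctypes.c_wchar_p, ctypes.POINTER(ctypes.c_ulong)]
        kernel32.GetCompressedFileSizeW.restype = ctypes.c_ulong
        high = ctypes.c_ulong(0)
        ctypes.set_last_error(0)
        low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
        # 0xFFFFFFFF is INVALID_FILE_SIZE; it is an error only if the last error is set.
        if low == 0xFFFFFFFF and ctypes.get_last_error() != 0:
            raise ctypes.WinError(ctypes.get_last_error())
        return st.st_size, (high.value << 32) | low
    # POSIX fallback: st_blocks counts 512-byte units.
    return st.st_size, st.st_blocks * 512
```

For a sparse partial file, the second value should be much smaller than the first.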

image

Can you provide more details to replicate the issue?

Thank you!

abolibibelot1980 commented 1 year ago

Well, I am currently using the latest official version, 2.7.10.2, on a Windows 7 64-bit system... The “Download.SparseThreshold” option is set to the default value of 8MB. And yet none of the files downloaded with Shareaza are “sparse”, regardless of their size. Beyond that, what kind of extra details could I provide?

Speaking of which, information about “sparse” files on Windows is very “sparse” itself, and I know very few standalone tools which can turn a non-sparse file into a sparse file. In fact I know only one which does so without modifying the data: a command line tool called SparseTest, described as a “proof-of-concept” demo tool, which I saw mentioned in an old forum thread about an eMule “mod” and downloaded from a now defunct website through web.archive.org. I've used it ever since and it usually works very well (it computes an MD4 hash before and after processing to ensure that data integrity is preserved), except that I found a bug with files whose size is an exact multiple of 1048576, which are not processed at all. The native multi-purpose Windows tool fsutil can set the “sparse” flag (fsutil sparse setflag), but that affects only future write operations. It can also de-allocate a specified range of data (fsutil sparse setrange), but it empties that range regardless of what it contained; it cannot scan a file for empty clusters so as to reduce its allocated size without modifying its contents. So it would be a nice extra feature if Shareaza could “sparse-ify” local files, or imported partial files.
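The scanning half of such a “sparse-ify” feature is simple enough to sketch. The Python below only finds cluster-aligned ranges that are entirely zero; it does not de-allocate anything, and it is not how SparseTest or Shareaza work. The 4096-byte cluster size is an assumption (the usual NTFS default), and `find_zero_runs` is a name invented for the example.

```python
from typing import List, Tuple


def find_zero_runs(path: str, cluster_size: int = 4096) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for runs of full clusters that contain only zero bytes."""
    runs: List[Tuple[int, int]] = []
    run_start = None
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(cluster_size)
            if not block:
                break
            # Only complete clusters count; a short final block is left alone.
            is_zero = len(block) == cluster_size and block.count(0) == cluster_size
            if is_zero:
                if run_start is None:
                    run_start = offset
            elif run_start is not None:
                runs.append((run_start, offset - run_start))
                run_start = None
            offset += len(block)
    if run_start is not None:
        runs.append((run_start, offset - run_start))
    return runs
```

Each returned range could then be handed to something like `fsutil sparse setrange` once the sparse flag is set. Since those ranges already contain only zeros, emptying them should not change what the file reads back as.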

I will test your latest release ASAP. Right now I have downloads in progress, and I need to free up some space to do a full backup of the AppData directory in case something goes wrong. Should I simply replace the main executable, or do I need to do a full reinstall?

ansani commented 1 year ago

Can you try my latest release?

abolibibelot1980 commented 1 year ago

So I'm in the process of attempting to compile your latest release, something I haven't done often, so I'm not completely lost but not completely comfortable either. First, when you wrote “Team Edition”, did you mean “Community Edition”? (The three choices are: “Community 2022”, “Professional 2022”, “Enterprise 2022”.) Then, reading the explanations, I found this: “If you want to compile to 64-bit, make sure to enable it during the install process under Add or Remove Features (Language Tools -> Visual C++ -> X64 Compilers and Tools).” It may be a mistake, as “Language tools” should be, well, language tools... The most likely category should be “Individual components”, right? Then, in the very poor French translation of that very long list of components, I have no clue where to find the corresponding item (there are two categories named “Compilateurs, outils de génération et runtimes”, perhaps it's one of those? Well, maybe I'm completely lost after all...), and I certainly don't know what else would be necessary for the damn thing to work as expected. I could always check everything, but the total size could be humongous, and I'm constantly struggling to find enough free space on every single volume of this damn computer. As a matter of fact, the total size before I check anything is already 2.28GB (most likely compressed), while the available free space on my system partition is 4.53GB (and quite often it drops to nearly nothing for no sound reason whatsoever).

ansani commented 1 year ago

You can simply download the release for your system (64-bit for example) and install the client. Release link: https://github.com/ansani/Shareaza/releases/tag/v2.7.10.4-beta6

abolibibelot1980 commented 1 year ago

So, since I installed the release linked above (which, by the way, carries a warning about being an alpha release yet is designated as beta here, which is a tad confusing, although I'm not sure what this entails exactly), all .partial files corresponding to newly added downloads do have the “P” = “sparse” attribute, regardless of the file's expected complete size (with v. 2.7.10.2 none of them had that attribute). I have yet to see how it behaves when downloading large files non-sequentially, but logically, once that attribute/flag is set, all future write operations should be made in “sparse” mode. But then what about that 8MB threshold? Is it possible to perform non-sparse write operations on a sparse file? Or perhaps Shareaza zero-fills the empty blocks when it has to write beyond them, so as to allocate them right away? What's strange is that you weren't aware of the issue, and therefore most likely did not modify the relevant parts of the code, yet it has somehow been fixed at some point between v. 2.7.10.2 and the current 2.7.10.4 pre-release. Did anybody else work on it before you got involved?

EDIT: By the way, I'm not sure it's worth a dedicated report, but I find it quite annoying that the creation date of all .sd files is constantly updated (in addition to the modified and access dates, which are expected to change over time) and does not reflect the time the corresponding downloads were added. It is especially odd since I'm aware of a little-known Windows quirk called the “creation date tunneling effect”, which was designed to prevent just that: whenever a file is written and then renamed to the exact name of a file that was deleted shortly before (less than 15 seconds by default), the creation date of the deleted file is normally transferred to the newly written file (I wrote a fairly detailed post mentioning this in that forum thread). I'm curious as to why it doesn't happen in the case of Shareaza's .sd metadata files.

ansani commented 1 year ago

Hi @abolibibelot1980! I know that the current release cycle (alpha/beta/prod) is a bit confusing (I will try to sort out the cycle for good in the new year). Your assumption is right! Even with a really large file, the SPARSE logic will remain "safe" for every write. About the fix, I think the issue was related to the Windows SDK linked into the original release. I switched and updated all dependencies to the latest versions (this means better performance on newer systems and better use of RAM and CPU cycles).

abolibibelot1980 commented 1 year ago

What do you mean precisely by “safe sparse logic”? Thanks for the (most likely) explanation.

(I added an edit above, regarding .sd files' creation dates, in case you missed it since you replied before I sent it.)

ansani commented 1 year ago

Hi! Microsoft's documentation for FSCTL_SET_SPARSE explains the logic behind the SPARSE approach: https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_set_sparse

(BTW: I read the note about the .sd file. I will check the sources and follow up.)