Closed: ms52538 closed this issue 10 months ago.
The issue at step (7) (multiple files with the same hash) should have been fixed in a recent 2401 beta via https://github.com/BiglySoftware/BiglyBT/commit/7820477f268863820991c77ba7655858116d414e
Are the files that are missed smaller than twice the piece size of the torrent? There are two phases in the matching process. The first is hash checking, which only applies to files that contain at least one complete piece (hash checking between files can only be done in piece-sized chunks). Once those matches are identified, BiglyBT tries to find a common 'root' folder from the match results and then performs exact name-based matches on the remaining files.
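Why "twice the piece size"? In a v1 multi-file torrent the files are concatenated in order and cut into fixed-size pieces, so a file usually starts part-way through a piece. The sketch below is a hypothetical helper, not BiglyBT code. It checks whether the bytes a file occupies fully contain at least one piece, assuming no padding files and ignoring the torrent's shorter final piece. In the worst alignment a file needs 2 * pieceSize - 1 bytes to be guaranteed one complete piece.

```java
/**
 * Hypothetical helper, not BiglyBT code. Assumes a v1 torrent whose files are
 * concatenated in order, fixed-size pieces starting at byte 0, no padding
 * files, and ignores the torrent's shorter final piece.
 */
public final class PieceAlignment {

    /** True if bytes [offset, offset + length) fully contain at least one piece. */
    public static boolean containsFullPiece(long offset, long length, long pieceSize) {
        // First piece boundary at or after the start of the file.
        long firstBoundary = ((offset + pieceSize - 1) / pieceSize) * pieceSize;
        // The piece starting there must also end inside the file.
        return firstBoundary + pieceSize <= offset + length;
    }

    public static void main(String[] args) {
        long p = 512 * 1024; // 512 kB, the piece size from the SfEDF log in this thread
        System.out.println(containsFullPiece(0, p, p));         // true: aligned, exactly one piece
        System.out.println(containsFullPiece(1, p, p));         // false: same size, straddles two pieces
        System.out.println(containsFullPiece(1, 2 * p - 1, p)); // true: worst-case alignment still fits one
    }
}
```

So a file exactly one piece long is only hash-checkable if it happens to start on a piece boundary.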
The log from the SfEDF window might provide clues.
I just performed Scenario 1. Here are the log entries pertinent to the 2 files that were missed when matching via SfEDF: "Found 33 files with 33 distinct sizes". Neither of the 2 missing files appears in the search, whereas the 31 other files do (each has file name, testing and linking entries; the 2 missing files have no such entries). "Linked 0 of 2".
For the 2 files missed by the search, BT's "Size" field shows their incomplete 'stub files' (they are unmatched and have not been downloaded into the torrent's folder, i.e. the torrent is at roughly 99.5% complete) as: File #1 is 746.1 kB, and File #2 is 484.5 kB.
The correct, completed files, previously backed up to another folder, are reported by Windows Explorer as: File #1 is 14,494 kB, and File #2 is 41,037 kB.
In BT, for the incomplete torrent, the Pieces tab shows nothing, and the Piece Map shows 3 empty boxes (all others are solid blue).
I'll perform a separate set of tests to validate the duplicate-hash issue, possibly today. I suspect it may still not work, because I was having issues with a torrent this past Monday.
Thoughts?
So the missing files are likely too small to be hash checked. Towards the end of the SfEDF log there should be lines generated by:
logLine( viewer, dm_indent, "Matched=" + actions_established.size() + ", complete=" + already_complete + ", ignored as not selected for download=" + skipped + ", no candidates=" + no_candidates + ", remaining=" + unmatched_files.size() + " (total=" + files.length + ")");
logLine( viewer, dm_indent, "Looking for other potential name-based matches" );
It would be interesting to see the log from that point onwards.
5/21/20 9:23 AM: Enumerating files in P:\HOBBIES\Painting\PaintByNumbers!RENAMED VIDEOS (Backup)
Found 33 files with 33 distinct sizes
Processing 'PaintByNumbers-Beginner', piece size=512.0 kB
Matched=0, complete=41, ignored as not selected for download=0, no candidates=3, remaining=2 (total=46)
5/21/20 9:23 AM: Complete, downloads updated=0
Note: this torrent contains pictures, but those are located in a sub-directory of their own in the torrent. I'm just focusing on video files. (SfEDF does not seem able to identify pre-existing image files, btw. Why is that?)
(Sent by email in reply to parg's comment of Thu, May 21, 2020, 9:10 AM: https://github.com/BiglySoftware/BiglyBT/issues/1651#issuecomment-632076951)
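The counters on the "Matched=" summary line in the log above appear to account for every file in the torrent: 0 + 41 + 0 + 3 + 2 = 46, which is the reported total. The sketch below parses such a line and checks that. The key names are copied from the log text. That the five counters are mutually exclusive is an assumption inferred from this one log, not confirmed from BiglyBT's source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the SfEDF "Matched=..." summary line; not BiglyBT code. */
public final class SfedfSummary {

    // "key=number" pairs; keys may contain spaces, e.g. "no candidates=3".
    private static final Pattern PAIR = Pattern.compile("([A-Za-z][A-Za-z ]*?)=(\\d+)");

    public static Map<String, Long> parse(String line) {
        Map<String, Long> counts = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            counts.put(m.group(1).trim(), Long.parseLong(m.group(2)));
        }
        return counts;
    }

    /** Sum of every counter except "total" (assumes the categories don't overlap). */
    public static long sumOfCategories(Map<String, Long> counts) {
        long sum = 0;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (!e.getKey().equals("total")) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String line = "Matched=0, complete=41, ignored as not selected for download=0, "
                + "no candidates=3, remaining=2 (total=46)";
        Map<String, Long> counts = parse(line);
        // For the log above both numbers are 46.
        System.out.println("sum=" + sumOfCategories(counts) + ", total=" + counts.get("total"));
    }
}
```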
I exported the torrent into XML, and found these entries for you, if relevant:
-
-
-
-
(Sent by email as a follow-up to Mark Alan's (ms52538) message of Thu, May 21, 2020, 9:28 AM.)
3 files in the torrent had a length that didn't occur among any of the files in the selected folder (hence 'no candidates=3').
2 files did have one or more files of the same length in the folder but failed to be matched (probably because they were too short).
Unfortunately, if no matches occur at all during the SfEDF operation (as in this case), the subsequent name-based search doesn't happen, because a common root folder can't be deduced from the matching process when there were no matches.
That leaves me confused. BT can work with the torrent: download it, complete the file structure and populate the files. But if I rename the entire directory for a moment in Windows Explorer, then within BT delete the torrent and files so everything is cleaned up, then re-add the torrent, this time in a STOPPED state, and perform a SfEDF pointed at the renamed directory, BT cannot find all of the files it had already downloaded, only some (larger) percentage of them? Essentially that is this scenario, yes? I am confused as to why it can find some, but not all. Sorry, I'm just trying to make sense of what I'm reading.
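The 'no candidates' versus 'remaining' split described above comes down to a size pre-filter: a file in the search folder is only worth hash-checking against a torrent file of exactly the same length. Below is a minimal sketch of that grouping with hypothetical names and made-up sizes. BiglyBT's own implementation may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical size pre-filter for SfEDF-style matching; not BiglyBT's code. */
public final class SizeCandidates {

    /** Groups the files found in the search folder (path -> length) by length. */
    public static Map<Long, List<String>> groupBySize(Map<String, Long> folderFiles) {
        Map<Long, List<String>> bySize = new HashMap<>();
        for (Map.Entry<String, Long> e : folderFiles.entrySet()) {
            bySize.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return bySize;
    }

    /** Folder files with exactly this length; an empty list means "no candidates". */
    public static List<String> candidatesFor(long torrentFileLength, Map<Long, List<String>> bySize) {
        List<String> found = bySize.get(torrentFileLength);
        return found == null ? Collections.<String>emptyList() : found;
    }

    public static void main(String[] args) {
        Map<String, Long> folder = new HashMap<>();
        folder.put("renamed clip 1.mp4", 1_000_000L); // made-up sizes
        folder.put("renamed clip 2.mp4", 2_500_000L);
        Map<Long, List<String>> bySize = groupBySize(folder);

        long[] torrentFileLengths = {1_000_000L, 3_000_000L};
        int noCandidates = 0;
        for (long len : torrentFileLengths) {
            if (candidatesFor(len, bySize).isEmpty()) {
                noCandidates++; // nothing of this length in the folder
            }
            // otherwise each candidate would go on to piece-hash checking
        }
        System.out.println("no candidates=" + noCandidates); // no candidates=1
    }
}
```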
There are files that are too short to be matched by checking their content against other files: if a torrent's piece size is, say, 1 MB, then a file has to be at least 1 MB in size (and must fully contain at least one piece) for an attempt to be made to see whether it is the same file or not.
Files smaller than this can only be matched by looking at their name.
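To make "checking their content" concrete: a v1 .torrent stores a 20-byte SHA-1 hash for each piece (the "pieces" field of the info dictionary), so a candidate file can only be tested on the pieces that lie entirely inside it. The sketch below assumes that layout and no padding files, and reads the candidate fully into memory for brevity. Class and method names are hypothetical; this is not BiglyBT's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical sketch of v1 piece-hash matching; not BiglyBT's implementation. */
public final class PieceCheck {

    private static MessageDigest sha1() {
        try {
            return MessageDigest.getInstance("SHA-1"); // required on every Java platform
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** SHA-1 of each full piece of data (stand-in for a torrent's "pieces"); ignores a final partial piece. */
    public static byte[][] hashPieces(byte[] data, int pieceSize) {
        byte[][] hashes = new byte[data.length / pieceSize][];
        MessageDigest md = sha1();
        for (int i = 0; i < hashes.length; i++) {
            md.update(data, i * pieceSize, pieceSize);
            hashes[i] = md.digest();
        }
        return hashes;
    }

    /**
     * True if every piece lying entirely inside the candidate file matches its
     * expected hash; false on any mismatch, or if no full piece fits inside the
     * file (the "too short to hash check" case).
     */
    public static boolean matches(byte[] candidate, long fileOffset, int pieceSize,
                                  byte[][] pieceHashes) {
        MessageDigest md = sha1();
        long end = fileOffset + candidate.length;
        long piece = (fileOffset + pieceSize - 1) / pieceSize; // first piece starting in the file
        boolean checkedAny = false;
        for (; (piece + 1) * pieceSize <= end; piece++) {
            md.update(candidate, (int) (piece * pieceSize - fileOffset), pieceSize);
            if (!Arrays.equals(md.digest(), pieceHashes[(int) piece])) {
                return false;
            }
            checkedAny = true;
        }
        return checkedAny;
    }

    public static void main(String[] args) {
        // Toy torrent: 12 bytes, 4-byte pieces; the file of interest is bytes 2..11.
        byte[] torrentData = "abcdefghijkl".getBytes(StandardCharsets.US_ASCII);
        byte[][] hashes = hashPieces(torrentData, 4);
        byte[] file = Arrays.copyOfRange(torrentData, 2, 12);
        System.out.println(matches(file, 2, 4, hashes)); // true (pieces 1 and 2 checked)
        file[5] ^= 1;                                     // flip a bit inside piece 1
        System.out.println(matches(file, 2, 4, hashes)); // false
    }
}
```

Bytes 2 and 3 of the toy torrent belong to piece 0, which the file only partly covers, so they are never verified. This is the same reason a small file cannot be matched by content at all.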
Considering that the file name might be 'sample.png', and that there may be many 'sample.png' files scattered through the potentially huge file hierarchy being searched, the name-matching process only kicks off if the files already matched (in this matching run) have a common root location.
For example, if files A and B have been fixed up from the locations
x/y/z/A
x/y/z/B
then the common root is deduced to be "x/y/z", and name matching will only be attempted relative to that root. If the CURRENT matching process has identified no matches, then the root can't be deduced.
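A minimal sketch of the common-root rule, using the x/y/z example. From each found file, climb as many folder levels as that file's path inside the torrent has, and proceed only if every match lands on the same folder. Names are hypothetical and it uses only java.nio.file. Climbing by depth rather than comparing names is an assumption, made because hash-matched files may have been renamed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of common-root deduction; not BiglyBT's code. */
public final class CommonRoot {

    /**
     * @param matches path of each matched file inside the torrent -> path where it was found
     * @return the shared root, or empty if there were no matches or they disagree
     */
    public static Optional<Path> deduceRoot(Map<Path, Path> matches) {
        Path root = null;
        for (Map.Entry<Path, Path> e : matches.entrySet()) {
            // Climb one level per name in the torrent path. Found files may have been
            // renamed (they were matched by hash), so names are not compared.
            Path candidate = e.getValue();
            for (int i = 0; i < e.getKey().getNameCount() && candidate != null; i++) {
                candidate = candidate.getParent();
            }
            if (candidate == null || (root != null && !root.equals(candidate))) {
                return Optional.empty();
            }
            root = candidate;
        }
        return Optional.ofNullable(root);
    }

    public static void main(String[] args) {
        Map<Path, Path> matches = new LinkedHashMap<>();
        matches.put(Paths.get("A"), Paths.get("x", "y", "z", "A"));
        matches.put(Paths.get("B"), Paths.get("x", "y", "z", "B"));
        Optional<Path> root = deduceRoot(matches);
        System.out.println(root); // Optional[x/y/z] (separator depends on the OS)

        // Name matching for a small unmatched file would then look only here:
        root.ifPresent(r -> System.out.println(r.resolve("sample.png")));

        // No matches, so no root, so no name-based search.
        System.out.println(deduceRoot(new LinkedHashMap<Path, Path>())); // Optional.empty
    }
}
```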
As for your scenario, please reproduce it and send me the entire log from the SfEDF window (email it to paul@biglybt.com if you want).
That adds clarity and logic, and I'm processing it all. While the SfEDF function is legit, it is problematic that it cannot find pre-existing files that do exist because of their size. From other posts I've read, people use the function as a legitimate means of finding previously downloaded content on their drives. Question: is there a way to pre-identify which files within the torrent would 'fail' IF a SfEDF were performed? It sounds like any file smaller than 1 MB would be at risk of failing.
I know we've chatted in the past about my renaming efforts: having to use an external app where 'serious horsepower' is needed, and you coding the 'batch rename' ability, with a pop-up window whose contents I can copy to a text file, rename the files, copy back into the pop-up window and apply. BT is not trying to be a bulk-renaming application. But in all seriousness, this is a sort of 'window of opportunity' where a one-size-fits-all solution is needed. On that note, I am using "Advanced Renamer V.3.85", which lets me build out file renames using pre-existing tag information from the files (with the caveat that they exist in the Windows file system), e.g. incremental numbering, checksums, video tags, date/time tags and image tags: all essential for some large torrent collections where Order is required to tame the Chaos of file and directory names.
For folks like myself who manage very large 'ecosystems' of torrents through BT (I mean, it is the granddaddy Cadillac of torrent apps), renaming files is a regular thing. People who mainly use public trackers probably don't care about a torrent once its files have finished downloading; they can bring Order using whatever tools they like. But for those of us on private trackers, maintaining ratio is important, so the torrents must continue to exist and be available for uploading. Changing file names is time-consuming work. :/ Hence SfEDF has been a big blessing, as has the batch-rename function. The other component is the EXIF functionality that Advanced Renamer 3.85 brings to the equation. IF (big IF) I can pre-identify files that might fail the look-up BEFORE I apply bulk name changes to all the files in the torrent, I could rename those files individually within BT, using Advanced Renamer's "New Name" value and simply copying/pasting it into BT to make the change with the EXIF info I want.
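On the pre-identification question: under the two-phase process parg describes, a file can only be hash-matched if it fully contains at least one piece, and that can be worked out from the torrent's file list alone. The sketch below assumes a v1 torrent, where files are concatenated in the order of the info dictionary's "files" list and each entry has a "length". It also assumes no padding files and ignores the shorter final piece. Names and sizes are hypothetical. Size alone is not the whole story: a file larger than one piece can still be at risk if it straddles piece boundaries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-check for files SfEDF could not match by content; not BiglyBT code. */
public final class AtRiskFiles {

    /**
     * Returns indices (in torrent order) of files that contain no complete piece.
     * Assumes fixed-size pieces, no padding files, and ignores the shorter final piece.
     */
    public static List<Integer> find(long[] fileLengths, long pieceSize) {
        List<Integer> atRisk = new ArrayList<>();
        long offset = 0; // where the current file starts in the concatenated data
        for (int i = 0; i < fileLengths.length; i++) {
            long end = offset + fileLengths[i];
            long firstBoundary = ((offset + pieceSize - 1) / pieceSize) * pieceSize;
            if (firstBoundary + pieceSize > end) {
                atRisk.add(i); // only findable by name, which needs a common root
            }
            offset = end;
        }
        return atRisk;
    }

    public static void main(String[] args) {
        long kB = 1024;
        long piece = 512 * kB; // as in the SfEDF log in this thread
        // Hypothetical file list: a 100 kB image, a 700 kB video, a 2048 kB video.
        long[] lengths = {100 * kB, 700 * kB, 2048 * kB};
        System.out.println(find(lengths, piece)); // [0, 1]
    }
}
```

Here the 700 kB file is flagged even though it is larger than the 512 kB piece size, because it starts at 100 kB and ends before the second piece boundary (1024 kB) plus one full piece.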
Just thinking of options.
Thanks! As always. :) I appreciate your time.
Just emailed you with the scenario data, and screenshot :)
B05's out; hopefully it fixes things somewhat!
Java 1.8.0_202 (64 bit)
c:\program files\biglybt\jre
SWT v4930r7, win32, zoom=100, dpi=96
Windows 10 v10.0, amd64 (64 bit)
B2.4.0.1_B03/4 az3
Hey Team - I hope all is well with everyone. Looks like you've been keeping busy.
I have a potential problem with Search for Existing Data Files (SfEDF).
To validate: first, let me ask for confirmation of the premise that renaming a Windows file does not affect the MD5 hash of the file itself. For example, I download a torrent with 17 files. One of the files, named 'sdfjhlskfhi.mp4', has the MD5 hash '15F774A9B218D96AE34EE21390A0A09F'.
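The premise can be checked directly. A file's MD5 (or SHA-1) digest is computed from its bytes only, and the file name is not part of those bytes, so renaming leaves the digest unchanged. The check below uses the standard java.security.MessageDigest and java.nio.file APIs on a temporary file rather than a real download. BitTorrent v1 torrents identify content by SHA-1 hashes of each piece, so SHA-1 is shown as well. "Renamed Video.mp4" is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class RenameKeepsHash {

    /** Upper-case hex digest of a file's contents, e.g. algorithm "MD5" or "SHA-1". */
    public static String digestHex(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // only the bytes feed the digest, never the name
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("sfedf");
        Path original = dir.resolve("sdfjhlskfhi.mp4");
        Files.write(original, "not really a video".getBytes(StandardCharsets.UTF_8));
        String md5Before = digestHex(original, "MD5");
        String sha1Before = digestHex(original, "SHA-1");

        // Rename within the same folder; only the directory entry changes.
        Path renamed = Files.move(original, dir.resolve("Renamed Video.mp4"));

        System.out.println(md5Before.equals(digestHex(renamed, "MD5")));    // true
        System.out.println(sha1Before.equals(digestHex(renamed, "SHA-1"))); // true

        Files.delete(renamed);
        Files.delete(dir);
    }
}
```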
Scenario 1:
In this scenario:
(A) BT missed SOME of the existing, renamed files in our root-folder backup copy, but successfully identified and linked other files in the same folder as those it missed.
(B) When I take the files BT missed in step (A), rename them in Windows Explorer back to the names the torrent expects, and copy them back into the torrent's folder, BT successfully identifies them.
Based on this, I would not suspect that renaming the files impacts their MD5 hash. OR is BT's SfEDF scan not accurately identifying 100% of the files targeted in a scan?
Scenario 2: I can duplicate Scenario 1 by following those steps up to and including making a backup of the files, then within BT DELETING the existing torrent (and all files), then re-adding the torrent in a stopped state, then using SfEDF and directing BT to the root folder where the completed backup files are located. In this scenario, however, I have not renamed the files with a bulk-renaming utility; they are left exactly as they were when copied from the original completed torrent download.
Comparing Scenario 2 with Scenario 1, it appears that BT's SfEDF is not getting an accurate and complete read of the MD5 hash (or whichever other hash variant is used; I know MD5 is the short one, but I'm not sure which one BT uses) from files in the targeted location. In this case that is a specifically mapped root folder, but it misses the same files if I use the 'default' search locations.
My environment: I am running PLEX, which updates every 15 minutes or whenever it detects a change to any file or folder within its library. I am also running Windows Search and Indexing in the same environment, so BT is operating dynamically alongside those two services. I believe PLEX has a few additional services for metadata matching that run in the background (e.g. Python), so I am uncertain whether there could be any type of interference with BT's SfEDF scans when they are performed.
Am I expecting too much of SfEDF or are there variables I need to tweak or consider?
Thanks!